You are completely right that a DataFrame can be thought of as a list of columns or, better still, as an (ordered) dictionary of columns (see here).
Indeed, each column must be of a uniform type, and different columns can be of different types. With the help of the object dtype, however, you can still store different types of objects in one column (although this is not recommended, except, for example, for strings).
To illustrate, if you ask for the data types of a DataFrame, you get the dtype of each column:
    In [2]: df = pd.DataFrame({'int_col':[0,1,2], 'float_col':[0.0,1.1,2.5], 'bool_col':[True, False, True]})

    In [3]: df.dtypes
    Out[3]:
    bool_col        bool
    float_col    float64
    int_col        int64
    dtype: object
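The object dtype mentioned earlier can be seen in the same way. This is only a small sketch; the names `mixed` and `ints` are made up for illustration and are not from the question:

```python
import pandas as pd

# A column holding different kinds of Python objects falls back to object dtype.
mixed = pd.Series([1, "two", 3.0])

# A column of plain integers gets a uniform integer dtype instead.
ints = pd.Series([1, 2, 3])

print(mixed.dtype)  # object
print(ints.dtype)   # an integer dtype
```

Each element of `mixed` is kept as an individual Python object rather than in a typed array, which is the usual reason to avoid object columns for anything other than strings.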
Internally, the values are stored as blocks of the same type. Each column, or collection of columns of the same type, is stored in a separate array.
And that does mean that appending a row is more expensive. In general, appending many single rows one at a time is not a good idea: it is better, for example, to pre-allocate an empty DataFrame and fill it, or to collect the new rows / columns in a list and concatenate them all at once.
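As a rough sketch of the "collect in a list, then concatenate once" approach (the `incoming` data and the variable names are hypothetical, chosen only for this example):

```python
import pandas as pd

# Hypothetical rows arriving one at a time, e.g. from a loop or a reader.
incoming = [
    {"int_col": 0, "float_col": 0.0},
    {"int_col": 1, "float_col": 1.1},
    {"int_col": 2, "float_col": 2.5},
]

# Collect one-row frames in a plain Python list; appending to a list is cheap.
pieces = []
for row in incoming:
    pieces.append(pd.DataFrame([row]))

# Concatenate once at the end instead of growing a DataFrame row by row.
result = pd.concat(pieces, ignore_index=True)
print(result.shape)  # (3, 2)
```

If the rows are already plain dicts like this, passing the whole list to `pd.DataFrame(incoming)` in one call should give the same table without the intermediate frames.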
See the note at the end of the concat / append docs (just before the first subsection, "Set logic on the other axes").