Nature of pandas DataFrame

Question

As a follow-up to my question about mixed types in a column:

Can a DataFrame be considered a list of columns or a list of rows?

In the first case, this means that (optimally) each column should be homogeneous (in type), and different columns can be of different types. In the latter case, it is assumed that each row is uniform in type.

From the documentation:

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

This means that the DataFrame is a list of columns.

Does this mean that adding a row to a DataFrame is more expensive than adding a column?

python pandas

Drror Dec 9 '14 at 8:53

1 answer

joris · Accepted Answer · 2014-12-09T09:09:32+0000

You are completely right that a DataFrame can be thought of as a list of columns, or even more as an (ordered) dictionary of columns (see here).

Indeed, each column should be of a homogeneous type, and different columns can be of different types. With the object dtype you can still hold different types of objects in one column (although this is not recommended, apart from e.g. strings).
To illustrate, if you ask for the data types of a DataFrame, you get a dtype for each column:

    In [2]: df = pd.DataFrame({'int_col':[0,1,2], 'float_col':[0.0,1.1,2.5], 'bool_col':[True, False, True]})

    In [3]: df.dtypes
    Out[3]:
    bool_col        bool
    float_col    float64
    int_col        int64
    dtype: object
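As a small sketch of the object dtype mentioned above: a column that holds values of different Python types falls back to `object`, and each element keeps its own type. The series name `mixed` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical column mixing an int, a str and a float.
mixed = pd.Series([1, "two", 3.0])

# pandas cannot pick a single numeric dtype here, so it uses object.
print(mixed.dtype)

# Each element is still an ordinary Python object of its original type.
element_types = [type(value).__name__ for value in mixed]
print(element_types)
```

This works, but operations on such a column go through generic Python objects rather than a typed array, which is one reason it is usually avoided.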

Internally, values are stored as blocks of the same type. Each column or collection of columns of the same type is stored in a separate array.
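One way to see the column-oriented layout from the outside, without touching pandas internals, is to compare what you get back for a column and for a row. This is only a sketch: the frame below is an assumed example (the `str_col` column is added here for illustration and is not part of the frame above).

```python
import pandas as pd

# Assumed example frame with three differently typed columns.
df = pd.DataFrame({
    "int_col": [0, 1, 2],
    "float_col": [0.0, 1.1, 2.5],
    "str_col": ["a", "b", "c"],
})

# A column comes back with its own homogeneous dtype.
column = df["int_col"]
print(column.dtype)

# A row has to combine values from differently typed columns,
# so the resulting Series falls back to object dtype.
row = df.iloc[0]
print(row.dtype)
```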

And this indeed means that adding a row is more expensive. In general, adding several single rows one at a time is not a good idea: it is better, for example, to preallocate an empty DataFrame and fill it, or to collect the new rows/columns in a list and concatenate them all at once.
See the note at the end of the concat/append docs (just before the first subsection, "Set logic on the other axes").
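A minimal sketch of the "collect, then concatenate once" approach, assuming the new rows arrive as dictionaries; the names `new_rows`, `extra` and `result` and the values are hypothetical.

```python
import pandas as pd

# Existing frame (illustrative values).
df = pd.DataFrame({"int_col": [10], "float_col": [1.5]})

# Collect the new rows in a plain Python list first ...
new_rows = [{"int_col": i, "float_col": i * 0.5} for i in range(3)]

# ... then build one frame from them and concatenate a single time,
# instead of growing df by one row per iteration.
extra = pd.DataFrame(new_rows)
result = pd.concat([df, extra], ignore_index=True)

print(result)
```

Growing the frame inside the loop would copy the existing data on every iteration, whereas this version copies it once.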
