Nature pandas DataFrame

As a continuation of my question about mixed types in a column :

Can a DataFrame be considered a list of columns or a list of rows?

In the first case, this means that (optimally) each column should be homogeneous (in type), and different columns can be of different types. In the latter case, it is assumed that each row is uniform in type.

For documentation:

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

This means that the DataFrame is a list of columns.

Does this mean that adding a row to a DataFrame more expensive than adding a column?

+5
source share
1 answer

You are completely right that a DataFrame can be thought of as a list of columns or even a more (ordered) dictionary of columns (see here ).

Indeed, each column must be of a uniform type, and different columns can be of different types. But with the help of object dtype you can still store different types of objects in one column (although it is not recommended separately, for example, for rows).
To illustrate, if you specify DataFrame data types, you will get a dtype for each column:

 In [2]: df = pd.DataFrame({'int_col':[0,1,2], 'float_col':[0.0,1.1,2.5], 'bool_col':[True, False, True]}) In [3]: df.dtypes Out[3]: bool_col bool float_col float64 int_col int64 dtype: object 

Internally, values โ€‹โ€‹are stored as blocks of the same type. Each column or collection of columns of the same type is stored in a separate array.

And that really means adding a line is more expensive. In the general case, adding several separate rows is not a good idea: it is better, for example, to pre-allocate an empty data frame to fill, or to put new rows / columns in a list and immediately combine them.
See the note at the end of concat / append docs (immediately before the first unit, โ€œSet logic on other axesโ€).

+5
source

Source: https://habr.com/ru/post/1208636/


All Articles