MultiIndexing rows versus columns in pandas DataFrame

I am working with a MultiIndexed DataFrame in pandas and wondering whether I should MultiIndex the rows or the columns.

My data looks something like this: [image: DataTable]

The code:

import numpy as np
import pandas as pd

# Column MultiIndex: every combination of condition x patient x measure.
# (pd.tools.util.cartesian_product no longer exists in current pandas;
#  MultiIndex.from_product builds the same index directly.)
colidxs = pd.MultiIndex.from_product([['condition1', 'condition2'],
                                      ['patient1', 'patient2'],
                                      ['measure1', 'measure2', 'measure3']],
                                     names=['condition', 'patient', 'measure'])
# Row index: the time points
rowidxs = pd.Index([0, 1, 2, 3], name='time')
data = pd.DataFrame(np.random.randn(len(rowidxs), len(colidxs)),
                    index=rowidxs, columns=colidxs)

Here I chose to MultiIndex the columns, on the rationale that a pandas DataFrame is made up of Series and my data is ultimately a collection of time series (hence the rows are indexed by time).

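I ask because there seems to be an asymmetry between rows and columns when it comes to multiindexing. For example, the documentation shows how query works with a row MultiIndex, but if the DataFrame is MultiIndexed on the columns, the command from the documentation has to be replaced with something like df.T.query('color == "red"').T.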
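To make the asymmetry concrete, here is a minimal sketch that rebuilds data shaped like the question's (random values, so only the shapes matter) and filters on the condition level both ways. It assumes a recent pandas in which query() can refer to row-index level names by name; the variable names cond1_cols, data_rows and cond1_rows are illustrative only.

```python
import numpy as np
import pandas as pd

# Data laid out as in the question: time on the rows, a 3-level MultiIndex on the columns
cols = pd.MultiIndex.from_product(
    [['condition1', 'condition2'], ['patient1', 'patient2'],
     ['measure1', 'measure2', 'measure3']],
    names=['condition', 'patient', 'measure'])
time = pd.Index([0, 1, 2, 3], name='time')
data = pd.DataFrame(np.random.randn(len(time), len(cols)),
                    index=time, columns=cols)

# Column MultiIndex: query() can resolve row-index level names but not
# column-level names, so transpose, filter, then transpose back
cond1_cols = data.T.query('condition == "condition1"').T

# Row MultiIndex: the same filter works directly
data_rows = data.T  # rows: (condition, patient, measure); columns: time
cond1_rows = data_rows.query('condition == "condition1"')
```

Both results hold the same six series; only the orientation differs.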

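My question may seem a bit silly, but I would like to know whether there is any difference in convenience between multiindexing rows and multiindexing columns in a DataFrame (such as the query case above).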



Here is how some common functions behave on a DataFrame with a row or column MultiIndex:

  • []: column-first
  • get: column-only
  • attribute access as indexing: column-only
  • query: row-only
  • loc, iloc, ix: row-first
  • xs: row-first
  • sortlevel: row-first
  • groupby: row-first

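"row-first" means the function expects the row index as its first argument; to use it on the column index you need [:, ] or axis=1. "row-only" means the function works only with the row index; one way to apply it to the column index is to transpose the DataFrame first.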
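A minimal sketch of what "column-first" and "row-first" look like in practice, using data shaped like the question's and its transpose (data_rows, an illustrative name) as the row-MultiIndexed version. It covers only [], loc and xs; ix has since been removed from pandas, so it is left out here.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [['condition1', 'condition2'], ['patient1', 'patient2'],
     ['measure1', 'measure2', 'measure3']],
    names=['condition', 'patient', 'measure'])
time = pd.Index([0, 1, 2, 3], name='time')
data = pd.DataFrame(np.random.randn(len(time), len(cols)),
                    index=time, columns=cols)  # column MultiIndex
data_rows = data.T                             # same numbers, row MultiIndex

# [] is column-first: on a column MultiIndex it selects directly
a = data['condition1']

# loc is row-first: the row key comes first, so columns need [:, ...]
b = data.loc[:, 'condition1']
b_rows = data_rows.loc['condition1']

# xs is row-first: it works on axis=0 unless axis=1 is passed
c = data.xs('patient1', level='patient', axis=1)
c_rows = data_rows.xs('patient1', level='patient')
```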

Based on the list above, multiindexing the rows seems to be more convenient in most cases.
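As a rough illustration of that conclusion, this sketch runs grouping and level sorting on both layouts. Two assumptions about current pandas: sortlevel has since been removed in favour of sort_index(level=...), and groupby(axis=1) is deprecated as of pandas 2.1, so the column version transposes around the groupby instead. Names such as means_rows and sorted_cols are illustrative.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [['condition1', 'condition2'], ['patient1', 'patient2'],
     ['measure1', 'measure2', 'measure3']],
    names=['condition', 'patient', 'measure'])
time = pd.Index([0, 1, 2, 3], name='time')
data = pd.DataFrame(np.random.randn(len(time), len(cols)),
                    index=time, columns=cols)  # column MultiIndex
data_rows = data.T                             # row MultiIndex

# Row MultiIndex: groupby and sort_index act on the rows by default
means_rows = data_rows.groupby(level='condition').mean()  # conditions x time
sorted_rows = data_rows.sort_index(level='measure')

# Column MultiIndex: transpose around groupby, pass axis=1 to sort_index
means_cols = data.T.groupby(level='condition').mean().T   # time x conditions
sorted_cols = data.sort_index(axis=1, level='measure')
```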

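A natural question: why don't the pandas developers make DataFrame behave consistently with respect to rows and columns? For example, [] works on columns first, while loc/iloc/ix work on rows first.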


Source: https://habr.com/ru/post/1529128/

