The smartest way to create an expanded multi-index frame from a long-format data frame

So, I have a very typical dataset that looks like this:

data = {'date': {0: '10/02/2017',
  1: '10/02/2017',
  2: '10/02/2017',
  3: '10/02/2017',
  4: '10/02/2017'},
 'field': {0: 'field1', 1: 'field2', 2: 'field1', 3: 'field2', 4: 'field3'},
 'type': {0: 'type1', 1: 'type1', 2: 'type2', 3: 'type2', 4: 'type2'},
 'value': {0: 1.79067,
  1: 1.7987200000000001,
  2: 1.7978900000000002,
  3: 1.8001099999999999,
  4: 1.8045599999999999}}

import pandas as pd

df = pd.DataFrame(data)
df.date = pd.to_datetime(df.date)

It looks like this (the real dataset has many different dates):

        date   field   type    value
0 2017-10-02  field1  type1  1.79067
1 2017-10-02  field2  type1  1.79872
2 2017-10-02  field1  type2  1.79789
3 2017-10-02  field2  type2  1.80011
4 2017-10-02  field3  type2  1.80456

I want to create an expanded multi-index frame, like this:

              type1             type2                  
field        field1   field2   field1   field2   field3
date                                                   
2017-10-02  1.79067  1.79872  1.79789  1.80011  1.80456

The smartest way I've found so far is this:

grouped = df.groupby('type')
res = {}
for name, group in grouped:
    # one pivoted table per type, with fields as columns
    res[name] = group.pivot(index='date', columns='field', values='value')
df = pd.concat(res, axis=1)

Is there a more effective way to achieve this?

Thanks.

4 answers

Option 1

Using unstack

In [36]: df.set_index(['date','field','type'])['value'].unstack([-1,-2])
Out[36]:
type          type1             type2
field        field1   field2   field1   field2   field3
date
2017-10-02  1.79067  1.79872  1.79789  1.80011  1.80456
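
As a side note (not part of the original answer), unstack also accepts level names, which reads a little clearer than positional indices; a minimal sketch of the same idea, with sort_index to make the column order deterministic:

res = df.set_index(['date', 'field', 'type'])['value'].unstack(['type', 'field'])
res = res.sort_index(axis=1)  # deterministic (type, field) column order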

Option 2

pivot_table

pd.pivot_table(df,values='value',index='date',columns=['type','field'])
Out[464]: 
type          type1             type2                  
field        field1   field2   field1   field2   field3
date                                                   
2017-10-02  1.79067  1.79872  1.79789  1.80011  1.80456
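
One caveat worth adding here: pivot_table aggregates whenever a (date, type, field) combination appears more than once, using mean by default. A hedged sketch that makes the aggregation explicit:

# assumes duplicates should be averaged; swap in another aggfunc if not
pd.pivot_table(df, values='value', index='date',
               columns=['type', 'field'], aggfunc='mean')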

Option 1
Using defaultdict

from collections import defaultdict

# nested dict keyed by the future column tuple, then by date
d = defaultdict(lambda: defaultdict(dict))

for t in df.itertuples():
    d[('value', t.field, t.type)][t.date] = t.value

pd.DataFrame(d)

              value                                    
             field1            field2            field3
              type1    type2    type1    type2    type2
2017-10-02  1.79067  1.79789  1.79872  1.80011  1.80456
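
Note that this puts field above type, the reverse of the layout asked for in the question; if the requested order matters, a minimal sketch (not from the original answer) to rearrange the column levels:

res = pd.DataFrame(d)
res = res.droplevel(0, axis=1)                        # drop the constant 'value' level
res = res.swaplevel(0, 1, axis=1).sort_index(axis=1)  # put type above field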

Option 2
Using groupby
This looks and feels like @Wen's answer.

df.groupby(['date', 'field', 'type']).first().unstack([-2, -1])

              value                                    
field        field1            field2            field3
type          type1    type2    type1    type2    type2
date                                                   
2017-10-02  1.79067  1.79789  1.79872  1.80011  1.80456
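
.first() works here because every (date, field, type) group holds exactly one row. If the real dataset can contain duplicates, an explicit aggregation is safer; a small sketch under that assumption:

# average duplicates instead of silently keeping the first one
df.groupby(['date', 'field', 'type'])['value'].mean().unstack([-2, -1])
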
Group the dates into daily buckets with a Grouper and unstack the remaining levels:

(df
 .set_index('date')
 .groupby([pd.Grouper(freq='D'), 'field', 'type'])  # pd.TimeGrouper('D') in the original is deprecated
 .sum()
 .unstack(['field', 'type']))

output:

              value                                    
field        field1            field2            field3
type          type1    type2    type1    type2    type2
date                                                   
2017-10-02  1.79067  1.79789  1.79872  1.80011  1.80456
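
The daily Grouper only matters when the timestamps are finer than one day; if the date column already holds plain daily dates, a sketch of the simpler equivalent:

# equivalent when dates are already daily
df.groupby(['date', 'field', 'type']).sum().unstack(['field', 'type'])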

You can use a combination of set_index() and unstack():

df1 = df.set_index(['date', 'field', 'type']).unstack().unstack().dropna(axis=1).reset_index()
df1.columns = df1.columns.droplevel()  # drop the constant 'value' level

You get:

type            type1           type2
field           field1  field2  field1  field2  field3
0   2017-10-02  1.79067 1.79872 1.79789 1.80011 1.80456
