How to generate n-level hierarchical JSON from a pandas DataFrame?

Is there an efficient way to create hierarchical JSON (n levels deep) where the parent values are the keys, rather than a variable label? For example:

{"2017-12-31":
    {"Junior":
        {"Electronics":
            {"A":
                {"sales": 0.440755
                },
             "B":
                {"sales": -3.230951
                }
            }
        }, ...etc...
    }, ...etc...
}, ...etc... 

1. My testing DataFrame:

import numpy as np
import pandas as pd

colIndex=pd.MultiIndex.from_product([['New York','Paris'],
                                     ['Electronics','Household'],
                                     ['A','B','C'],
                                     ['Junior','Senior']],
                               names=['City','Department','Team','Job Role'])

rowIndex=pd.date_range('25-12-2017',periods=12,freq='D')

df1=pd.DataFrame(np.random.randn(12, 24), index=rowIndex, columns=colIndex)
df1.index.name='Date'
df2=df1.resample('M').sum()
df3=df2.stack(level=0).groupby('Date').sum()

[Image: source DataFrame]


2. The transformation I applied, which seems to be the most logical structure for creating the JSON:

df4=df3.stack(level=[0,1,2]).reset_index() \
    .set_index(['Date','Job Role','Department','Team']) \
    .sort_index()

[Image: transformed DataFrame]


3. My attempts so far

I came across this very useful SO question, which solves the problem for one level of nesting using code along these lines:

j =(df.groupby(['ID','Location','Country','Latitude','Longitude'],as_index=False) \
    .apply(lambda x: x[['timestamp','tide']].to_dict('records'))\
    .reset_index()\
    .rename(columns={0:'Tide-Data'})\
    .to_json(orient='records'))

... but I can't find a way to make it work with nested .groupby() calls:

j=(df.groupby('date', as_index=True).apply(
    lambda x: x.groupby('Job Role', as_index=True).apply(
        lambda x: x.groupby('Department', as_index=True).apply(
            lambda x: x.groupby('Team', as_index=True).to_dict())))  \
                .reset_index().rename(columns={0:'sales'}).to_json(orient='records'))
1 answer

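You can use itertuples to build a nested dict and then dump it to JSON. To do this, first convert the Date timestamps to strings: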

df4=df3.stack(level=[0,1,2]).reset_index() 
df4['Date'] = df4['Date'].dt.strftime('%Y-%m-%d')
df4 = df4.set_index(['Date','Job Role','Department','Team']) \
    .sort_index()
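
As a side note on why the strftime step above matters: json.dumps only accepts str, int, float, bool or None as dict keys, so a timestamp key is rejected. The sketch below uses a plain datetime.datetime as a stand-in for a pandas Timestamp (an assumption made to keep the example free of pandas; Timestamp is a datetime subclass).

```python
import datetime
import json

# Stand-in for a pandas Timestamp taken from the 'Date' index.
stamp = datetime.datetime(2017, 12, 31)

try:
    json.dumps({stamp: {"sales": 1.0}})
    failed = False
except TypeError:
    # Raised because the key is not a str, int, float, bool or None.
    failed = True

# After formatting the date as a string, the same structure serializes.
key = stamp.strftime("%Y-%m-%d")
ok_json = json.dumps({key: {"sales": 1.0}})
```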

Create a nested dict:

import collections

def nested_dict():
    return collections.defaultdict(nested_dict)
result = nested_dict()
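
To illustrate what nested_dict does, here is a small standalone sketch: every missing key creates another level on first access, so no level has to be initialized by hand. The key values are made up for the demonstration.

```python
import collections
import json

def nested_dict():
    # Each missing key yields another nested_dict, so any depth works.
    return collections.defaultdict(nested_dict)

demo = nested_dict()
# No intermediate level exists yet; all four are created on assignment.
demo["2017-12-31"]["Junior"]["Electronics"]["A"]["sales"] = 0.44

# defaultdict is a dict subclass, so json.dumps serializes it directly.
demo_json = json.dumps(demo)
```

One thing to keep in mind: merely reading a missing key also creates it, so lookups on the finished structure can add empty entries.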

Use itertuples to populate it:

for row in df4.itertuples():
    result[row.Index[0]][row.Index[1]][row.Index[2]][row.Index[3]]['sales'] = row._1
    # print(row)
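
The loop above hard-codes four index levels. For an arbitrary number of levels, one possible generalization is to walk each key tuple and create the intermediate dicts on the way down. This is only a sketch: nest_pairs and its leaf_key parameter are hypothetical names, not part of pandas, and the sample pairs are made up.

```python
def nest_pairs(pairs, leaf_key="sales"):
    """Hypothetical helper: build nested dicts from (key_tuple, value) pairs."""
    result = {}
    for keys, value in pairs:
        node = result
        for key in keys:
            # Reuse the existing branch or create a new empty one.
            node = node.setdefault(key, {})
        node[leaf_key] = value
    return result

# Made-up pairs with three levels; any tuple length works the same way.
pairs = [
    (("2017-12-31", "Junior", "A"), 1.5),
    (("2017-12-31", "Junior", "B"), -2.0),
    (("2017-12-31", "Senior", "A"), 3.25),
]
nested = nest_pairs(pairs)
```

With the df4 built earlier, the pairs could presumably come from df4[0].items(), since iterating a MultiIndex Series that way yields (index tuple, value) pairs.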

Then dump the result to JSON:

import json
json.dumps(result)
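
If the output is meant to be read by people, json.dumps also takes an indent argument, and json.loads can be used to check that the nesting came out as expected. A small sketch with a made-up, hand-written stand-in for result:

```python
import json

# Hand-written stand-in for the nested result built above.
sample = {"2017-12-31": {"Junior": {"Electronics": {"A": {"sales": 0.44}}}}}

# indent=2 puts each nesting level on its own indented line.
pretty = json.dumps(sample, indent=2)

# Parsing the string back gives plain dicts that can be indexed by key.
round_trip = json.loads(pretty)
leaf = round_trip["2017-12-31"]["Junior"]["Electronics"]["A"]["sales"]
```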

'{"2017-12-31": {"Junior": {"Electronics": {"A": {"sales": -0.3947134370101142}, "B": {"sales": -0.9873530754403204}, "C": {"sales": -1.1182598058984508}}, "Household": {"A": {"sales": -1.1211850078098677}, "B": {"sales": 2.0330914483907847}, "C": {"sales": 3.94762379718749}}}, "Senior": {"Electronics": {"A": {"sales": 1.4528493451404196}, "B": {"sales": -2.3277322345261005}, "C": {"sales": -2.8040263791743922}}, "Household": {"A": {"sales": 3.0972591929279663}, "B": {"sales": 9.884565742502392}, "C": {"sales": 2.9359830722457576}}}}, "2018-01-31": {"Junior": {"Electronics": {"A": {"sales": -1.3580300149125217}, "B": {"sales": 1.414665000013205}, "C": {"sales": -1.432795129108244}}, "Household": {"A": {"sales": 2.7783259569115346}, "B": {"sales": 2.717700275321333}, "C": {"sales": 1.4358377416259644}}}, "Senior": {"Electronics": {"A": {"sales": 2.8981726774941485}, "B": {"sales": 12.022897003654117}, "C": {"sales": 0.01776855733076088}}, "Household": {"A": {"sales": -3.342163776613092}, "B": {"sales": -5.283208386572307}, "C": {"sales": 2.942580121975619}}}}}'


Source: https://habr.com/ru/post/1685627/

