Python Pandas: Denormalize data from one data frame to another

I have a Pandas data frame that you could call "normalized." For display purposes, I want to "de-normalize" the data. That is, some data is spread across several key values, and I want to put it on one row in the output records. Some records need to be summed as they are combined. (Also: if anyone has a better term for this than "denormalization", please edit this question or say so in the comments.)

I work with a Pandas data frame with many columns, so I will show you a simplified version below.

The following code sets up the (almost) normalized source data frame. (Please note that I am looking for advice on the second code block; this block is only here to provide context.) As in my actual data, there are some duplicates in the identifying data, and some numbers need to be summed:

import numpy as np
import pandas as pd

dates = pd.date_range('20170701', periods=21)
datesA1 = pd.date_range('20170701', periods=11)
datesB1 = pd.date_range('20170705', periods=9)
datesA2 = pd.date_range('20170708', periods=10)
datesB2 = pd.date_range('20170710', periods=11)
datesC1 = pd.date_range('20170701', periods=5)
datesC2 = pd.date_range('20170709', periods=9)

cols=['Date','Type','Count']

df_A1 = pd.DataFrame({'Date':datesA1,
                      'Type':'Apples',
                      'Count': np.random.randint(30,size=11)})
df_A2 = pd.DataFrame({'Date':datesA2,
                      'Type':'Apples',
                      'Count': np.random.randint(30,size=10)})
df_B1 = pd.DataFrame({'Date':datesB1,
                      'Type':'Berries',
                      'Count': np.random.randint(30,size=9)})
df_B2 = pd.DataFrame({'Date':datesB2,
                      'Type':'Berries',
                      'Count': np.random.randint(30,size=11)})
df_C1 = pd.DataFrame({'Date':datesC1,
                      'Type':'Canteloupes',
                      'Count': np.random.randint(30,size=5)})
df_C2 = pd.DataFrame({'Date':datesC2,
                      'Type':'Canteloupes',
                      'Count': np.random.randint(30,size=9)})

frames = [df_A1, df_A2, df_B1, df_B2, df_C1, df_C2]

dat_fra_source = pd.concat(frames)

The following code does what I want. The source data frame has several rows per day and type of fruit (A, B, and C). The destination data has one row per day with the sums for A, B, and C.

dat_fra_dest = pd.DataFrame(0, index=dates, columns=['Apples','Berries','Canteloupes'])

for index,row in dat_fra_source.iterrows():
    dat_fra_dest.at[row['Date'],row['Type']]+=row['Count']



Option 1:

dat_fra_source.groupby(['Date','Type']).sum().unstack().fillna(0)

Out[63]: 
            Count                    
Type       Apples Berries Canteloupes
Date                                 
2017-07-01   13.0     0.0        24.0
2017-07-02   18.0     0.0        16.0
2017-07-03   11.0     0.0        29.0
2017-07-04   13.0     0.0         7.0
2017-07-05   24.0    11.0        23.0
2017-07-06    6.0     4.0         0.0
2017-07-07   29.0    26.0         0.0
2017-07-08   31.0    19.0         0.0
2017-07-09   38.0    17.0        26.0
2017-07-10   57.0    54.0         1.0
2017-07-11    4.0    41.0        10.0
2017-07-12   16.0    28.0        23.0
2017-07-13   25.0    20.0        20.0
2017-07-14   19.0     6.0        15.0
2017-07-15    6.0    22.0         7.0
2017-07-16   16.0     0.0         5.0
2017-07-17   29.0     7.0         4.0
2017-07-18    0.0    21.0         0.0
2017-07-19    0.0    19.0         0.0
2017-07-20    0.0     8.0         0.0
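The result above carries a two-level column index (Count over Type), and fillna(0) leaves the counts as floats. A minimal sketch of one way to avoid both, assuming a small hand-made frame named src as a stand-in for dat_fra_source (the name and the values are made up for illustration): select the Count column before summing, and pass fill_value=0 to unstack.

```python
import pandas as pd

# Hypothetical stand-in for dat_fra_source, with a duplicate (Date, Type) pair on purpose.
src = pd.DataFrame({
    'Date': pd.to_datetime(['2017-07-01', '2017-07-01', '2017-07-01', '2017-07-02']),
    'Type': ['Apples', 'Apples', 'Berries', 'Apples'],
    'Count': [3, 4, 5, 6],
})

# Selecting 'Count' first yields a Series, so unstack() produces single-level columns.
# fill_value=0 fills the missing (Date, Type) pairs directly instead of leaving NaN.
wide = src.groupby(['Date', 'Type'])['Count'].sum().unstack(fill_value=0)
print(wide)
```

Here the two Apples rows on 2017-07-01 are summed into one cell, and the missing Berries entry on 2017-07-02 becomes 0.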

Option 2:

pd.pivot_table(dat_fra_source,index=['Date'],columns=['Type'],values='Count',aggfunc='sum').fillna(0)
Out[75]: 
Type        Apples  Berries  Canteloupes
Date                                    
2017-07-01    13.0      0.0         24.0
2017-07-02    18.0      0.0         16.0
2017-07-03    11.0      0.0         29.0
2017-07-04    13.0      0.0          7.0
2017-07-05    24.0     11.0         23.0
2017-07-06     6.0      4.0          0.0
2017-07-07    29.0     26.0          0.0
2017-07-08    31.0     19.0          0.0
2017-07-09    38.0     17.0         26.0
2017-07-10    57.0     54.0          1.0
2017-07-11     4.0     41.0         10.0
2017-07-12    16.0     28.0         23.0
2017-07-13    25.0     20.0         20.0
2017-07-14    19.0      6.0         15.0
2017-07-15     6.0     22.0          7.0
2017-07-16    16.0      0.0          5.0
2017-07-17    29.0      7.0          4.0
2017-07-18     0.0     21.0          0.0
2017-07-19     0.0     19.0          0.0
2017-07-20     0.0      8.0          0.0

If you also have vol and weight columns:

dat_fra_source['vol']=2
dat_fra_source['weight']=2
dat_fra_source.groupby(['Date','Type']).apply(lambda x: sum(x['vol']*x['weight']*x['Count'])).unstack().fillna(0)
Out[88]: 
Type        Apples  Berries  Canteloupes
Date                                    
2017-07-01    52.0      0.0         96.0
2017-07-02    72.0      0.0         64.0
2017-07-03    44.0      0.0        116.0
2017-07-04    52.0      0.0         28.0
2017-07-05    96.0     44.0         92.0
2017-07-06    24.0     16.0          0.0
2017-07-07   116.0    104.0          0.0
2017-07-08   124.0     76.0          0.0
2017-07-09   152.0     68.0        104.0
2017-07-10   228.0    216.0          4.0
2017-07-11    16.0    164.0         40.0
2017-07-12    64.0    112.0         92.0
2017-07-13   100.0     80.0         80.0
2017-07-14    76.0     24.0         60.0
2017-07-15    24.0     88.0         28.0
2017-07-16    64.0      0.0         20.0
2017-07-17   116.0     28.0         16.0
2017-07-18     0.0     84.0          0.0
2017-07-19     0.0     76.0          0.0
2017-07-20     0.0     32.0          0.0
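The apply call with a lambda runs Python code once per group, which can be slow on large frames. A possible vectorized alternative, sketched here on a small made-up frame (src and the helper column name total are assumptions, not part of the original answer), computes the per-row product first and then sums it per Date and Type:

```python
import pandas as pd

# Hypothetical stand-in for dat_fra_source with the extra vol and weight columns.
src = pd.DataFrame({
    'Date': pd.to_datetime(['2017-07-01', '2017-07-01', '2017-07-02']),
    'Type': ['Apples', 'Apples', 'Berries'],
    'Count': [3, 4, 5],
    'vol': [2, 2, 2],
    'weight': [2, 2, 2],
})

# Compute vol * weight * Count once per row, then sum it per (Date, Type).
# 'total' is an illustrative column name introduced only for this sketch.
weighted = (
    src.assign(total=src['vol'] * src['weight'] * src['Count'])
       .pivot_table(index='Date', columns='Type', values='total',
                    aggfunc='sum', fill_value=0)
)
print(weighted)
```

This should give the same numbers as the groupby/apply version, since summing the row-wise products per group is exactly what the lambda does.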

Using pd.crosstab:

pd.crosstab(dat_fra_source['Date'],
            dat_fra_source['Type'],
            dat_fra_source['Count'],
            aggfunc='sum',
            dropna=False).fillna(0)

Output:

Type        Apples  Berries  Canteloupes
Date                                    
2017-07-01    19.0      0.0          4.0
2017-07-02    25.0      0.0          4.0
2017-07-03    11.0      0.0         26.0
2017-07-04    27.0      0.0          8.0
2017-07-05     8.0     18.0         12.0
2017-07-06    10.0     11.0          0.0
2017-07-07     6.0     17.0          0.0
2017-07-08    10.0      5.0          0.0
2017-07-09    51.0     25.0         16.0
2017-07-10    31.0     23.0         21.0
2017-07-11    35.0     40.0         10.0
2017-07-12    16.0     30.0          9.0
2017-07-13    13.0     23.0         20.0
2017-07-14    21.0     26.0         27.0
2017-07-15    20.0     17.0         19.0
2017-07-16    12.0      4.0          2.0
2017-07-17    27.0      0.0          5.0
2017-07-18     0.0      5.0          0.0
2017-07-19     0.0     26.0          0.0
2017-07-20     0.0      6.0          0.0
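Because the source data is random, the numbers in this output differ from those in the other answers. One way to check that the approaches agree with the loop from the question is to run them on the same data. The sketch below assumes a fixed seed and a hypothetical helper make() that builds a reduced version of the question's frames:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an assumption made for reproducibility

def make(start, periods, kind):
    # Hypothetical helper: one block of daily counts for a single fruit type.
    return pd.DataFrame({'Date': pd.date_range(start, periods=periods),
                         'Type': kind,
                         'Count': rng.integers(30, size=periods)})

src = pd.concat([make('20170701', 11, 'Apples'), make('20170708', 10, 'Apples'),
                 make('20170705', 9, 'Berries'), make('20170701', 5, 'Canteloupes')],
                ignore_index=True)

# Reference result: the explicit loop from the question.
dates = pd.date_range(src['Date'].min(), src['Date'].max())
by_loop = pd.DataFrame(0, index=dates, columns=sorted(src['Type'].unique()))
for _, row in src.iterrows():
    by_loop.at[row['Date'], row['Type']] += row['Count']

by_crosstab = pd.crosstab(src['Date'], src['Type'], src['Count'],
                          aggfunc='sum').fillna(0)
by_pivot = src.pivot_table(index='Date', columns='Type', values='Count',
                           aggfunc='sum', fill_value=0)

# Compare raw values so that index names and int/float dtypes do not matter.
same = (np.array_equal(by_loop.to_numpy(), by_crosstab.to_numpy())
        and np.array_equal(by_loop.to_numpy(), by_pivot.to_numpy()))
print(same)
```

The comparison relies on every date in the range having at least one source row, which holds for this sample; with gaps in the dates, the loop result would need a reindex before comparing.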

Source: https://habr.com/ru/post/1686704/

