How to resample a mixed-type pandas DataFrame?

I create a mixed-type (floats and strings) pandas DataFrame df3 with the following Python code:

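import numpy as np
import pandas as pd

# `dates` is a DatetimeIndex defined elsewhere (not shown); the outputs below suggest quarter-end dates.
# Group 'A': two random float columns plus two constant string columns
df1 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('AB'))
df1['C'] = 'A'
df1['D'] = 'Pickles'
# Group 'B': same layout, different labels
df2 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('AB'))
df2['C'] = 'B'
df2['D'] = 'Ham'
# Stack both groups, so every date appears twice in the index
df3 = pd.concat([df1, df2], axis=0)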
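Since the snippet above depends on a `dates` index that is not shown, here is a minimal self-contained sketch of the same setup. The eight quarter-end dates are an assumption inferred from the outputs further down, and the helper `make_frame` and the seeded generator are hypothetical additions for reproducibility, not part of the original code. The last lines check which columns are numeric, since that is what the resampling problems below hinge on.

```python
import numpy as np
import pandas as pd

# Assumption: eight quarter-end dates, inferred from the outputs in the question.
dates = pd.to_datetime([
    "2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31",
    "2015-03-31", "2015-06-30", "2015-09-30", "2015-12-31",
])

rng = np.random.default_rng(0)  # seeded so reruns give the same numbers


def make_frame(label, food):
    """Hypothetical helper: two float columns plus two constant string columns."""
    frame = pd.DataFrame(
        rng.standard_normal((len(dates), 2)), index=dates, columns=list("AB")
    )
    frame["C"] = label
    frame["D"] = food
    return frame


df3 = pd.concat([make_frame("A", "Pickles"), make_frame("B", "Ham")], axis=0)

# Only A and B are numeric; C and D hold strings.
numeric_cols = df3.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['A', 'B']
print(df3.shape)     # (16, 4)
```

Note that the index of df3 is not unique, because each date occurs once per group.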

When I upsample df3 to a higher frequency, the frame is not resampled the way I expect: the how argument seems to be ignored and I just get missing values:

df4 = df3.groupby(['C']).resample('M',  how={'A': 'mean', 'B': 'mean',  'D': 'ffill'})
df4.head()

Result:

                      B          A        D
C                                          
A 2014-03-31 -0.4640906 -0.2435414  Pickles
  2014-04-30        NaN        NaN      NaN
  2014-05-31        NaN        NaN      NaN
  2014-06-30 -0.5626360  0.6679614  Pickles
  2014-07-31        NaN        NaN      NaN

When I downsample df3 to a lower frequency, I don't get any resampling at all:

df5 = df3.groupby(['C']).resample('A',  how={'A': np.mean, 'B': np.mean,  'D': 'ffill'})
df5.head()

Result:

                      B          A        D
C                                          
A 2014-03-31        NaN        NaN  Pickles
  2014-06-30        NaN        NaN  Pickles
  2014-09-30        NaN        NaN  Pickles
  2014-12-31 -0.7429617 -0.1065645  Pickles
  2015-03-31        NaN        NaN  Pickles

I am fairly sure this has something to do with the mixed types, because if I repeat the annual downsampling with only the numeric columns (plus the grouping column C), everything works as expected:

df5b = df3[['A', 'B', 'C']].groupby(['C']).resample('A',  how={'A': np.mean, 'B': np.mean})
df5b.head()

Result:

                      B          A
C                                 
A 2014-12-31 -0.7429617 -0.1065645
  2015-12-31 -0.6245030 -0.3101057
B 2014-12-31  0.4213621 -0.0708263
  2015-12-31 -0.0607028  0.0110456

But even when I restrict the frame to numeric types, upsampling to a higher frequency still does not work as I expected:

df4b = df3[['A', 'B', 'C']].groupby(['C']).resample('M',  how={'A': 'mean', 'B': 'mean'})
df4b.head()

Result:

                      B          A
C                                 
A 2014-03-31 -0.4640906 -0.2435414
  2014-04-30        NaN        NaN
  2014-05-31        NaN        NaN
  2014-06-30 -0.5626360  0.6679614
  2014-07-31        NaN        NaN
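One way to read the NaN rows above: an aggregation such as mean is computed per target bin, and the months between two quarter ends contain no rows, so there is nothing to average. The sketch below illustrates this on a small made-up quarterly series using the current method-chaining resample API rather than the how= keyword from the question. The values and the variable names are assumptions for illustration only, and an offset object is used instead of a frequency string because the string aliases differ between pandas versions.

```python
import pandas as pd

# Hypothetical quarterly series standing in for column A of one group.
quarterly = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2014-03-31", "2014-06-30", "2014-09-30"]),
)

month_end = pd.offsets.MonthEnd()  # offset object instead of a version-specific alias

# Aggregating: each monthly bin is averaged, and empty bins come out as NaN.
monthly_mean = quarterly.resample(month_end).mean()

# Filling: the last observed value is carried forward into the new months.
monthly_ffill = quarterly.resample(month_end).ffill()

print(int(monthly_mean.isna().sum()))  # 4
print(monthly_ffill.tolist())          # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
```

So an aggregation alone does not fill in the new rows when upsampling; a fill step is what populates them.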

Which leaves me with two questions:

  • ?
  • , , ?

, .


For upsampling, use the fill_method argument instead of how:

df4c = df3.groupby(['C']).resample('M',  fill_method='ffill')
df4c.head()
                      A          B        D
C                                          
A 2014-03-31 -0.2435414 -0.4640906  Pickles
  2014-04-30 -0.2435414 -0.4640906  Pickles
  2014-05-31 -0.2435414 -0.4640906  Pickles
  2014-06-30  0.6679614 -0.5626360  Pickles
  2014-07-31  0.6679614 -0.5626360  Pickles
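The fill_method keyword belongs to the older resample signature used in this thread. As a rough sketch of the same per-group forward fill with the method-chaining API, the code below resamples each group separately and stitches the pieces back together. The data values, the four quarter-end dates, and the helper `make_frame` are made up for illustration. The explicit loop is a deliberate choice here so the sketch does not depend on how a given pandas version treats the grouping column in groupby(...).resample(...).

```python
import numpy as np
import pandas as pd

# Assumed quarter-end index and made-up values standing in for the question's data.
dates = pd.to_datetime(["2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31"])


def make_frame(label, food, start):
    """Hypothetical helper: floats in A and B, constant strings in C and D."""
    values = start + np.arange(4.0)
    return pd.DataFrame(
        {"A": values, "B": -values, "C": label, "D": food}, index=dates
    )


df3 = pd.concat([make_frame("A", "Pickles", 1.0), make_frame("B", "Ham", 10.0)])

month_end = pd.offsets.MonthEnd()

# Upsample each group on its own, forward-filling floats and strings alike.
pieces = {
    key: group.drop(columns="C").resample(month_end).ffill()
    for key, group in df3.groupby("C")
}

# Reassemble with C as the outer index level, as in the output above.
df4c = pd.concat(pieces, names=["C"])
print(df4c.head())
```

Forward fill works for the string column D as well, because it copies values instead of computing on them.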

, .

Downsampling to annual frequency without a how dictionary (only the numeric columns appear in the result):

df5c = df3.groupby(['C']).resample('A')
df5c.head()
                      A          B
C                                 
A 2014-12-31 -0.1065645 -0.7429617
  2015-12-31 -0.3101057 -0.6245030
B 2014-12-31 -0.0708263  0.4213621
  2015-12-31  0.0110456 -0.0607028
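The output above drops the string column D. If D should survive the downsampling, one possible approach is to group by C together with a yearly time grouper and give every column its own aggregation. This is a sketch under assumptions: the data is made up, `make_frame` is a hypothetical helper, and taking the last value of D per year is an illustrative choice that the thread itself does not prescribe.

```python
import numpy as np
import pandas as pd

# Assumed quarter-end index over two years, with made-up values.
dates = pd.to_datetime([
    "2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31",
    "2015-03-31", "2015-06-30", "2015-09-30", "2015-12-31",
])


def make_frame(label, food, start):
    """Hypothetical helper: floats in A and B, constant strings in C and D."""
    values = start + np.arange(8.0)
    return pd.DataFrame(
        {"A": values, "B": -values, "C": label, "D": food}, index=dates
    )


df3 = pd.concat([make_frame("A", "Pickles", 1.0), make_frame("B", "Ham", 10.0)])

# Group by the label column and by calendar year of the DatetimeIndex.
yearly = pd.Grouper(freq=pd.offsets.YearEnd())

# Numeric columns are averaged; the string column keeps its last value per year.
df5c = df3.groupby(["C", yearly]).agg({"A": "mean", "B": "mean", "D": "last"})
print(df5c)
```

With these made-up values, group A averages to 2.5 for 2014 and 6.5 for 2015 in column A, and D stays 'Pickles'.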

, , ffill, .


Source: https://habr.com/ru/post/1623323/
