Pandas resample by groups with duplicate dates

Question

There are many similar questions here, but I could not find a single one that actually had several observations on the same date. A minimal non-working example would be:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Date": np.tile([pd.Series(["2016-01", "2016-03"])], 2)[0],
     "Group": [1, 1, 2, 2],
     "Obs": [1, 2, 5, 6]})

Now I would like to linearly interpolate the value for February 2016 within each group, so the desired result is:

    Date    Group   Obs
    2016-01     1       1
    2016-02     1     1.5
    2016-03     1       2
    2016-01     2       5
    2016-02     2     5.5
    2016-03     2       6

My understanding is that resample should be able to do this (in my actual application I am trying to move from quarterly to monthly data, so there are observations in January and April), but it requires some kind of time index, which I cannot create because there are duplicates in the Date column.

I guess some kind of groupby magic can help, but I can't figure it out!

python pandas datetime

Nils Gudat 18 '16 9:57

You can use groupby with resample and interpolate:

#convert column Date to datetime
df['Date'] = pd.to_datetime(df.Date)
print (df)
        Date  Group  Obs
0 2016-01-01      1    1
1 2016-03-01      1    2
2 2016-01-01      2    5
3 2016-03-01      2    6

#groupby, resample and interpolate
df1 = df.groupby('Group').apply(lambda x : x.set_index('Date')
                                            .resample('M')
                                            .first()
                                            .interpolate())
                        .reset_index(level=0, drop=True).reset_index()

#convert Date to period
df1['Date'] = df1.Date.dt.to_period('M')
print (df1)
     Date  Group  Obs
0 2016-01    1.0  1.0
1 2016-02    1.0  1.5
2 2016-03    1.0  2.0
3 2016-01    2.0  5.0
4 2016-02    2.0  5.5
5 2016-03    2.0  6.0

EDIT:

The pandas API changed in 0.18.1, so now you can use:

df['Date'] = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)

df1 = df.groupby('Group').apply(lambda df1: df1.resample('M')
                                               .first()
                                               .interpolate())
                         .reset_index(level=0, drop=True).reset_index()

df1['Date'] = df1.Date.dt.to_period('M')
print (df1)
     Date  Group  Obs
0 2016-01    1.0  1.0
1 2016-02    1.0  1.5
2 2016-03    1.0  2.0
3 2016-01    2.0  5.0
4 2016-02    2.0  5.5
5 2016-03    2.0  6.0
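
Note that in the output above Group comes back as a float, because the whole sub-frame (including the Group column) is resampled and interpolated. The following is only a sketch of one way to avoid that, not part of the original answer: it interpolates just the Obs column for each group and writes the group label back afterwards. The "MS" (month start) frequency and the explicit loop are choices made for this illustration.

```python
import pandas as pd

# Same data as in the question, with Date parsed to timestamps.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01", "2016-03", "2016-01", "2016-03"]),
    "Group": [1, 1, 2, 2],
    "Obs": [1, 2, 5, 6],
})

pieces = []
for group, sub in df.groupby("Group"):
    # Within one group the dates are unique, so they can serve as the index.
    monthly = (sub.set_index("Date")["Obs"]
                  .resample("MS")   # month-start bins: Jan, Feb, Mar
                  .asfreq()         # missing months become NaN
                  .interpolate())   # linear interpolation fills the gaps
    out = monthly.reset_index()
    out.insert(1, "Group", group)   # restore the group label as-is
    pieces.append(out)

result = pd.concat(pieces, ignore_index=True)
result["Date"] = result["Date"].dt.to_period("M")
print(result)
```

Because only Obs is interpolated, Group keeps its original integer values.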

jezrael 18 '16 10:25

IanS · Accepted Answer · 2016-05-18T10:25:49+0000

Edit: using reindex instead of resample is about 2x faster.

df.set_index('Date', inplace=True)
index = ['2016-01', '2016-02', '2016-03']

df.groupby('Group').apply(lambda df1: df1.reindex(index).interpolate())

groupby splits the frame by group, and the lambda reindexes and interpolates each group's sub-frame (here called df1).
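
For the quarterly-to-monthly case mentioned in the question, the target index does not have to be typed out by hand. The sketch below is an illustration under assumptions, not the answerer's code: the sample values are made up, and fill_months is a hypothetical helper. It builds the monthly index with pd.period_range and reindexes each group against it.

```python
import pandas as pd

# Made-up quarterly data: observations in January and April for two groups.
df = pd.DataFrame({
    "Date": ["2016-01", "2016-04", "2016-01", "2016-04"],
    "Group": [1, 1, 2, 2],
    "Obs": [1.0, 4.0, 10.0, 13.0],
})
df["Date"] = pd.to_datetime(df["Date"]).dt.to_period("M")

# Every month between the first and last observation.
full_index = pd.period_range(df["Date"].min(), df["Date"].max(), freq="M")

def fill_months(sub, group):
    """Hypothetical helper: reindex one group to all months and interpolate Obs."""
    obs = sub.set_index("Date")["Obs"].reindex(full_index).interpolate()
    return obs.rename_axis("Date").reset_index().assign(Group=group)

result = pd.concat(
    [fill_months(sub, group) for group, sub in df.groupby("Group")],
    ignore_index=True,
)[["Date", "Group", "Obs"]]
print(result)
```

All groups share one index here, so a group whose first observation comes later than the overall minimum would keep leading NaN values, since the default interpolation only fills forward from observed points. reindex also requires the dates within each group to be unique.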
