Pandas: how to count unique values when cells contain lists?

For some mysterious reason, I have a dataframe that looks like this:

index             col_weird      col_normal
2012-01-01 14:30  ['A','B']      2
2012-01-01 14:32  ['A','C','D']  4
2012-01-01 14:36  ['C','D']      2
2012-01-01 14:39  ['E','B']      4
2012-01-01 14:40  ['G','H']      2

I would like to resample my dataframe every 5 minutes, and

  • get the number of unique items across all lists in col_weird,

  • get the mean of col_normal.

Of course, using resample().col_weird.nunique() will fail on the first task, because I want the number of unique elements: between 14:30 and 14:35 I expect this number to be 4, corresponding to A, B, C, D.

Over the same period, the mean of col_normal is, of course, 3.

Any idea how to get this?

Thanks!

2 answers

First, expand each list into a Series so that every element gets its own row:

df = df['col_weird'].apply(pd.Series).stack().reset_index(drop=True, level=1)
print(df)
2012-01-01 14:30    A
2012-01-01 14:30    B
2012-01-01 14:32    A
2012-01-01 14:32    C
2012-01-01 14:32    D
2012-01-01 14:36    C
2012-01-01 14:36    D
2012-01-01 14:39    E
2012-01-01 14:39    B
2012-01-01 14:40    G
2012-01-01 14:40    H
dtype: object

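Then resample and count the unique values (hourly here; pass '5min' for the 5-minute bins from the question):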

df = df.resample('1H').nunique()
print (df)
2012-01-01 14:00:00    7
Freq: H, dtype: int64
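
For the 5-minute bins the question asks about, the same idea can be sketched with Series.explode, assuming pandas 0.25 or later where that method is available. The frame below is rebuilt by hand from the values in the question; the variable names exploded and unique_counts are made up for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question (values copied from the post).
idx = pd.to_datetime([
    "2012-01-01 14:30", "2012-01-01 14:32", "2012-01-01 14:36",
    "2012-01-01 14:39", "2012-01-01 14:40",
])
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"],
                      ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=idx,
)

# explode() puts each list element on its own row and repeats the timestamp.
exploded = df["col_weird"].explode()

# Count the distinct elements that fall into each 5-minute bin.
unique_counts = exploded.resample("5min").nunique()
print(unique_counts)  # counts per bin: 4, 4, 2
```

This avoids building a wide intermediate frame with apply(pd.Series), which may matter when the lists are long or of very different lengths.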

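Use pd.Grouper(freq='5Min') (the replacement for pd.TimeGrouper('5Min'), which was deprecated and later removed from pandas) and count the unique list items in each group:

df.groupby(pd.Grouper(freq='5Min')).col_weird.apply(lambda x: x.apply(pd.Series).stack().unique().shape[0])

index
2012-01-01 14:30:00    4
2012-01-01 14:35:00    4
2012-01-01 14:40:00    2
Freq: 5T, Name: col_weird, dtype: int64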
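
Neither answer covers the second task, the mean of col_normal. A minimal sketch that produces both results from one grouping might look like the following. The helper count_unique_items and the output column names n_unique and mean_normal are hypothetical, chosen only for this example, and the frame is rebuilt from the values in the question.

```python
import pandas as pd

# Sample frame rebuilt from the question (values copied from the post).
idx = pd.to_datetime([
    "2012-01-01 14:30", "2012-01-01 14:32", "2012-01-01 14:36",
    "2012-01-01 14:39", "2012-01-01 14:40",
])
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"],
                      ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=idx,
)


def count_unique_items(lists):
    """Hypothetical helper: union all lists in one bin and count the members."""
    return len(set().union(*lists))


# Group the rows into 5-minute bins on the DatetimeIndex.
grouped = df.groupby(pd.Grouper(freq="5min"))

result = pd.DataFrame({
    "n_unique": grouped["col_weird"].apply(count_unique_items),
    "mean_normal": grouped["col_normal"].mean(),
})
print(result)  # n_unique: 4, 4, 2; mean_normal: 3.0, 3.0, 2.0
```

The 14:30 bin gives 4 unique items and a mean of 3, matching what the question expects.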

Source: https://habr.com/ru/post/1654624/

