Pandas groupby hour of the day to a dictionary

Question

I have a pandas data frame as follows:

    date                | Item  | count
    ------------------------------------
    2016-12-06 10:45:08 | Item1 | 60
    2016-12-06 10:45:08 | Item2 | 145
    2016-12-06 09:45:00 | Item1 | 60
    2016-12-06 09:44:54 | Item3 | 600
    2016-12-06 09:44:48 | Item4 | 15
    2016-12-06 11:45:08 | Item1 | 60
    2016-12-06 10:45:08 | Item2 | 14
    2016-11-06 09:45:00 | Item1 | 62
    2016-11-06 09:44:54 | Item3 | 6
    2016-11-06 09:44:48 | Item4 | 15

I am trying to group the items by, say, hour of the day (or, later, by whole day) in order to compute statistics such as the list of items sold per hour or per day. For example:

On 2016-12-06, from 09:00:00 to 10:00:00, Item1, Item3 and Item4 were sold; etc.
On 2016-12-06, Item1, Item2, Item3 and Item4 (unique items) were sold.

I am still far from computing these statistics because I am stuck on grouping by time. Initially, print df.dtypes showed:

    date     object
    Item     object
    count     int64
    dtype: object

So I used the following line of code to convert the date column to a pandas datetime column:

 df['date'] = pd.to_datetime(df['date'])

and now print df.dtypes gives:

    date     datetime64[ns]
    Item             object
    count             int64
    dtype: object

However, when I try to group by the date column using TimeGrouper with the following lines of code

    from pandas.tseries.resample import TimeGrouper
    print df.groupby([df['date'],pd.TimeGrouper(freq='Min')])

I get the following TypeError. According to other answers I found, converting the column with pd.to_datetime was supposed to solve this problem.

 TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

I do not know how to resolve this error so that I can move on to the statistics I am looking for. Any advice on fixing it and on using TimeGrouper to compute these statistics, preferably in dictionary format (or whatever makes more sense), would be greatly appreciated.

python dictionary numpy pandas dataframe

kingmakerking Feb 02 '17 at 9:04

2 answers

jezrael · Answer 1 · 2017-02-02T09:10:19+0000

You can group by a NumPy array of the datetimes truncated to the hour (minutes and seconds removed):

    print (df['date'].values.astype('<M8[h]'))
    ['2016-12-06T10' '2016-12-06T10' '2016-12-06T09' '2016-12-06T09'
     '2016-12-06T09' '2016-12-06T11' '2016-12-06T10' '2016-11-06T09'
     '2016-11-06T09' '2016-11-06T09']

    print (df.groupby(df['date'].values.astype('<M8[h]')).Item.unique())
    2016-11-06 09:00:00    [Item1, Item3, Item4]
    2016-12-06 09:00:00    [Item1, Item3, Item4]
    2016-12-06 10:00:00           [Item1, Item2]
    2016-12-06 11:00:00                  [Item1]
    Name: Item, dtype: object

    print (df.groupby(df['date'].values.astype('<M8[h]')).Item
             .apply(lambda x: x.unique().tolist()).to_dict())
    {Timestamp('2016-11-06 09:00:00'): ['Item1', 'Item3', 'Item4'],
     Timestamp('2016-12-06 09:00:00'): ['Item1', 'Item3', 'Item4'],
     Timestamp('2016-12-06 10:00:00'): ['Item1', 'Item2'],
     Timestamp('2016-12-06 11:00:00'): ['Item1']}

    print (df.groupby(df['date'].values.astype('<M8[D]')).Item
             .apply(lambda x: x.unique().tolist()).to_dict())
    {Timestamp('2016-11-06 00:00:00'): ['Item1', 'Item3', 'Item4'],
     Timestamp('2016-12-06 00:00:00'): ['Item1', 'Item2', 'Item3', 'Item4']}
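As a minimal sketch of why the cast above works, assuming only NumPy: converting a datetime64 array to a coarser unit drops the finer components. The array name `stamps` and its two values are illustrative, taken from the question's sample rows.

```python
import numpy as np

# Two timestamps from the question, stored at nanosecond resolution
# (the resolution pandas uses for a datetime64[ns] column).
stamps = np.array(['2016-12-06T10:45:08', '2016-11-06T09:44:54'],
                  dtype='datetime64[ns]')

# Casting to hour resolution discards minutes and seconds.
# '<M8[h]' in the answer is the little-endian type-code spelling of this dtype.
hours = stamps.astype('datetime64[h]')

# Casting to day resolution discards the time of day entirely.
days = stamps.astype('datetime64[D]')

print(hours)
print(days)
```

Grouping by such an array therefore puts every row from the same clock hour (or the same calendar day) under one key.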

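Thanks to Jeff for suggesting round. Note that it rounds to the nearest hour or day rather than truncating, so the 09:44 to 09:45 rows land in the 10:00 group: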

    print (df.groupby(df['date'].dt.round('h')).Item
             .apply(lambda x: x.unique().tolist()).to_dict())
    {Timestamp('2016-11-06 10:00:00'): ['Item1', 'Item3', 'Item4'],
     Timestamp('2016-12-06 12:00:00'): ['Item1'],
     Timestamp('2016-12-06 10:00:00'): ['Item1', 'Item3', 'Item4'],
     Timestamp('2016-12-06 11:00:00'): ['Item1', 'Item2']}

    print (df.groupby(df['date'].dt.round('d')).Item
             .apply(lambda x: x.unique().tolist()).to_dict())
    {Timestamp('2016-11-06 00:00:00'): ['Item1', 'Item3', 'Item4'],
     Timestamp('2016-12-06 00:00:00'): ['Item1', 'Item2', 'Item3', 'Item4']}
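If the goal is the "from 09:00:00 to 10:00:00" bucket described in the question, truncating may be a better fit than rounding. The sketch below is not part of the original answer: it rebuilds the question's sample data and assumes a pandas version that provides `Series.dt.floor`. The names `rounded`, `floored`, `by_hour` and `by_day` are illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'date': ['2016-12-06 10:45:08', '2016-12-06 10:45:08', '2016-12-06 09:45:00',
             '2016-12-06 09:44:54', '2016-12-06 09:44:48', '2016-12-06 11:45:08',
             '2016-12-06 10:45:08', '2016-11-06 09:45:00', '2016-11-06 09:44:54',
             '2016-11-06 09:44:48'],
    'Item': ['Item1', 'Item2', 'Item1', 'Item3', 'Item4',
             'Item1', 'Item2', 'Item1', 'Item3', 'Item4'],
    'count': [60, 145, 60, 600, 15, 60, 14, 62, 6, 15],
})
df['date'] = pd.to_datetime(df['date'])

# round() goes to the nearest hour; floor() truncates to the start of the hour.
rounded = df['date'].dt.round('h')
floored = df['date'].dt.floor('h')

# Unique items per clock hour, keyed by the start of that hour.
by_hour = (df.groupby(floored)['Item']
             .apply(lambda x: x.unique().tolist())
             .to_dict())

# Unique items per calendar day.
by_day = (df.groupby(df['date'].dt.floor('D'))['Item']
            .apply(lambda x: x.unique().tolist())
            .to_dict())

print(by_hour)
print(by_day)
```

With floor, the row stamped 09:45:00 stays in the 09:00 group, whereas round moves it to 10:00.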

piRSquared · Answer 2 · 2017-02-02T09:14:21+0000

    sold = df.set_index('date').Item.resample('H').agg({'Sold': 'unique'})
    sold[sold.Sold.str.len() > 0]

                                          Sold
    date
    2016-11-06 09:00:00  [Item4, Item3, Item1]
    2016-12-06 09:00:00  [Item4, Item3, Item1]
    2016-12-06 10:00:00         [Item1, Item2]
    2016-12-06 11:00:00                [Item1]
