Suppose you have the following DataFrame:

    import numpy as np
    import pandas as pd

    rng = pd.date_range('1/1/2011', periods=72, freq='H')
    np.random.seed(10)
    n = 10
    df = pd.DataFrame(
        {
            "datetime": np.random.choice(rng, n),
            "cat": np.random.choice(['a', 'b', 'b'], n),
            "val": np.random.randint(0, 5, size=n)
        }
    )
If I now group by cat and datetime:

    gb = df.groupby(['cat','datetime']).sum()
I get the totals for each cat for every hour:

    cat  datetime             val
    a    2011-01-01 00:00:00    1
         2011-01-01 09:00:00    3
         2011-01-02 16:00:00    1
         2011-01-03 16:00:00    1
    b    2011-01-01 08:00:00    4
         2011-01-01 15:00:00    3
         2011-01-01 16:00:00    3
         2011-01-02 04:00:00    4
         2011-01-02 05:00:00    1
         2011-01-02 12:00:00    4
However, I would like to have something like:

    cat  datetime    val
    a    2011-01-01    4
         2011-01-02    1
         2011-01-03    1
    b    2011-01-01   10
         2011-01-02    9
I could get the desired result by adding another column called date:

    df['date'] = df['datetime'].dt.date
and then running the same groupby on it:

    df.groupby(['cat','date']).sum()

But is there a more pythonic way to do this, without the helper column? I might also want to aggregate by month or by year. What would be the right way?
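
For illustration, here is a minimal sketch of one way this might be done without a helper column, using pd.Grouper with a frequency alias. The small hand-written frame below is an assumption for the sake of a self-contained example (it is not the seeded random data above), and the names daily and monthly are made up here.

```python
import pandas as pd

# Hand-picked stand-in data with the same columns as the question's frame.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-01 00:00", "2011-01-01 09:00", "2011-01-02 16:00",
        "2011-01-01 08:00", "2011-01-01 15:00", "2011-01-02 04:00",
    ]),
    "cat": ["a", "a", "a", "b", "b", "b"],
    "val": [1, 3, 1, 4, 3, 4],
})

# Group by category and by calendar day of the datetime column.
daily = df.groupby(["cat", pd.Grouper(key="datetime", freq="D")])["val"].sum()

# Same idea with month-start bins; only the frequency alias changes.
monthly = df.groupby(["cat", pd.Grouper(key="datetime", freq="MS")])["val"].sum()

print(daily)
print(monthly)
```

Changing only the freq argument switches the bin size, so a yearly alias should work the same way. Whether this counts as more pythonic than the extra date column is a matter of taste.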