Observe this strange behavior:
In [1]: import pandas as pd
In [2]: import datetime
In [3]: import pytz
In [4]: dates = [datetime.datetime(2015,1,i,tzinfo=pytz.timezone('US/Pacific')) for i in range(1,5)]
In [5]: df = pd.DataFrame({'A': ['a','b']*2,'B': dates})
In [6]: df
Out[6]:
A B
0 a 2015-01-01 00:00:00-08:00
1 b 2015-01-02 00:00:00-08:00
2 a 2015-01-03 00:00:00-08:00
3 b 2015-01-04 00:00:00-08:00
In [7]: grouped = df.groupby('A')
In [8]: grouped.nth(0) #B stays a datetime.datetime with timezone info
Out[8]:
B
A
a 2015-01-01 00:00:00-08:00
b 2015-01-02 00:00:00-08:00
In [9]: grouped.head(1) #B stays a datetime.datetime with timezone
Out[9]:
B
0 2015-01-01 00:00:00-08:00
1 2015-01-02 00:00:00-08:00
In [10]: grouped.first() # B becomes a naive pd.Timestamp holding the UTC time
Out[10]:
B
A
a 2015-01-01 08:00:00
b 2015-01-02 08:00:00
I would like to know why this happens and whether there is a way to prevent it. Are there any related issues that I should know about?
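For anyone who wants to check this on their own installation, here is a minimal sketch that compares the two code paths directly. It assumes that building the column with pd.date_range is an acceptable stand-in for the list of datetime.datetime objects above, and is_tz_aware is just a hypothetical helper name for this example. The result of the first() check may well differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

# Same shape of frame as in the session above. pd.date_range is used only to
# keep the sketch short (an assumption, not the original construction).
dates = pd.date_range("2015-01-01", periods=4, tz="US/Pacific")
df = pd.DataFrame({"A": ["a", "b"] * 2, "B": dates})
grouped = df.groupby("A")


def is_tz_aware(value):
    """Hypothetical helper: True if a datetime-like value carries tzinfo."""
    return getattr(value, "tzinfo", None) is not None


# Take the first value of B from each code path and inspect its timezone.
head_aware = is_tz_aware(grouped.head(1)["B"].iloc[0])
first_aware = is_tz_aware(grouped.first()["B"].iloc[0])

print("head(1) keeps tz:", head_aware)
print("first() keeps tz:", first_aware)
```

If the two lines print different values, the installation shows the discrepancy described above.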