How can I keep groupby.first from removing timezone information from datetime columns?

Observe this strange behavior:

In [1]: import pandas as pd

In [2]: import datetime

In [3]: import pytz

In [4]: dates = [pytz.timezone('US/Pacific').localize(datetime.datetime(2015,1,i)) for i in range(1,5)]

In [5]: df = pd.DataFrame({'A': ['a','b']*2,'B': dates})

In [6]: df
Out[6]: 
   A                          B
0  a  2015-01-01 00:00:00-08:00
1  b  2015-01-02 00:00:00-08:00
2  a  2015-01-03 00:00:00-08:00
3  b  2015-01-04 00:00:00-08:00

In [7]: grouped = df.groupby('A') 

In [8]: grouped.nth(0) #B stays a datetime.datetime with timezone info
Out[8]: 
                           B
A                           
a  2015-01-01 00:00:00-08:00
b  2015-01-02 00:00:00-08:00

In [9]: grouped.head(1) #B stays a datetime.datetime with timezone info
Out[9]: 
                           B
0  2015-01-01 00:00:00-08:00
1  2015-01-02 00:00:00-08:00

In [10]: grouped.first() #B is a naive pd.Timestamp in UTC
Out[10]: 
                    B
A                    
a 2015-01-01 08:00:00
b 2015-01-02 08:00:00

I would like to know why this happens and whether there is a way to prevent it. Are there any related issues that I should know about?
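For reference, here is a minimal sketch of one possible workaround, not a confirmed fix for the underlying behavior. It assumes the values returned by first() are either naive UTC timestamps (as in Out[10]) or already timezone-aware, and it converts them back to the original zone. The helper name first_keep_tz is hypothetical and is not part of pandas.

```python
import pandas as pd

TZ = "US/Pacific"

# Same shape of data as in the question, built with a tz-aware date range.
dates = pd.date_range("2015-01-01", periods=4, freq="D", tz=TZ)
df = pd.DataFrame({"A": ["a", "b"] * 2, "B": dates})


def first_keep_tz(frame, key, col, tz):
    """Hypothetical helper: groupby(...).first() with the timezone restored.

    Assumption: if first() drops the timezone, the remaining naive values
    are expressed in UTC, as the output in the question suggests.
    """
    out = frame.groupby(key)[[col]].first()
    # utc=True interprets naive values as UTC and converts aware values to UTC,
    # so this step is safe whether or not the timezone was dropped.
    as_utc = pd.to_datetime(out[col], utc=True)
    out[col] = as_utc.dt.tz_convert(tz)
    return out


result = first_keep_tz(df, "A", "B", TZ)
print(result)
```

On a pandas version where first() already keeps the timezone, the round trip through UTC should leave the values unchanged, so the helper is intended to be harmless either way. As shown above, grouped.head(1) is another option when the original row index is acceptable.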


Source: https://habr.com/ru/post/1599565/

