How to read timezone-aware timestamps as a timezone-naive local DatetimeIndex with read_csv in pandas?

When I use pandas read_csv to read a timezone-aware datetime column (and specify that column as the index), pandas converts it to a timezone-naive DatetimeIndex in UTC.

Data in Test.csv:

DateTime,Temperature
2016-07-01T11:05:07+02:00,21.125
2016-07-01T11:05:09+02:00,21.138
2016-07-01T11:05:10+02:00,21.156
2016-07-01T11:05:11+02:00,21.179
2016-07-01T11:05:12+02:00,21.198
2016-07-01T11:05:13+02:00,21.206
2016-07-01T11:05:14+02:00,21.225
2016-07-01T11:05:15+02:00,21.233

Code to read from csv:

In [1]: import pandas as pd

In [2]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True)

This results in an index that holds the naive UTC time rather than the local time:

In [3]: df.index

Out[3]: DatetimeIndex(['2016-07-01 09:05:07', '2016-07-01 09:05:09',
           '2016-07-01 09:05:10', '2016-07-01 09:05:11',
           '2016-07-01 09:05:12', '2016-07-01 09:05:13',
           '2016-07-01 09:05:14', '2016-07-01 09:05:15'],
          dtype='datetime64[ns]', name='DateTime', freq=None)

I tried using the date_parser function:

In [4]: date_parser = lambda x: pd.to_datetime(x).tz_localize(None)

In [5]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

This gave the same result.

How can I get read_csv to create a DatetimeIndex that is timezone naive and represents local time instead of UTC time?

I am using pandas 0.18.1.


Alex's answer produces a timezone-aware DatetimeIndex. To get a timezone-naive DatetimeIndex holding the local time, as the OP asks, pass dateutil.parser.parse with ignoretz=True as the date_parser:

import dateutil.parser

# Parse each timestamp but discard its UTC offset, keeping the local wall time.
date_parser = lambda x: dateutil.parser.parse(x, ignoretz=True)
df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

print(df)

                     Temperature
DateTime                        
2016-07-01 11:05:07       21.125
2016-07-01 11:05:09       21.138
2016-07-01 11:05:10       21.156
2016-07-01 11:05:11       21.179
2016-07-01 11:05:12       21.198
2016-07-01 11:05:13       21.206
2016-07-01 11:05:14       21.225
2016-07-01 11:05:15       21.233
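
An alternative that avoids a custom parser is to fix the index after reading: localize the naive UTC index that read_csv produces, convert it to the local zone, and drop the timezone again. A minimal sketch, assuming the +02:00 offsets correspond to Europe/Berlin (the actual zone is not stated in the question):

import pandas as pd

df = pd.read_csv('Test.csv', index_col=0, parse_dates=True)

# read_csv yields a tz-naive index expressed in UTC, so localize it to UTC,
# convert to the (assumed) local zone, then strip the timezone information.
df.index = df.index.tz_localize('UTC').tz_convert('Europe/Berlin').tz_localize(None)

print(df.index[0])  # 2016-07-01 11:05:07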

According to the docs, date_parser defaults to dateutil.parser.parser, yet the result above differs from what that parser actually returns. If you pass dateutil.parser.parse explicitly through the date_parser kwarg, the offsets are kept and you get a timezone-aware DatetimeIndex:

import dateutil.parser

df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=dateutil.parser.parse)

print(df)

                           Temperature
DateTime                              
2016-07-01 11:05:07+02:00       21.125
2016-07-01 11:05:09+02:00       21.138
2016-07-01 11:05:10+02:00       21.156
2016-07-01 11:05:11+02:00       21.179
2016-07-01 11:05:12+02:00       21.198
2016-07-01 11:05:13+02:00       21.206
2016-07-01 11:05:14+02:00       21.225
2016-07-01 11:05:15+02:00       21.233
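
If you want the timezone-naive local time the question asks for, the offset can then be stripped from this aware index. A minimal sketch, assuming the uniform +02:00 offset lets pandas build a tz-aware DatetimeIndex as shown above:

# Remove the timezone while preserving the local wall-clock time.
df.index = df.index.tz_localize(None)

print(df.index[0])  # 2016-07-01 11:05:07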

I initially used the dateutil method, but have since switched to a faster alternative:

date_parser = lambda ts: pd.to_datetime([s[:-5] for s in ts])

Edit: s[:-5] is correct (the screenshot has an error).

In the screenshot below, I am importing ~55 MB of tab-separated files. The dateutil method works, but takes several orders of magnitude longer.

[Screenshot: timing comparison of the two parsing approaches]

For this, pandas 0.18.1 and dateutil 2.5.3 were used.
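
The gap can be reproduced with a rough micro-benchmark; this is a hypothetical sketch on synthetic strings, not the test from the screenshot:

import timeit
import dateutil.parser
import pandas as pd

stamps = ['2016-07-01T11:05:07+02:00'] * 100000  # synthetic sample

# Per-element dateutil parsing versus slicing off the offset and letting
# pandas parse the whole list in one vectorised call.
t_dateutil = timeit.timeit(
    lambda: [dateutil.parser.parse(s, ignoretz=True) for s in stamps], number=1)
t_slice = timeit.timeit(
    lambda: pd.to_datetime([s[:-6] for s in stamps]), number=1)

print(t_dateutil, t_slice)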


UPDATE: This lambda function will work even when the Z-0000 suffix is not present ...

date_parser = lambda ts: pd.to_datetime([s[:-5] if 'Z' in s else s for s in ts])
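
A minimal sketch of wiring this into read_csv; note that the question's Test.csv carries a six-character offset (+02:00), so the slice there would be s[:-6] rather than s[:-5]:

import pandas as pd

# Strip the fixed-width offset, then parse the whole column in a single
# vectorised pd.to_datetime call ('+02:00' is six characters, hence s[:-6]).
date_parser = lambda ts: pd.to_datetime([s[:-6] for s in ts])
df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

print(df.index[0])  # 2016-07-01 11:05:07 -> naive local time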

Source: https://habr.com/ru/post/1648814/

