Search for columns containing dates in Pandas

I am trying to identify the columns that contain dates as strings, and then convert them to a better type (datetime, or something numeric such as a UTC timestamp). The date format used is 27/11/2012 09:17, which I can match with the regular expression \d{2}/\d{2}/\d{4} \d{2}:\d{2}.

My current code is:

    import re

    def find_date_cols(df):
        date_cols = []
        date_pattern = re.compile(r'\d{2}/\d{2}/\d{4} \d{2}:\d{2}')
        for column in df:
            # test the first value in each column for a date-like string
            if date_pattern.search(str(df[column].iloc[0])):
                date_cols += [column]
        return date_cols

    date_cols = find_date_cols(cleaned_data)

I am sure this does not make good use of pandas' capabilities. Is there a better way to identify these columns, or to convert them directly to datetime or UTC timestamps?

+4
2 answers

If you want to convert whole columns, you can use convert_objects:

 df.convert_objects(convert_dates=True) 
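
Note: convert_objects was deprecated in pandas 0.17 and later removed. A minimal sketch of a rough replacement, assuming a pandas version where to_datetime still accepts errors='ignore' (itself deprecated in the 2.x line):

    import pandas as pd

    # try to parse each column as datetimes; errors='ignore' returns a
    # column unchanged when it cannot be parsed
    for col in df.columns:
        df[col] = pd.to_datetime(df[col], errors='ignore')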

To extract the dates contained in a column/Series you can use findall:

    In [11]: s = pd.Series(['1', '10/11/2011 11:11'])

    In [12]: s.str.findall('\d{2}/\d{2}/\d{4} \d{2}:\d{2}')
    Out[12]:
    0                     []
    1    [10/11/2011 11:11]
    dtype: object

    In [13]: s.str.findall('\d{2}/\d{2}/\d{4} \d{2}:\d{2}').apply(pd.Series)
    Out[13]:
                      0
    0               NaN
    1  10/11/2011 11:11

*... and then convert to Timestamps using convert_objects.*
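
A minimal sketch of that last step, assuming a pandas version old enough to still have convert_objects (convert_dates='coerce' turns unparseable values into NaT):

    extracted = s.str.findall('\d{2}/\d{2}/\d{4} \d{2}:\d{2}').apply(pd.Series)

    # 'coerce' forces the conversion: matching strings become Timestamps,
    # everything else becomes NaT
    extracted[0].convert_objects(convert_dates='coerce')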

+5

Depending on how aggressive you want to be, to_datetime will coerce everything it thinks is a datetime into a datetime, including ints → datetimes (by default interpreted as nanoseconds since the UNIX epoch).

to_datetime also gives you a lot of control over how it interprets the dates it finds:

 pandas.to_datetime(arg, errors='ignore', dayfirst=False, utc=None, box=True, format=None, coerce=False, unit='ns') 
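
For example, a minimal sketch for the question's 27/11/2012 09:17 format (the format string and dayfirst choices here are assumptions, not part of the original answer):

    import pandas as pd

    s = pd.Series(['27/11/2012 09:17'])

    # an explicit format removes the day/month ambiguity
    pd.to_datetime(s, format='%d/%m/%Y %H:%M')

    # alternatively, dayfirst=True parses 27/11/2012 as 27 November 2012
    pd.to_datetime(s, dayfirst=True)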
+3
