Formatting date and time variables gives missing time values as 00:00:00 (using Python)

I am currently using Python and trying to split a datetime column into two columns, one for the date and one for the time, as well as keeping a properly formatted datetime column.

ORIGINAL DATA SET

    INCIDENT_DATE
    12/31/2006 11:20:00 PM
    12/31/2006 11:30:00 PM
    01/01/2007 00:25
    01/01/2007 00:10
    12/31/2006 11:30:00 AM
    01/01/2007 00:05
    01/01/2007 00:01
    12/31/2006 4:45:00 PM
    12/31/2006 11:50:00 PM
    **01/01/2007**

I used two pieces of code: one to format the column and the other to split it. However, after formatting the column, the missing time values came out as 00:00:00, that is, as 12 midnight. See below.

AFTER FORMATTING

    2006-12-31 23:20:00
    2006-12-31 23:30:00
    2007-01-01 00:25:00
    2007-01-01 00:10:00
    2006-12-31 11:30:00
    2007-01-01 00:05:00
    2007-01-01 00:01:00
    2006-12-31 16:45:00
    2006-12-31 23:50:00
    **2007-01-01 00:00:00**

Code used:

    ## Format datetime column
    crimeall['INCIDENT_DATE'] = pd.DatetimeIndex(crimeall['INCIDENT_DATE'])
    ## Split DateTime column
    crimeall['TIME'], crimeall['DATE'] = crimeall['INCIDENT_DATE'].apply(lambda x: x.time()), crimeall['INCIDENT_DATE'].apply(lambda x: x.date())

Is it possible to do this without the missing time being set to 00:00:00? Is it possible to have these missing values written as NaN when formatting the datetime?

Any thoughts on how I can get the formatted date and time to show missing time values as NaN?

WHAT I WOULD LIKE TO SEE

    2006-12-31 23:20:00
    2006-12-31 23:30:00
    2007-01-01 00:25:00
    2007-01-01 00:10:00
    2006-12-31 11:30:00
    2007-01-01 00:05:00
    2007-01-01 00:01:00
    2006-12-31 16:45:00
    2006-12-31 23:50:00
    **2007-01-01 NaN**

Hope there is a way to do this.

2 answers

I do not believe there is any way to have a datetime value that is part real and part NaN. Note that a datetime is essentially a format on top of an integer, and an integer cannot be half valid and half invalid (a bit more on that below).

Anyway, I would just make a new column for the time, which can include NaN. Start from the following, where "raw_dt" is your raw data and "formatted_dt" is the parsed datetime:

                       raw_dt         formatted_dt
    0  12/31/2006 11:20:00 PM  2006-12-31 23:20:00
    1  12/31/2006 11:30:00 PM  2006-12-31 23:30:00
    ...
    7   12/31/2006 4:45:00 PM  2006-12-31 16:45:00
    8  12/31/2006 11:50:00 PM  2006-12-31 23:50:00
    9              01/01/2007  2007-01-01 00:00:00

I would create a mask, something like this:

 df['valid_time'] = df.raw_dt.str.contains(':') 

which should work well here, and you can use regex if you need something more complex. Then create a new time column.

    # .ix has been removed from pandas; .loc does the same label-based selection here
    df['time'] = df.loc[df['valid_time'], 'formatted_dt'].dt.time

                       raw_dt         formatted_dt  valid_time      time
    0  12/31/2006 11:20:00 PM  2006-12-31 23:20:00        True  23:20:00
    1  12/31/2006 11:30:00 PM  2006-12-31 23:30:00        True  23:30:00
    ...
    7   12/31/2006 4:45:00 PM  2006-12-31 16:45:00        True  16:45:00
    8  12/31/2006 11:50:00 PM  2006-12-31 23:50:00        True  23:50:00
    9              01/01/2007  2007-01-01 00:00:00       False       NaN
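Putting the mask and the new time column together, a minimal end-to-end sketch might look like the following. The sample rows are a hypothetical subset of the question's data, and parsing each string on its own is an assumption made here so that the mixed layouts (AM/PM, 24-hour, date only) do not have to share one format.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data; the column names
# follow the ones used in this answer.
df = pd.DataFrame({
    "raw_dt": [
        "12/31/2006 11:20:00 PM",
        "12/31/2006 4:45:00 PM",
        "01/01/2007 00:25",
        "01/01/2007",
    ]
})

# Parse each string individually, then build one datetime column.
parsed = [pd.to_datetime(s) for s in df["raw_dt"]]
df["formatted_dt"] = pd.to_datetime(parsed)

# True where the raw string actually contained a time component.
df["valid_time"] = df["raw_dt"].str.contains(":")

# Only rows with a real time get a value; the other rows are left as NaN
# because the assignment aligns on the index.
df["time"] = df.loc[df["valid_time"], "formatted_dt"].dt.time

print(df)
```

The datetime column itself still holds midnight for the date-only row; only the separate time column records that the time was missing.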

From there, you can format as you like, for example:

    df.formatted_dt.dt.date.map(str) + df.time.map(str).str.rjust(9)

    0    2006-12-31 23:20:00
    1    2006-12-31 23:30:00
    ...
    7    2006-12-31 16:45:00
    8    2006-12-31 23:50:00
    9    2007-01-01      nan
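If the exact text "NaN" from the question is wanted in the display string, one possible variation is to format the two parts with strftime and substitute a placeholder where the time was missing. This is only a sketch: the two sample values and the `has_time` mask are hypothetical stand-ins for the formatted column and the valid-time mask.

```python
import pandas as pd

# Hypothetical stand-ins for the parsed datetimes and the valid-time mask.
dt = pd.Series(pd.to_datetime(["2006-12-31 23:20:00", "2007-01-01 00:00:00"]))
has_time = pd.Series([True, False])

date_part = dt.dt.strftime("%Y-%m-%d")
# Keep the formatted time where it is real, otherwise show the text "NaN".
time_part = dt.dt.strftime("%H:%M:%S").where(has_time, "NaN")

combined = date_part + " " + time_part
print(combined.tolist())
```

The result is a column of plain strings, useful for display only; keep the real datetime column for any calculations.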

To say briefly what a datetime is: you can get a glimpse of what it really is underneath (nanoseconds since January 1, 1970) like this:

    df.formatted_dt.astype(np.int64)

    0    1167607200000000000
    1    1167607800000000000
    ...
    7    1167583500000000000
    8    1167609000000000000
    9    1167609600000000000
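Because each value is a single integer underneath, the only "missing" state a datetime column can hold is a wholly missing value, NaT. A small sketch with hypothetical values illustrates this: a date without a time is stored as midnight, while NaT discards the date as well.

```python
import pandas as pd

s = pd.Series([
    pd.Timestamp("2007-01-01 00:25"),
    pd.Timestamp("2007-01-01"),  # no time given, so it is stored as midnight
    pd.NaT,                      # the whole value is missing, date included
])

print(s)
print(s.isna().tolist())
```

There is no state in between, which is why the missing time has to be tracked in a separate column.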

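Add ambiguous='NaT' to pd.DatetimeIndex. Note that this parameter controls how ambiguous daylight-saving wall-clock times are handled, so it may not affect values that simply lack a time. If that doesn't work, you can always correct the values using something like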

    crimeall['TIME'] = [np.nan if t.isoformat() == '00:00:00' else t for t in crimeall['TIME']]
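One caveat with comparing against 00:00:00 is that it also blanks incidents that genuinely happened at midnight. A possible way around this, sketched below with hypothetical sample rows, is to record which raw strings contain a time before the column is overwritten and to use that mask instead of the midnight check.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a late-evening time, a real midnight, and a date only.
crimeall = pd.DataFrame({
    "INCIDENT_DATE": ["12/31/2006 11:50:00 PM", "01/01/2007 00:00", "01/01/2007"]
})

# Remember which raw strings carry a time before overwriting the column.
has_time = crimeall["INCIDENT_DATE"].str.contains(":")

# Parse each string individually, since the layouts are mixed.
parsed = [pd.to_datetime(s) for s in crimeall["INCIDENT_DATE"]]
crimeall["INCIDENT_DATE"] = pd.to_datetime(parsed)

crimeall["DATE"] = crimeall["INCIDENT_DATE"].dt.date
# Keep the time only where the raw string had one; otherwise store NaN.
crimeall["TIME"] = crimeall["INCIDENT_DATE"].dt.time.where(has_time, np.nan)

print(crimeall)
```

Here the real midnight row keeps 00:00:00 in TIME, and only the date-only row becomes NaN.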

Source: https://habr.com/ru/post/987054/

