Convert JSON timestamp string to Python date in pandas DataFrame

I have a pandas DataFrame that I read from JSON. One date column is in a weird timestamp format that looks like this:

"/Date(1405961743000+0100)/"

How do I convert the entire column to Python dates?

I was able to manually convert this date to a Python date by calling the datetime fromtimestamp function on the first 10 digits, i.e. datetime.datetime.fromtimestamp(1405961743), but I'm struggling to convert the entire column.

I assume that I need to select the corresponding digits from each record, convert them to an integer, and then use the fromtimestamp function, but I'm new to Python (and pandas), so I'm having trouble doing this.

Any help would be appreciated.

thanks

2 answers

Obviously, it would be better if you knew where the JSON came from, so that you could look in the docs, ask the author, etc. to find out what the actual intent behind this date format is. (It may even have been generated by Python code using a library that you could simply use yourself ...)

But looking at the numbers, I can make a pretty good guess at what this means: 1405961743000 is milliseconds since the Unix epoch (which explains why you can use the first 10 digits as seconds since the epoch, at least within a fairly wide range around 2014), and +0100 is the time zone offset from GMT, in +HHMM format.

So, instead of extracting the first 10 digits, converting to int and calling fromtimestamp, you should extract everything up to the + or -, convert that to int, divide by 1000 and call fromtimestamp. That said, the only example you gave us has 0 milliseconds, which suggests there is a good chance they all will, in which case this difference will not matter ...
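As a minimal sketch of that step, assuming the digits really are milliseconds since the epoch: the helper name ticks_to_utc is made up for illustration, and passing tz=timezone.utc is my own choice so that the result does not depend on the local zone of the machine running it.

```python
import re
from datetime import datetime, timezone

def ticks_to_utc(value):
    """Parse the digits before the sign as milliseconds since the Unix epoch."""
    match = re.match(r"/Date\((\d+)", value)
    if match is None:
        raise ValueError(f"unrecognized date string: {value!r}")
    # Divide by 1000 because fromtimestamp expects seconds, not milliseconds.
    return datetime.fromtimestamp(int(match.group(1)) / 1000, tz=timezone.utc)

print(ticks_to_utc("/Date(1405961743000+0100)/"))  # 2014-07-21 16:55:43+00:00
print(ticks_to_utc("/Date(1405961743250+0100)/"))  # 2014-07-21 16:55:43.250000+00:00
```

The second call uses an invented value with non-zero milliseconds to show what taking only the first 10 digits would throw away.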

In any case, it is up to you what to do with the time zone offset. Do you want to store aware local datetimes? GMT datetimes? Naive local datetimes? It is very easy to get any of them from the timestamp and the offset (although "aware" will mean using a fake time zone, such as GMT-05:00, which of course does not carry any historical or DST information), but you have to decide which of them you need.
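To make the three options concrete, here is a small sketch using the values from the example string. It assumes the offset is read as +HHMM, and the variable names are only illustrative.

```python
from datetime import datetime, timedelta, timezone

millis, offset = 1405961743000, "+0100"  # values taken from the example string

# Build a fixed-offset zone from the +HHMM text (assumed reading of the offset).
sign = -1 if offset[0] == "-" else 1
fixed = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5])))

utc_aware = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
local_aware = utc_aware.astimezone(fixed)       # same instant, local wall clock
local_naive = local_aware.replace(tzinfo=None)  # wall clock only, offset dropped

print(utc_aware)    # 2014-07-21 16:55:43+00:00
print(local_aware)  # 2014-07-21 17:55:43+01:00
print(local_naive)  # 2014-07-21 17:55:43
```

The first two objects compare equal because they describe the same instant. The naive one can no longer be compared with either of them.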


Whatever you do, you might consider extending your JSON decoder to do it automatically, as shown in the examples in the docs. (Any string that matches the regular expression r'/Date\((\d+)([+-]\d{4})\)/' is a date: the first group is the timestamp, and the second is the offset.)
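One way to approximate this with the hook that the json module does provide is object_hook, which is called for every decoded JSON object. The sketch below is not the approach from the docs examples; parse_date, date_hook and the sample keys are hypothetical names, the offset is assumed to be +HHMM, and only string values sitting directly inside an object are converted (strings inside arrays are left alone).

```python
import json
import re
from datetime import datetime, timedelta, timezone

DATE_RE = re.compile(r"/Date\((\d+)([+-]\d{4})?\)/$")

def parse_date(text):
    """Return an aware datetime for a /Date(...)/ string, else None."""
    match = DATE_RE.match(text)
    if match is None:
        return None
    ticks, offset = match.groups()
    dt = datetime.fromtimestamp(int(ticks) / 1000, tz=timezone.utc)
    if offset:
        sign = -1 if offset[0] == "-" else 1
        delta = sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
        dt = dt.astimezone(timezone(delta))
    return dt

def date_hook(obj):
    # Called once per decoded JSON object (dict); replaces matching string values.
    for key, value in obj.items():
        if isinstance(value, str):
            parsed = parse_date(value)
            if parsed is not None:
                obj[key] = parsed
    return obj

doc = '{"id": 7, "created": "/Date(1405961743000+0100)/"}'
record = json.loads(doc, object_hook=date_hook)
print(record["created"])  # 2014-07-21 17:55:43+01:00
```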

But maybe not. Especially since parse_string does not seem to be overridable, at least as of 3.4, so it looks like you would have to monkeypatch it. See this code I slapped together as a proof of concept; you could make it a little nicer, but there is a limit to how clean you can make it if they don't provide a hook for it ...


PS, if you ever generate JSON like this yourself, you may want to consider a more standardized and self-documenting way to do it. The dict format shown in the json module docs, where you actually specify the constructor to call and the arguments to pass it, is a lot easier for people to figure out (and to add a hook for). Alternatively, there is a quasi-standard way of encoding YAML documents as JSON documents, and YAML is extensible (and already has a standard timestamp extension).


The time string looks like the OData version 2 JSON verbose format for DateTime:

"/Date(<ticks>["+" | "-" <offset>])/"
<ticks> = number of milliseconds since midnight January 1, 1970
<offset> = number of minutes to add or subtract

As @Matt Johnson mentions, this format can be seen in ASP.NET or WCF applications.

    #!/usr/bin/env python3
    import re
    from datetime import datetime, timedelta, timezone

    time_string = "/Date(1405961743000+0100)/"
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    ticks, offset = re.match(r'/Date\((\d+)([+-]\d{4})?\)/$', time_string).groups()
    utc_dt = epoch + timedelta(milliseconds=int(ticks))
    print(utc_dt, utc_dt.strftime('%Z'))
    if offset:
        offset = int(offset)
        # http://www.odata.org/documentation/odata-version-2-0/json-format
        # says offset is minutes (an error?)
        dt = utc_dt.astimezone(timezone(timedelta(minutes=offset)))
        print(dt, dt.strftime('%Z'))
        # but it looks like it could be HHMM
        hours, minutes = divmod(abs(offset), 100)
        if offset < 0:
            hours, minutes = -hours, -minutes
        dt = utc_dt.astimezone(timezone(timedelta(hours=hours, minutes=minutes)))
        print(dt, dt.strftime('%Z'))

Output

    2014-07-21 16:55:43+00:00 UTC+00:00
    2014-07-21 18:35:43+01:40 UTC+01:40
    2014-07-21 17:55:43+01:00 UTC+01:00

It seems that the odata.org documentation should be ignored here, and the offset should be treated as being in HHMM format.
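For the whole-column case from the question, a rough pandas sketch might look like this. The column name "created" and the sample rows are assumptions, every row is assumed to carry an offset, and the offset is read as HHMM. Because the offsets can differ from row to row, the local times are stored as naive datetimes next to a UTC column.

```python
import pandas as pd

# Hypothetical frame; the column name "created" is an assumption.
df = pd.DataFrame({"created": ["/Date(1405961743000+0100)/",
                               "/Date(1405961743000-0500)/"]})

# Groups: 0 = milliseconds, 1 = sign, 2 = offset hours, 3 = offset minutes.
parts = df["created"].str.extract(r"/Date\((\d+)([+-])(\d{2})(\d{2})\)/")
utc = pd.to_datetime(parts[0].astype("int64"), unit="ms", utc=True)

# Offset read as HHMM and converted to signed minutes.
sign = parts[1].map({"+": 1, "-": -1})
minutes = sign * (parts[2].astype(int) * 60 + parts[3].astype(int))

df["utc"] = utc
df["local_naive"] = utc.dt.tz_localize(None) + pd.to_timedelta(minutes, unit="min")
print(df[["utc", "local_naive"]])
```

If only the UTC instant matters, the two offset lines can be dropped and the "utc" column used on its own.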


Source: https://habr.com/ru/post/982391/

