Reading timestamp data in R from multiple time zones

I have a timestamp column in character format that looks like this:

2015-09-24 06:00:00 UTC

2015-09-24 05:00:00 UTC

dateTimeZone <- c("2015-09-24 06:00:00 UTC","2015-09-24 05:00:00 UTC") 

I would like to convert this character data to date-time data using as.POSIXct. If I knew that all the timestamps were in UTC, I would do it like this:

 dateTimeZone <- as.POSIXct(dateTimeZone, tz = "UTC")

However, I don't necessarily know that all the timestamps are in UTC, so I tried

 dateTimeZone <- as.POSIXct(dateTimeZone, format = "%Y-%m-%d %H:%M:%S %Z")

However, since strptime only supports %Z for output, this returns the following error:

Error in strptime(x, format, tz = tz) : use of %Z for input is not supported
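The asymmetry is easy to reproduce: format() will happily print the zone name with %Z, while parsing with the same conversion specification fails. A minimal demonstration (wrapped in try() so the script keeps running):

```r
# %Z works on output: format() can print the zone name...
out <- format(as.POSIXct("2015-09-24 06:00:00", tz = "UTC"),
              "%Y-%m-%d %H:%M:%S %Z")
out
# [1] "2015-09-24 06:00:00 UTC"

# ...but strptime() rejects it on input:
err <- try(as.POSIXct("2015-09-24 06:00:00 UTC",
                      format = "%Y-%m-%d %H:%M:%S %Z"),
           silent = TRUE)
inherits(err, "try-error")
# [1] TRUE
```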

I checked the documentation for the lubridate package, and I could not see that it handles this problem any better than as.POSIXct does.

Is my only option to check the time zone of each row and then use the appropriate time zone with something like the following?

 temp[grepl("UTC", dateTimeZone)] <- as.POSIXct(dateTimeZone[grepl("UTC", dateTimeZone)], tz = "UTC")
 temp[grepl("PDT", dateTimeZone)] <- as.POSIXct(dateTimeZone[grepl("PDT", dateTimeZone)], tz = "America/Los_Angeles")
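If only a handful of known abbreviations can occur, a small named lookup vector keeps that pattern from sprawling into one line per zone. A sketch under that assumption; the `abbrev_to_olson` mapping here is hypothetical and would need to be extended for your data:

```r
# Hypothetical abbreviation -> Olson name lookup; extend for your data.
abbrev_to_olson <- c(UTC = "UTC", PDT = "America/Los_Angeles")

dateTimeZone <- c("2015-09-24 06:00:00 UTC", "2015-09-24 05:00:00 PDT")

# The third whitespace-separated field is the zone abbreviation.
abbrev <- vapply(strsplit(dateTimeZone, " "), `[`, "", 3)

# Pre-allocate a POSIXct result on a single UTC axis, then fill per zone.
temp <- as.POSIXct(rep(NA_character_, length(dateTimeZone)), tz = "UTC")
for (z in names(abbrev_to_olson)) {
  idx <- abbrev == z
  # strptime ignores trailing text once the format is matched,
  # so the abbreviation at the end of the string is harmless here.
  temp[idx] <- as.POSIXct(dateTimeZone[idx], tz = abbrev_to_olson[[z]])
}
format(temp, tz = "UTC")
# [1] "2015-09-24 06:00:00" "2015-09-24 12:00:00"
```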
3 answers

You can get there by checking each row and processing it accordingly, and then converting everything back to a consistent UTC time. (Now amended to map the time zone abbreviations to full time zone specifications.)

 dates <- c("2015-09-24 06:00:00 UTC", "2015-09-24 05:00:00 PDT")

 # extract timezone from dates
 datestz <- vapply(strsplit(dates, " "), tail, 1, FUN.VALUE = "")

 ## Make a master list of abbreviation to
 ## full timezone names. Used an arbitrary summer
 ## and winter date to try to catch daylight savings timezones.
 tzabbrev <- vapply(
   OlsonNames(),
   function(x) c(
     format(as.POSIXct("2000-01-01", tz = x), "%Z"),
     format(as.POSIXct("2000-07-01", tz = x), "%Z")
   ),
   FUN.VALUE = character(2)
 )
 tmp   <- data.frame(Olson = OlsonNames(), t(tzabbrev), stringsAsFactors = FALSE)
 final <- unique(data.frame(tmp[1], abbrev = unlist(tmp[-1])))

 ## Do the matching:
 out <- Map(as.POSIXct, dates, tz = final$Olson[match(datestz, final$abbrev)])
 as.POSIXct(unlist(out), origin = "1970-01-01", tz = "UTC")
 #  2015-09-24 06:00:00 UTC   2015-09-24 05:00:00 PDT
 # "2015-09-24 06:00:00 GMT" "2015-09-24 12:00:00 GMT"

A data.table solution:

 library(data.table)

 data <- data.table(dateTimeZone = c("2015-09-24 06:00:00 UTC",
                                     "2015-09-24 05:00:00 America/Los_Angeles"))
 data[, timezone := tstrsplit(dateTimeZone, split = " ")[[3]]]
 data[, datetime.local := as.POSIXct(dateTimeZone, tz = timezone), by = timezone]
 data[, datetime.utc := format(datetime.local, tz = "UTC")]

The main thing is to split the data on the time zone field so that you can pass each set of time zones to as.POSIXct separately (I'm not quite sure why as.POSIXct won't let you give it a vector of time zones). Here I use data.table's efficient split-apply-combine syntax, but you can apply the same general idea in base R or with dplyr.
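Since the same general idea carries over to base R, here is a minimal sketch of it, under the same assumption as the data.table example (the third whitespace-separated field is a full Olson time zone name):

```r
x <- c("2015-09-24 06:00:00 UTC", "2015-09-24 05:00:00 America/Los_Angeles")

# Split on the time zone field...
tzfield <- vapply(strsplit(x, " "), `[`, "", 3)

# ...parse each group with its own tz, then fill a common UTC-based
# POSIXct vector: the base-R analogue of data.table's by = timezone.
parsed <- as.POSIXct(rep(NA_character_, length(x)), tz = "UTC")
for (z in unique(tzfield)) {
  idx <- which(tzfield == z)
  parsed[idx] <- as.POSIXct(x[idx], tz = z)
}
format(parsed, tz = "UTC")
# [1] "2015-09-24 06:00:00" "2015-09-24 12:00:00"
```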


Another way, using lubridate...

 library(stringr)
 library(lubridate)

 normalize.timezone <- function(dates, target_tz = Sys.timezone()) {
   # pull the time zone token (third field) off each string
   tzones <- str_split(dates, ' ')
   tzones <- lapply(tzones, '[', 3)
   tzones <- unlist(tzones)
   # strip the trailing time zone token from the date-time text
   dts <- str_replace_all(dates, ' [\\w\\-\\/\\+]+$', '')
   # parse each element in its own zone, then shift to the target zone
   tmp <- lapply(1:length(dates), function(i) {
     with_tz(as.POSIXct(dts[i], tz = tzones[i]), target_tz)
   })
   final <- unlist(tmp)
   attributes(final) <- attributes(tmp[[1]])
   final
 }

 dates <- c('2019-01-06 23:00:00 MST',
            '2019-01-22 14:00:00 America/Los_Angeles',
            '2019-01-05 UTC-4',
            '2019-01-15 15:00:00 Europe/Moscow')
 normalize.timezone(dates, 'EST')

Source: https://habr.com/ru/post/1232661/

