Millisecond rounding issue in R

I have the following problem with millisecond rounding in R. How do I get around it so that the displayed time is correct?

    > options(digits.secs=3)
    > as.POSIXlt("13:29:56.061", format='%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.060 UTC"
    > as.POSIXlt("13:29:56.062", format='%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.061 UTC"
    > as.POSIXlt("13:29:56.063", format='%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.063 UTC"

I noticed that this question contains background information but does not solve my problem: Milliseconds puzzle when calling strptime in R.

This question also touches on the problem but does not solve it: R xts: .001 milliseconds in the index.

In these cases, I see the following:

    > x <- as.POSIXlt("13:29:56.061", format='%H:%M:%OS', tz='UTC')
    > print(as.numeric(x), digits=20)
    [1] 1339075796.0610001087

The linked question also suggests that this is just a display problem, but I noticed that using format specifiers like "%OS3" without setting options(digits.secs) does not seem to pick up the correct number of digits.

I am using 32-bit R 2.15.0 on Windows, but the problem seems to occur in other R setups as well.

Please note that my source data consists of date-time strings in a CSV file, so I have to find a way to convert those strings to times with the correct milliseconds.

4 answers

I cannot reproduce this:

    > options(digits.secs = 4)
    > as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.061 UTC"
    > as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.062 UTC"
    > as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.063 UTC"
    > options(digits.secs = 3)
    > as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.061 UTC"
    > as.POSIXlt("13:29:56.062", format = '%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.062 UTC"
    > as.POSIXlt("13:29:56.063", format = '%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.063 UTC"

with

    > sessionInfo()
    R version 2.15.0 Patched (2012-04-14 r59019)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.utf8        LC_NUMERIC=C
     [3] LC_TIME=en_GB.utf8         LC_COLLATE=en_GB.utf8
     [5] LC_MONETARY=en_GB.utf8     LC_MESSAGES=en_GB.utf8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.utf8  LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods
    [7] base

With "%OSn" format strings, the output is truncated. If the fractional second cannot be represented exactly as a floating-point number, the truncation may well go the wrong way. If you see that happening, you can either round explicitly to the unit you need or add half of the unit you want to work at (0.0005 in the case shown):

    > t1 <- as.POSIXlt("13:29:56.061", format = '%H:%M:%OS', tz='UTC')
    > t1
    [1] "2012-06-07 13:29:56.061 UTC"
    > t1 + 0.0005
    [1] "2012-06-07 13:29:56.061 UTC"

(But as I said, I do not see the problem here.)

This last point was made by Simon Urbanek on the R-Devel mailing list on May 30, 2012.
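For strings read from a CSV file, the half-unit trick can be applied to a whole column at once. The following is only a sketch under assumptions: the `stamps` vector is made-up stand-in data, and the date part is included so that the result does not depend on the day the code is run.

```r
# Hypothetical input, standing in for a column of date-time strings from a CSV file.
stamps <- sprintf("2012-06-07 13:29:56.%03d", 0:999)

# Parse to POSIXct; %OS accepts fractional seconds on input.
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# %OS3 truncates on output, so shift by half a millisecond first. A stored
# value that sits a hair below the intended millisecond then still
# truncates to the intended digits.
shown <- format(parsed + 0.0005, "%Y-%m-%d %H:%M:%OS3")

# Compare the formatted strings with the original input.
all(shown == stamps)
```

The shift is applied only for display here; `parsed` itself is left untouched, so comparisons and arithmetic still use the values as parsed.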


This is the same issue as in Milliseconds puzzle when calling strptime in R.

Your example:

    > x <- as.POSIXlt("13:29:56.061", format='%H:%M:%OS', tz='UTC')
    > print(as.numeric(x), digits=20)
    [1] 1339075796.0610001087

does not show the actual problem. as.numeric(x) converts your POSIXlt object to POSIXct before converting it to numeric, so you get a different floating-point rounding error.

That is not how print.POSIXlt (which calls format.POSIXlt) works. format.POSIXlt formats each element of the POSIXlt list individually, so you need to look at:

    print(x$sec, digits=20)
    [1] 56.060999999999999943

That number is truncated at the third decimal place, so you see 56.060. You can confirm this by calling format directly:

    > format(x, "%H:%M:%OS6")
    [1] "13:29:56.060999"
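To see the difference between truncating and rounding on your own machine, the stored seconds field can be formatted both ways. This is a small sketch, not part of the original answer; the variable names are illustrative, and the truncated result is deliberately not stated because it depends on the value your platform stores.

```r
x <- as.POSIXlt("2012-06-07 13:29:56.061",
                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# The seconds field as stored, with more digits than the default print shows.
stored <- sprintf("%.20f", x$sec)

# %OS3 truncates, so this ends in ".060" or ".061" depending on whether the
# stored value lies just below or just above 56.061.
truncated <- format(x, "%H:%M:%OS3")

# sprintf rounds to the requested precision instead of truncating.
rounded <- sprintf("%02d:%02d:%06.3f", x$hour, x$min, x$sec)

print(c(stored = stored, truncated = truncated, rounded = rounded))
```

Note that rounding the seconds field on its own can produce "60.000" for values such as 59.9996, so this is meant for inspection rather than as a general formatter.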

There are milliseconds here:

    unclass(as.POSIXlt("13:29:56.061", '%H:%M:%OS', tz='UTC'))
    $sec
    [1] 56.061
    ...

(There is no need to name the format argument; it is matched by position here.)

Otherwise, I cannot reproduce this (on Windows, 64-bit R 2.15.0):

    options(digits.secs = 3)
    as.POSIXlt("13:29:56.061", '%H:%M:%OS', tz='UTC')
    [1] "2012-06-07 13:29:56.061 UTC"
    sessionInfo()
    R version 2.15.0 Patched (2012-05-05 r59321)
    Platform: x86_64-pc-mingw32/x64 (64-bit)
    ...

In testing, I found that this problem still exists in 32-bit R 3.0.1 and that it is caused by truncation of floating-point data that is specific to the 32-bit implementation of the print, format and as.character functions for POSIXlt.

The underlying data is not stored in a different type on the two platforms. Rather, it is the print, format and as.character functions for POSIXlt, which render POSIXlt data as a string, that truncate in one case (32-bit) and not in the other (64-bit).

While the documented behavior is that these functions truncate (ignore) extra digits (as @Gavin Simpson mentioned), this does not hold equally for the 32-bit and 64-bit versions. To demonstrate, we generate 1000 different times and perform some comparisons:

    > options(digits.secs=3)
    > x = as.POSIXlt("13:29:56.061", format='%H:%M:%OS', tz='UTC')
    > for (i in 0:999) {
    +   x[i+1] = as.POSIXlt(paste0("13:29:56.",sprintf("%03d",i)),format='%H:%M:%OS',tz='UTC')
    + }
    > sum(x[2:1000]>x[1:999])
    [1] 999

The comparison operators behave consistently on both 32-bit and 64-bit. However, on 32-bit I see:

    > x[1:6]
    [1] "2015-10-16 13:29:56.000 UTC" "2015-10-16 13:29:56.000 UTC"
    [3] "2015-10-16 13:29:56.002 UTC" "2015-10-16 13:29:56.003 UTC"
    [5] "2015-10-16 13:29:56.003 UTC" "2015-10-16 13:29:56.005 UTC"

So this is clearly a display issue. Looking at the actual numbers in the POSIXlt object, specifically the seconds, we can see what happens:

    > y = (x[1:6]$sec)
    > y
    [1] 56.000 56.001 56.002 56.003 56.004 56.005
    > trunc(y*1000)/1000
    [1] 56.000 56.001 56.002 56.003 56.004 56.005
    > trunc((y-floor(y))*1000)/1000
    [1] 0.000 0.000 0.002 0.003 0.003 0.005

I would suggest that this is a bug that needs to be fixed in the base R library, but as a temporary fix you can override the functions "print", "as.character" and "format" to produce the output you want, e.g.

    # Build "YYYY-MM-DD HH:MM:SS.mmm" from the POSIXlt fields. sprintf rounds the
    # seconds to three decimals instead of truncating them; "%06.3f" also keeps
    # the leading zero for seconds below 10.
    format.POSIXlt = function(posix, ...) {
      return(paste0(posix$year+1900,"-",sprintf("%02d",posix$mon+1),"-",sprintf("%02d",posix$mday)," ",
                    sprintf("%02d",posix$hour),":",sprintf("%02d",posix$min),":",sprintf("%06.3f",posix$sec)))
    }
    # print and as.character reuse the formatter above.
    print.POSIXlt = function(posix, ...) {
      print(format.POSIXlt(posix))
    }
    as.character.POSIXlt = function(posix, ...) {
      return(format.POSIXlt(posix))
    }
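Overriding the base methods changes the output of every POSIXlt value in the session, so a separately named helper may be less intrusive. The sketch below is one possible alternative, not the method from the answer: `format_ms` is a hypothetical name, and the default `tz = "UTC"` is an assumption matching the examples in this thread. It rounds to whole milliseconds since the epoch first, so a value such as 59.9996 carries over into the next minute rather than printing as 60.000.

```r
# Hypothetical helper: format a date-time with rounded (not truncated) milliseconds.
format_ms <- function(x, tz = "UTC") {
  ct <- as.POSIXct(x, tz = tz)
  # Whole milliseconds since the epoch; doubles hold these values exactly.
  ms_total <- round(as.numeric(ct) * 1000)
  # Split into whole seconds and the millisecond remainder.
  whole <- as.POSIXct(ms_total %/% 1000, origin = "1970-01-01", tz = tz)
  sprintf("%s.%03d",
          format(whole, "%Y-%m-%d %H:%M:%S"),
          as.integer(ms_total %% 1000))
}

x <- as.POSIXlt("2012-06-07 13:29:56.061",
                format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
format_ms(x)
```

Because the helper has its own name, the built-in print, format and as.character methods keep their usual behavior, including calls that pass a format string.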

Source: https://habr.com/ru/post/906173/

