How R formats POSIXct with fractional seconds

I believe that R incorrectly formats POSIXct types with fractional seconds. I submitted this via R-bugs as an enhancement request and got brushed off with "we believe that the current behavior is correct - bug deleted." While I am very appreciative of all the work they have done and continue to do, I wanted to get other people's take on this particular issue, and perhaps advice on how to make the point more effectively.

Here is an example:

    > tt <- as.POSIXct('2011-10-11 07:49:36.3')
    > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
    [1] "2011-10-11 07:49:36.2"

That is, tt is created as a POSIXct time with a fractional part of .3 seconds. When it is printed with one decimal digit, the value shown is .2. I work a lot with timestamps of millisecond precision, and it causes me a lot of headaches that times are often printed one notch lower than the actual value.

Here is what is happening: POSIXct is a floating-point number of seconds since the epoch. All integer values are handled exactly, but in base-2 floating point, the closest representable value to .3 is very slightly less than .3. The stated behavior of strftime() for the %OSn format is to round down (truncate) to the requested number of decimal digits, so the displayed result is .2. For other fractional parts, the floating-point value is slightly above the value entered, and the display gives the expected result:

    > tt <- as.POSIXct('2011-10-11 07:49:36.4')
    > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
    [1] "2011-10-11 07:49:36.4"
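
The asymmetry between .3 and .4 can be reproduced with plain doubles, without any date classes. The sketch below is only an illustration: the epoch value 1318337376 is an assumed stand-in of the same magnitude as the timestamps above, and `tenths_truncated` is a hypothetical helper that mimics truncation to one decimal digit.

```r
# Assumed epoch value of the same magnitude as a 2011 timestamp (illustrative only).
base <- 1318337376

# Hypothetical helper: the tenths digit that truncation to one decimal would show.
tenths_truncated <- function(x) floor((x - floor(x)) * 10)

x3 <- base + 0.3   # stored slightly below ...376.3
x4 <- base + 0.4   # stored slightly above ...376.4

sprintf("%.10f", x3 - base)   # fractional part as actually stored
sprintf("%.10f", x4 - base)

tenths_truncated(x3)   # 2: truncation drops to the digit below
tenths_truncated(x4)   # 4: truncation gives the expected digit
```

Whether a given fraction lands above or below its decimal value depends on the magnitude of the whole number, so the affected digits differ from one timestamp to another.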

The developers' argument is that for time types, we should always round down to the requested precision. For example, if the time is 11:59:59.8, then printing it with the format %H:%M should give "11:59", not "12:00", and %H:%M:%S should give "11:59:59", not "12:00:00". I agree with this for integer numbers of seconds and for the %S format flag, but I think the behavior should be different for format flags that are designed for fractional parts of seconds. I would like %OSn to use round-to-nearest behavior even for n = 0, while %S keeps rounding down, so that printing 11:59:59.8 with the format %H:%M:%OS0 would give "12:00:00". This would not affect integer numbers of seconds, because those are always represented exactly, but it would handle round-off errors for fractional seconds more naturally.

This is how printing of fractional parts is handled in C, for example, where only the integer cast rounds down:

    double x = 9.97;
    printf("%d\n",(int) x);   // 9
    printf("%.0f\n",x);       // 10
    printf("%.1f\n",x);       // 10.0
    printf("%.2f\n",x);       // 9.97

I did a quick survey of how fractional seconds are handled in other languages and environments, and there really does not seem to be a consensus. Most constructs are designed for integer numbers of seconds, and the fractional parts are an afterthought. It seems to me that in this case the R developers made a choice that is not completely unreasonable but is in fact not the best one, and that is not consistent with the conventions used elsewhere for displaying floating-point numbers.

What are people's thoughts? Is the R behavior correct? Is it the way you would design it yourself?

+44
r posixct
Oct 11 '11 at 12:29
2 answers

One underlying problem is that the POSIXct representation is less precise than the POSIXlt representation, and the POSIXct representation is converted to the POSIXlt representation before formatting. Below we see that if our string is converted directly to the POSIXlt representation, it is displayed correctly.

    > as.POSIXct('2011-10-11 07:49:36.3')
    [1] "2011-10-11 07:49:36.2 CDT"
    > as.POSIXlt('2011-10-11 07:49:36.3')
    [1] "2011-10-11 07:49:36.3"

We can also see this by looking at the difference between the binary representation of the fractional part in each format and the usual representation of 0.3.

    > t1 <- as.POSIXct('2011-10-11 07:49:36.3')
    > as.numeric(t1 - round(unclass(t1))) - 0.3
    [1] -4.768372e-08

    > t2 <- as.POSIXlt('2011-10-11 07:49:36.3')
    > as.numeric(t2$sec - round(unclass(t2$sec))) - 0.3
    [1] -2.831069e-15

It is interesting that both representations are actually less than the usual representation of 0.3, but the second is either close enough or is truncated in a way different from what I imagine. Given that, I am not going to worry about floating-point representation difficulties; they may still happen, but if we are careful about which representation we use, they will hopefully be minimized.

Robert's desire for rounded output is then simply an output problem, and it could be addressed in any number of ways. My suggestion would be something like this:
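
One way to see why the seconds field of POSIXlt holds 0.3 more precisely is to compare the spacing between adjacent doubles at the two magnitudes involved. This is a rough sketch: `ulp` is a hypothetical helper, and 1318337376.3 is an assumed epoch value of about the right size for the timestamp above.

```r
# Hypothetical helper: spacing between adjacent doubles near a positive x.
ulp <- function(x) .Machine$double.eps * 2^floor(log2(x))

ulp(1318337376.3)   # about 2.4e-07: seconds since the epoch (POSIXct scale)
ulp(36.3)           # about 7.1e-15: seconds within the minute (POSIXlt$sec scale)
```

Both differences printed above are smaller than half of the corresponding spacing, which is what one would expect from rounding to the nearest representable value.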

    myformat.POSIXct <- function(x, digits=0) {
      # Round the seconds-since-epoch value so minute/hour/day carry correctly
      x2 <- round(unclass(x), digits)
      attributes(x2) <- attributes(x)
      # Convert to POSIXlt and round again in the more precise seconds field
      x <- as.POSIXlt(x2)
      x$sec <- round(x$sec, digits)
      format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
    }

This starts with a POSIXct input and first rounds to the desired number of digits; it then converts to POSIXlt and rounds again. The first rounding makes sure that all of the units increase appropriately when we are on a minute/hour/day boundary; the second rounding is applied after converting to the more precise representation.

    > options(digits.secs=1)
    > t1 <- as.POSIXct('2011-10-11 07:49:36.3')
    > format(t1)
    [1] "2011-10-11 07:49:36.2"
    > myformat.POSIXct(t1,1)
    [1] "2011-10-11 07:49:36.3"

    > t2 <- as.POSIXct('2011-10-11 23:59:59.999')
    > format(t2)
    [1] "2011-10-11 23:59:59.9"
    > myformat.POSIXct(t2,0)
    [1] "2011-10-12 00:00:00"
    > myformat.POSIXct(t2,1)
    [1] "2011-10-12 00:00:00.0"

A final aside: did you know that the standard allows for up to two leap seconds?

    > as.POSIXlt('2011-10-11 23:59:60.9')
    [1] "2011-10-11 23:59:60.9"

OK, one more thing. The behavior actually changed in May because of a bug filed by the OP (Bug 14579); before that, it did round fractional seconds. Unfortunately, that meant it could sometimes round up to a second that was not possible; in the bug report, it went up to 60 when it should have rolled over to the next minute. One reason the decision was made to truncate instead of round is that the output is printed from the POSIXlt representation, where each unit is stored separately. Rolling over to the next minute/hour/etc. is therefore more difficult than a straightforward rounding operation. To round easily, it is necessary to round in the POSIXct representation and then convert back, as I suggest.
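
The difference between rounding a separately stored seconds field and rounding a single seconds count can be sketched with bare numbers. This is an illustration only: `split_hms` is a hypothetical helper that splits seconds since midnight into fields, and it does not reflect how POSIXlt is implemented internally.

```r
# Hypothetical helper: split seconds-since-midnight into hour, minute, second.
split_hms <- function(s) {
  c(hour = s %/% 3600, min = (s %% 3600) %/% 60, sec = s %% 60)
}

s <- 23 * 3600 + 59 * 60 + 59.999   # 23:59:59.999 as seconds since midnight

# Rounding only the seconds field yields an impossible 60 with no carry.
field_rounded <- split_hms(s)
field_rounded["sec"] <- round(field_rounded["sec"], 0)

# Rounding the scalar first, then splitting, carries into the larger units.
scalar_rounded <- split_hms(round(s, 0))

field_rounded    # hour 23, min 59, sec 60
scalar_rounded   # hour 24, min 0, sec 0, i.e. midnight of the next day
```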

+30
Oct 11 '11 at 18:33

I ran into this problem and so started looking for a solution. @Aaron's answer is good, but it still fails for large dates.

Here is code that rounds the seconds properly, according to the format or options("digits.secs"):

    form <- function(x, format = "", tz= "", ...) {
      # From format.POSIXct
      if (!inherits(x, "POSIXct"))
        stop("wrong class")
      if (missing(tz) && !is.null(tzone <- attr(x, "tzone")))
        tz <- tzone

      # Find the number of digits required based on the format string
      if (length(format) > 1)
        stop("length(format) > 1 not supported")

      m <- gregexpr("%OS[[:digit:]]?", format)[[1]]
      l <- attr(m, "match.length")
      if (l == 4) {
        d <- as.integer(substring(format, l+m-1, l+m-1))
      } else {
        d <- unlist(options("digits.secs"))
        if (is.null(d)) {
          d <- 0
        }
      }

      secs.since.origin <- unclass(x)            # Seconds since origin
      secs <- round(secs.since.origin %% 60, d)  # Seconds within the minute
      mins <- floor(secs.since.origin / 60)      # Minutes since origin

      # Fix up overflow on seconds
      if (secs >= 60) {
        secs <- secs - 60
        mins <- mins + 1
      }

      # Represents the prior minute
      lt <- as.POSIXlt(60 * mins, tz=tz, origin=ISOdatetime(1970,1,1,0,0,0,tz="GMT"));
      lt$sec <- secs + 10^(-d-1)  # Add in the seconds, plus a fudge factor.
      format.POSIXlt(as.POSIXlt(lt), format, ...)
    }

The fudge factor 10^(-d-1) is from here: Exact conversion from character -> POSIXct -> character with sub-millisecond datetimes, by Aaron.
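
The idea behind the fudge factor can be shown on a bare number: when a value sits a hair below the intended decimal, truncation drops a digit, and adding a tenth of the last displayed unit pushes it back over. The sketch below assumes truncation as the display rule; `truncated_units` and the sample value are made up for illustration.

```r
# Hypothetical helper: the value in units of 10^-d after truncation.
truncated_units <- function(x, d) floor(x * 10^d)

d <- 1
x <- 0.3 - 5e-8   # assumed stored value, just below the intended 0.3

truncated_units(x, d)                 # 2: truncation loses the intended digit
truncated_units(x + 10^(-d - 1), d)   # 3: the fudge factor restores it
```

In `form` the seconds have already been rounded to d digits before the factor is added, so the bump only has to compensate for representation error.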

Some examples:

    f  <- "%Y-%m-%d %H:%M:%OS"
    f3 <- "%Y-%m-%d %H:%M:%OS3"
    f6 <- "%Y-%m-%d %H:%M:%OS6"

From a nearly identical question:

    > x <- as.POSIXct("2012-12-14 15:42:04.577895")
    > format(x, f6)
    [1] "2012-12-14 15:42:04.577894"
    > form(x, f6)
    [1] "2012-12-14 15:42:04.577895"
    > myformat.POSIXct(x, 6)
    [1] "2012-12-14 15:42:04.577895"

From above:

    > format(t1)
    [1] "2011-10-11 07:49:36.2"
    > myformat.POSIXct(t1,1)
    [1] "2011-10-11 07:49:36.3"
    > form(t1)
    [1] "2011-10-11 07:49:36.3"

    > format(t2)
    [1] "2011-10-11 23:59:59.9"
    > myformat.POSIXct(t2,0)
    [1] "2011-10-12 00:00:00"
    > myformat.POSIXct(t2,1)
    [1] "2011-10-12 00:00:00.0"
    > form(t2)
    [1] "2011-10-12"
    > form(t2, f)
    [1] "2011-10-12 00:00:00.0"

The real fun comes in 2038 for some dates. I assume this is because we lose one more bit of precision in the mantissa. Note the value of the seconds field.

    > t3 <- as.POSIXct('2038-12-14 15:42:04.577895')
    > format(t3)
    [1] "2038-12-14 15:42:05.5"
    > myformat.POSIXct(t3, 1)
    [1] "2038-12-14 15:42:05.6"
    > form(t3)
    [1] "2038-12-14 15:42:04.6"

This code seems to work for the other edge cases that I have tried. What format.POSIXct and myformat.POSIXct from Aaron's answer have in common is the conversion from POSIXct to POSIXlt with the seconds field kept intact.

This points to a bug in that conversion. I am not using any data that is unavailable to as.POSIXlt().

Update

The bug is in src/main/datetime.c:434, in the static function localtime0, but I am still not sure of the correct fix:

Lines 433-434:

    day = (int) floor(d/86400.0);
    left = (int) (d - day * 86400.0 + 0.5);

The extra 0.5 added to round the value is the culprit. Note that the subsecond part of t3 above exceeds .5. localtime0 deals with whole seconds only, and the subseconds are added in after localtime0 returns.

localtime0 returns correct results if the double representation is an integer value.
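
The effect of the added 0.5 can be mimicked in R on a bare number. This is only a re-expression of the two C lines above under an assumed input, not the actual R source; `d` here is a made-up value three days after the epoch with a fractional part above .5.

```r
d <- 3 * 86400 + 4.577895   # assumed input: 4.577895 s into the fourth day

day <- floor(d / 86400)

# As in the quoted lines: add 0.5, then truncate like a C (int) cast.
left_with_half <- as.integer(d - day * 86400 + 0.5)

# Without the 0.5 the whole-second count stays put.
left_plain <- as.integer(d - day * 86400)

left_with_half   # 5
left_plain       # 4
```

If the subsecond part is then added back to the already bumped whole second, the result would read 5.577895 rather than 4.577895, which is consistent with the one-second jump seen for t3.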

+16
Feb 06 '13 at 3:16