Rounding error in R's summary() function?

I have a data frame with 16968 rows (the reason for being this precise is given below). I am checking that the index variable (data$Ob) numbers the rows in sequential order: data$Ob should be 1 in the first row, 16968 in the last row, and the matching row number for every row in between.

When I run summary(data$Ob), it tells me the maximum is 16970, not 16968. When I run max(data$Ob), it says the maximum is 16968, not the value from summary.

I ran a for loop to check every observation, and it looks like max() is correct and the data$Ob variable does what it should. But does anyone know why the summary function is off by 2? I am assuming a rounding error (somehow?), but this data validation is crucial for the analysis I am doing, and if it is wrong, my subsequent analysis will be bunk.

Here is the for loop I ran, though I do not think it matters for this question.

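    checker <- vector(length=nrow(rd))
    na.checker <- vector(length=nrow(rd))
    for (i in 1:nrow(rd)){
      # 1 if the index in row i equals the row number, 0 otherwise
      checker[i] <- ifelse(i==rd$Ob[i], 1, 0)
      # 0 if the index in row i is missing, 1 otherwise
      na.checker[i] <- ifelse(is.na(rd$Ob[i])==TRUE,0,1)
    }
    # number of rows whose index matches the row number
    sum(checker)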

Thanks.

1 answer

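It's hard to say without a reproducible example, but this smells like the mother of all frequently asked questions: summary() displays four significant digits by default, so 16968 is shown as 16970. Only the display is rounded; the stored value is unchanged, which is why max() still reports 16968.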
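As a minimal sketch of that rounding, assuming the displayed figure behaves like base R's signif() applied with four significant digits (what summary() itself prints can differ between R versions, so the sketch calls signif() directly; the variable names are only illustrative):

```r
# Assumption: the displayed value behaves like signif(x, digits = 4).
x <- 16968

rounded4 <- signif(x, 4)  # four significant digits: 16970
rounded5 <- signif(x, 5)  # five significant digits: 16968

print(rounded4)
print(rounded5)

# The stored value itself is never modified by rounding for display.
print(x == 16968)
```

The point is that the difference of 2 comes from how many digits are kept for display, not from the data.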

Edit: we need your data for an example, because with a naive example, I cannot reproduce this:

    R> set.seed(42)
    R> df <- data.frame(a=as.numeric(1:16968), b=16968:1,
    +                   c=rnorm(16968), d=runif(16968))
    R> summary(df)
           a               b               c                  d
     Min.   :    1   Min.   :    1   Min.   :-4.04328   Min.   :0.000101
     1st Qu.: 4243   1st Qu.: 4243   1st Qu.:-0.68271   1st Qu.:0.252515
     Median : 8484   Median : 8484   Median :-0.00528   Median :0.505090
     Mean   : 8484   Mean   : 8484   Mean   :-0.00834   Mean   :0.504563
     3rd Qu.:12726   3rd Qu.:12726   3rd Qu.: 0.66746   3rd Qu.:0.758991
     Max.   :16968   Max.   :16968   Max.   : 4.32809   Max.   :0.999976

Edit 2, with a hat tip to @SimonO101:

    R> summary(df$a)              ## what OP saw
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          1    4240    8480    8480   12700   17000
    R> summary(df$a, digits=6)    ## what OP wanted to see
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          1    4243    8484    8484   12726   16968
    R>
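For the validation itself, it may be safer not to read the answer off a formatted summary at all. A minimal sketch, reusing column a from the example above as a stand-in for the OP's index column (the names n, max_a, is_sequential and has_na are made up for illustration):

```r
# Stand-in for the OP's data: an index column that should run 1..n.
df <- data.frame(a = as.numeric(1:16968))
n <- nrow(df)

# Exact maximum, computed on the stored values rather than a formatted display.
max_a <- max(df$a)

# Vectorised check that row i holds the value i; isTRUE() guards against NA.
is_sequential <- isTRUE(all(df$a == seq_len(n)))

# Separate check for missing values in the index column.
has_na <- anyNA(df$a)

print(max_a)
print(is_sequential)
print(has_na)
```

These checks compare the stored numbers directly, so the number of digits used for display does not affect them.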

Source: https://habr.com/ru/post/1479180/

