geom_histogram: wrong bins?

I am using ggplot2 2.1.0 to plot histograms and I am seeing unexpected binning behavior. Here is an example with left-closed bins (that is, [0, 0.1[) and a bin width of 0.1.

library("ggplot2")
mydf <- data.frame(myvar = c(-1, -0.5, -0.4, -0.1, -0.1, 0.05, 0.1, 0.1, 0.25, 0.5, 1))
myplot <- ggplot(mydf, aes(myvar)) +
    geom_histogram(aes(y = ..count..), binwidth = 0.1, boundary = 0.1, closed = "left")
myplot
ggplot_build(myplot)$data[[1]]  # inspect the computed bins


In this example, one would expect the value -0.4 to fall in the bin [-0.4, -0.3[, but instead it (mysteriously) falls in the bin [-0.5, -0.4[. The same goes for the value -0.1, which falls into [-0.2, -0.1[ instead of [-0.1, 0[, and so on.

Is there something I am misunderstanding (especially regarding the new "center" and "boundary" parameters)? Or is ggplot2 doing something odd here?

Thanks in advance, Best regards, Arnaud

PS: Also here: https://github.com/hadley/ggplot2/issues/1651


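This looks like a floating-point rounding problem in the binning code that, as far as I can tell, appeared with ggplot2_2.0.0. A workaround is to adjust boundary.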

df <- data.frame(var = seq(-100,100,10)/100)
as.list(df) # check the data
## $var
##  [1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2
## [10] -0.1  0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7
## [19]  0.8  0.9  1.0
library("ggplot2")
p <- ggplot(data = df, aes(x = var)) + 
    geom_histogram(aes(y = ..count..), 
        binwidth = 0.1, 
        boundary = 0.1, 
        closed = "left")
p


Now adjust boundary, using 0.99 rather than 1.

ggplot(data = df, aes(x = var)) + 
    geom_histogram(aes(y = ..count..), 
        binwidth = 0.05, 
        boundary = 0.99, 
        closed = "left")

(The binwidth is also narrower here.)


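Alternatively, multiply the data by a number slightly greater than 1 (see eps further down). As far as I can tell, ggplot2 works with a fuzz factor of 1e-7 (earlier versions) or 1e-8 (recent versions).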
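The rounding issue can be seen in base R without ggplot2. The following is only a minimal sketch: it assumes (this is an assumption, not something checked against the ggplot2 source) that a break is obtained by multiplying the bin width, while the data value comes from a division as in seq(-100,100,10)/100. The name brk is made up for the illustration.

```r
# Value as it appears in the data: 30/100 is the double closest to 0.3.
x <- 30 / 100

# Hypothetical break built from the bin width (3 * 0.1);
# this is an assumption about how a break might be computed.
brk <- 3 * 0.1

x == brk   # FALSE: the two doubles differ in the last bit
x < brk    # TRUE: x sits just to the left of the break

# Scaling by a factor slightly above 1, as with eps in the answer,
# moves the value to the right of the break again.
eps <- 1 + 10 * .Machine$double.eps
x_scaled <- eps * x
x_scaled >= brk  # TRUE
```

With a left-closed bin starting at brk, the unscaled x would be counted one bin too far to the left, which is the pattern described in the question.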

Looking at ncount in the built plot data:

str(ggplot_build(p)$data[[1]])
##  'data.frame':   20 obs. of  17 variables:
##   $ y       : num  1 1 1 1 1 2 1 1 1 0 ...
##   $ count   : num  1 1 1 1 1 2 1 1 1 0 ...
##   $ x       : num  -0.95 -0.85 -0.75 -0.65 -0.55 -0.45 -0.35 -0.25 -0.15 -0.05 ...
##   $ xmin    : num  -1 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 ...
##   $ xmax    : num  -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 ...
##   $ density : num  0.476 0.476 0.476 0.476 0.476 ...
##   $ ncount  : num  0.5 0.5 0.5 0.5 0.5 1 0.5 0.5 0.5 0 ...
##   $ ndensity: num  1.05 1.05 1.05 1.05 1.05 2.1 1.05 1.05 1.05 0 ...
##   $ PANEL   : int  1 1 1 1 1 1 1 1 1 1 ...
##   $ group   : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##   $ ymin    : num  0 0 0 0 0 0 0 0 0 0 ...
##   $ ymax    : num  1 1 1 1 1 2 1 1 1 0 ...
##   $ colour  : logi  NA NA NA NA NA NA ...
##   $ fill    : chr  "grey35" "grey35" "grey35" "grey35" ...
##   $ size    : num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
##   $ linetype: num  1 1 1 1 1 1 1 1 1 1 ...
##   $ alpha   : logi  NA NA NA NA NA NA ...

ggplot_build(p)$data[[1]]$ncount
##  [1] 0.5 0.5 0.5 0.5 0.5 1.0 0.5 0.5 0.5 0.0 1.0 0.5
## [13] 0.5 0.5 0.0 1.0 0.5 0.0 1.0 0.5

Is it a rounding error? Try scaling the data by eps:

df <- data.frame(var = as.integer(seq(-100,100,10)))
# eps <- 1.000000000000001 # on my system
eps <- 1+10*.Machine$double.eps
p <- ggplot(data = df, aes(x = eps*var/100)) + 
    geom_histogram(aes(y = ..count..), 
                   binwidth = 0.05, 
                   closed = "left")
p

(Note that boundary is not set here.)


Compare with ggplot2_1.0.1. In the source files bin.R and stat-bin.r at https://github.com/hadley/ggplot2/blob/master/R, the count is computed by bin_vector(), which contains the following:

bin_vector <- function(x, bins, weight = NULL, pad = FALSE) {
  # ... STUFF HERE I HAVE DELETED FOR CLARITY ...
  cut(x, bins$breaks, right = bins$right_closed,
      include.lowest = TRUE)
  # ... STUFF HERE I HAVE DELETED FOR CLARITY ...
}
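The cut() call can be tried in isolation to see how values are assigned to left-closed bins. This is a rough sketch in base R: the regular grid of breaks below is a hypothetical stand-in for bins$breaks, and ggplot2 may well compute its breaks differently.

```r
# Same data as in the answer.
x <- seq(-100, 100, 10) / 100

# Assumed breaks: a regular grid of width 0.1 (hypothetical stand-in
# for bins$breaks; not taken from the ggplot2 source).
breaks <- seq(-1, 1, by = 0.1)

# Left-closed bins, mirroring cut(x, bins$breaks, right = FALSE,
# include.lowest = TRUE) from the excerpt.
bin_idx <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
counts <- as.vector(table(bin_idx))

counts               # one count per bin
which(counts == 0)   # bins that were skipped, if any
which(counts > 1)    # bins that received more than one value
```

Any bin listed by which(counts == 0) next to one listed by which(counts > 1) points to a value that landed on the wrong side of a break.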

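Observations from "patching" bin_vector:

  • The bins object carries a bins$fuzzy element.

  • Instead of bins$breaks, it seems that bins$fuzzy is what should be passed to cut().

  • Replacing bins$breaks with bins$fuzzy in bin_vector returns the expected plot here; treat this as a workaround rather than a proper fix to ggplot2.

  • As it stands, bin_vector uses bins$breaks, not bins$fuzzy.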
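The effect of using fuzzy breaks can be sketched in base R. Everything here is an assumption made for illustration: the grid of breaks stands in for bins$breaks, and the vector fuzzy only imitates what bins$fuzzy might hold (the constant 1e-08 and the direction of the shifts are not taken from the answer).

```r
x <- seq(-100, 100, 10) / 100
breaks <- seq(-1, 1, by = 0.1)   # assumed stand-in for bins$breaks

# Assumed fuzz: a tiny fraction of the typical bin width.
fuzz <- 1e-08 * median(diff(breaks))

# For left-closed bins, shift every break slightly left except the last,
# which moves slightly right so the maximum stays inside the last bin.
fuzzy <- breaks + c(rep(-fuzz, length(breaks) - 1), fuzz)

counts_fuzzy <- as.vector(table(
  cut(x, fuzzy, right = FALSE, include.lowest = TRUE)
))
counts_fuzzy  # each bin gets one value; the last bin also holds x = 1
```

Because the shift is far larger than the rounding error but far smaller than the bin width, each value ends up in the bin whose left edge it nominally equals.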

Patching

To "patch" bin_vector, copy its definition from the github source or, more conveniently, print it at the console with:

 ggplot2:::bin_vector

Then edit (patch) it and assign it back into the namespace:

library("ggplot2")
bin_vector <- function (x, bins, weight = NULL, pad = FALSE)
{
  # ... STUFF HERE I HAVE DELETED FOR CLARITY ...
  ## MY PATCH: Replace bins$breaks with bins$fuzzy
  bin_idx <- cut(x, bins$fuzzy, right = bins$right_closed,
                 include.lowest = TRUE)
  # ... STUFF HERE I HAVE DELETED FOR CLARITY ...
  ggplot2:::bin_out(bin_count, bin_x, bin_widths)
  ## THIS IS THE PATCHED FUNCTION
}
assignInNamespace("bin_vector", bin_vector, ns = "ggplot2")
df <- data.frame(var = seq(-100,100,10)/100)
ggplot(data = df, aes(x = var)) + geom_histogram(aes(y = ..count..), binwidth = 0.05, boundary = 1, closed = "left")

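Note that the function body above is abridged and has to be completed from the ggplot2:::bin_vector listing before it will run. To undo the patch, restart R or detach ggplot2.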

For comparison I looked at several versions: 2.0.9.3, 2.1.0.1 and 2.2.0.1 (also 2.2.0.0).

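To install an old version such as ggplot2_0.9.3 in a separate location (so that it does not replace the current one), here a directory named ggplot2093: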

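URL <- "http://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.9.3.tar.gz"
# the target library directory must exist before installing into it
dir.create("~/R/testing/ggplot2093", recursive = TRUE, showWarnings = FALSE)
install.packages(URL, repos = NULL, type = "source",
    lib = "~/R/testing/ggplot2093")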

Then load it from that location:

library("ggplot2", lib.loc = "~/R/testing/ggplot2093") 

Source: https://habr.com/ru/post/1694709/

