ggplot2: how to align histogram bars with the x-axis?

Consider this simple example.

    library(ggplot2)
    dat <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
    ggplot(dat, aes(x = number)) +
      geom_histogram()


See how the bars are aligned with the x-axis? Why is the first bar to the left of 5.0, while the bar at 10.0 is centered? How can I control this? For example, it would make more sense to me to have each bar start to the right of its label.

Thanks!

2 answers

This centers the bar on the value.

    data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
    ggplot(data, aes(x = number)) +
      geom_histogram(binwidth = 0.5)

Here is a trick that relabels the ticks so that each label sits at the left edge of its bar. If the data change, the breaks have to be adjusted as well.

    ggplot(data, aes(x = number)) +
      geom_histogram(binwidth = 0.5) +
      scale_x_continuous(
        # put the ticks at the left edge of each bar
        # (0.25 before the value, i.e. half of the binwidth)
        breaks = seq(0.75, 15.75, 1),
        # relabel the ticks with the value of the bar
        labels = 1:16
      )

Another option: binwidth = 1 with breaks = seq(0.5, 15.5, 1) (this may make more sense for integer data).
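Written out in full, that alternative might look like the sketch below. It assumes that the labels = 1:16 argument from the previous snippet is kept; the object name p is only used for illustration.

```r
library(ggplot2)

data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))

# With binwidth = 1 and no boundary given, the bins are centred on the
# integers (4.5 to 5.5, 5.5 to 6.5, ...)
p <- ggplot(data, aes(x = number)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(
    breaks = seq(0.5, 15.5, 1),  # ticks at the left edge of each bar
    labels = 1:16                # assumption: relabel each tick with its integer
  )

p
```

As with the binwidth = 0.5 version, the ticks are only relabelled; the bars themselves do not move.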


Why are the bars "strangely aligned"?

Let me start by explaining why your code leads to strangely aligned bars. This is due to the way a histogram is built: first, the x-axis is divided into intervals (bins), and then the number of values in each interval is counted.

By default, ggplot splits the data into 30 bins. It even prints a message saying so:

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The default value is not always a good choice. In your case, where all data points are integers, you might want to choose the bin boundaries as 5, 6, 7, 8, ... or 4.5, 5.5, 6.5, ..., so that each bin contains exactly one integer value. You can get the bin boundaries that were actually used in the plot as follows:

    data <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))
    p <- ggplot(data, aes(x = number)) + geom_histogram()
    ggplot_build(p)$data[[1]]$xmin
    ##  [1]  4.655172  5.000000  5.344828  5.689655  6.034483  6.379310  6.724138  7.068966  7.413793
    ## [10]  7.758621  8.103448  8.448276  8.793103  9.137931  9.482759  9.827586 10.172414 10.517241
    ## [19] 10.862069 11.206897 11.551724 11.896552 12.241379 12.586207 12.931034 13.275862 13.620690
    ## [28] 13.965517 14.310345 14.655172

As you can see, the bin boundaries are not chosen in a way that aligns the bars nicely with the integers.

So, in short, the reason for the strange alignment is that ggplot simply uses its default of 30 bins, which in your case does not produce bars that align well with the integers.
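The spacing of the boundaries listed above can be checked by hand. The sketch below assumes that the bin width is the data range divided by bins - 1, which matches the printed xmin values. Where exactly the first boundary is placed depends on the ggplot2 version and is not reproduced here.

```r
number <- c(5, 10, 11, 12, 12, 12, 13, 15, 15)

# Assumption: 30 bins spread over the data range give a width of range / 29
width <- diff(range(number)) / (30 - 1)
round(width, 6)
## [1] 0.344828

# The same step separates consecutive xmin values in the listing above
round(5.344828 - 5.000000, 6)
## [1] 0.344828
```

With a step of roughly 0.345, the bin boundaries cannot fall on every integer, which is why some bars end up left of their value and others look centered.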

There are (at least) two ways to get well-aligned bars, which I cover below.

Use a bar chart instead

Since you have integer data, a histogram may simply not be the right visualization. Instead, you can use geom_bar(), which results in bars centered on the integers:

 ggplot(data, aes(x = number)) + geom_bar() + scale_x_continuous(breaks = 1:16) 


You can move the bars to the right of the integers by adding 0.5 to number:

 ggplot(data, aes(x = number + 0.5)) + geom_bar() + scale_x_continuous(breaks = 1:16) 


Create a histogram with appropriate bins

If you still want to use a histogram, you can force ggplot to use more reasonable bins as follows:

 ggplot(data, aes(x = number)) + geom_histogram(binwidth = 1, boundary = 0, closed = "left") + scale_x_continuous(breaks = 1:16) 


With binwidth = 1 you override the default of 30 bins and explicitly require bins of width 1. boundary = 0 ensures that the binning starts at an integer value, which is what you need if you want the integers at the left edge of the bars. (If you omit it, the bins are chosen so that the bars are centered on the integers.)

The closed = "left" argument is a little harder to explain. As described above, the bin boundaries are now 5, 6, 7, ... The question is which bin a value such as 6 belongs to: it could go into the first bin or the second. This choice is controlled by closed: if you set it to "right" (the default), the bins are closed on the right, which means that the right boundary of a bin is included in it, while the left boundary belongs to the bin to its left. So 6 ends up in the first bin. If you choose "left" instead, the left boundary is part of the bin, and 6 ends up in the second bin.

Since you want the bars to the right of the integers, you need to choose closed = "left".
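The difference between the two settings can be illustrated without any plotting. The sketch below uses base R's cut(), whose right argument plays a similar role to closed; it is meant as an analogy and does not show how ggplot2 bins the data internally. The names edges, right_closed and left_closed are only illustrative.

```r
edges <- 5:15  # bin boundaries 5, 6, 7, ..., 15

# Right-closed intervals look like (5,6], so 6 lands in the first bin
right_closed <- as.character(cut(6, breaks = edges, right = TRUE))

# Left-closed intervals look like [6,7), so 6 lands in the second bin
left_closed <- as.character(cut(6, breaks = edges, right = FALSE))

right_closed
## [1] "(5,6]"
left_closed
## [1] "[6,7)"
```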

Comparison of the two solutions

If you compare the histogram with the bar chart, you will notice two differences:

  • There is a small gap between the bars in the bar chart, while they touch in the histogram. You can make the bars touch in the former by using geom_bar(width = 1).
  • The rightmost bar runs from 15 to 16 in the bar chart, but from 14 to 15 in the histogram. The reason is that, although for all other bins only the left boundary is part of the bin, for the last bin both boundaries are included.
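The second point can be checked by counting the values per bin. The sketch below uses base R's hist() with left-closed bins as a stand-in for the ggplot2 histogram; it is an approximation of the same binning rule, not ggplot2's own code, and the name counts is only illustrative.

```r
number <- c(5, 10, 11, 12, 12, 12, 13, 15, 15)

# Left-closed bins [5,6), [6,7), ..., with the last bin closed on both
# sides: [14,15]
h <- hist(number, breaks = 5:15, right = FALSE,
          include.lowest = TRUE, plot = FALSE)

# One count per bin, named by the left boundary of the bin
counts <- setNames(h$counts, head(h$breaks, -1))
counts
```

Both values of 15 are counted in the bin that starts at 14, which is why the rightmost bar of the histogram runs from 14 to 15.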

Source: https://habr.com/ru/post/1262360/

