What does the right parameter do when creating a histogram in R?

I am trying to figure out what the right parameter does in the hist function in R. Unfortunately, the documentation is unclear to someone like me who does not have a deep understanding of statistics.

Online documentation:

right: logical; if TRUE, the histogram cells are right-closed (left open) intervals.

What does it mean for intervals to be right-closed (or left-open)?

3 answers

When creating histograms of non-categorical data (such as pH, temperature, etc.) you need to specify things called "bins". Each bin has an interval associated with it. For example, if I have the data:

11 12 13 14 15 16 17 18 19 

I can create 5 bins with left-closed, right-open intervals as follows:

 1st bin: [10, 12)
 2nd bin: [12, 14)
 3rd bin: [14, 16)
 4th bin: [16, 18)
 5th bin: [18, 20)

This means that the first bin will "hold" values between 10 and 12, including 10 but not including 12. The interval notation used above is shorthand for this:

 1st bin: 10 ≤ x < 12
 2nd bin: 12 ≤ x < 14
 3rd bin: 14 ≤ x < 16
 4th bin: 16 ≤ x < 18
 5th bin: 18 ≤ x < 20

So the value 11 goes into the 1st bin, but the value 12 goes into the 2nd bin, and so on. R performs this binning and then draws a histogram based on the number of values in each bin. For the data above, you get an uninteresting (or interesting, depending on your expectations) histogram that is flat except for the first bin, which holds only 1 value instead of 2.
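To check the binning above in R, one option is base R's cut(), which takes the same right argument as hist(). This is a sketch assuming bins of width 2 from 10 to 20, as in the example; right = FALSE gives the [a, b) bins listed above.

```r
# Data and breakpoints from the example: bins of width 2 from 10 to 20.
x <- 11:19
brks <- seq(10, 20, by = 2)

# right = FALSE makes each bin left-closed, right-open: [a, b).
bins <- cut(x, breaks = brks, right = FALSE)

# Counts per bin: [10,12) 1, [12,14) 2, [14,16) 2, [16,18) 2, [18,20) 2
table(bins)
```

The labels cut() prints use the same bracket/parenthesis notation as the text, which makes it easy to see which end of each bin is closed.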

The following examples show what the various combinations of brackets and parentheses mean in interval notation (suppose x is a point on the real number line):

 (1, 4) --> 1 < x < 4   left-open, right-open
 [3, 7) --> 3 ≤ x < 7   left-closed, right-open
 (2, 9] --> 2 < x ≤ 9   left-open, right-closed
 [5, 6] --> 5 ≤ x ≤ 6   left-closed, right-closed

Note that infinity always gets a parenthesis, never a square bracket, unless you are working with the extended real number line:

 (-∞, ∞)   --> -∞ < x < ∞
 (-∞, 20]  --> -∞ < x ≤ 20
 [20, ∞)   --> 20 ≤ x < ∞
 (1000, ∞) --> 1000 < x < ∞
 (-∞, ∞]   --> Invalid
 (41, ∞]   --> Invalid

If I want left-open, right-closed intervals instead, the bins look like this:

 1st bin: (10, 12], i.e. 10 < x ≤ 12
 2nd bin: (12, 14], i.e. 12 < x ≤ 14
 3rd bin: (14, 16], i.e. 14 < x ≤ 16
 4th bin: (16, 18], i.e. 16 < x ≤ 18
 5th bin: (18, 20], i.e. 18 < x ≤ 20

See the difference? Now both 11 and 12 go into the first bin. This can change how the histogram looks, depending on your data. This time the histogram is still almost flat, but now the fifth bin differs from the rest (only 1 data point instead of 2).
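A minimal sketch that reproduces both sets of counts with hist() itself, assuming the same data and breakpoints as above. Passing plot = FALSE makes hist() return the binned counts instead of drawing anything.

```r
# Same data and breakpoints as in the example above.
x <- 11:19
brks <- seq(10, 20, by = 2)

# right = TRUE (the default): bins are (a, b], so 12 lands in the first bin.
h_right <- hist(x, breaks = brks, right = TRUE, plot = FALSE)
# right = FALSE: bins are [a, b), so 12 lands in the second bin.
h_left <- hist(x, breaks = brks, right = FALSE, plot = FALSE)

h_right$counts  # 2 2 2 2 1
h_left$counts   # 1 2 2 2 2
```

Only the values sitting exactly on a breakpoint (12, 14, 16, 18) move between the two runs; everything else stays in the same bin.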

Fortunately, in R you do not have to specify the bins yourself, but R is kind enough to let you choose whether the bins are closed on the left and open on the right ([a, b)) or open on the left and closed on the right ((a, b]). That is the difference you control with the right parameter of the hist() function.


The default value is right = TRUE, which gives intervals of the form (a, b]. Let's take an example to see what that means. Say our data contain the value 5, and the histogram uses the breakpoints 3, 4, 5, 6. The question is which interval the value 5 should belong to. If we use right = TRUE, the intervals actually used are (3, 4], (4, 5], (5, 6]. The notation (4, 5] means the interval contains all values between 4 and 5; it does not include 4 itself, but it does include 5. So our data point 5 falls into this interval.

If we used right = FALSE instead, the intervals would have the form [a, b), so with the same breakpoints 3, 4, 5, 6 we would have the intervals [3, 4), [4, 5), [5, 6). This time our data point falls into the interval [5, 6), because that interval contains 5, while [4, 5) does not.
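The same example can be checked directly, as a sketch: cut() reports the label of the interval a value falls into, and hist(..., plot = FALSE)$counts shows which bar receives it. One assumption to keep in mind: cut() defaults to include.lowest = FALSE while hist() defaults to include.lowest = TRUE, which does not affect the value 5 here.

```r
# Breakpoints from the example; they define three intervals.
brks <- c(3, 4, 5, 6)

# Which interval does the single value 5 land in? cut() labels it.
lab_right <- as.character(cut(5, breaks = brks, right = TRUE))
lab_left  <- as.character(cut(5, breaks = brks, right = FALSE))
lab_right  # "(4,5]"
lab_left   # "[5,6)"

# The same decision seen through hist()'s counts (one entry per interval).
cnt_right <- hist(5, breaks = brks, right = TRUE,  plot = FALSE)$counts
cnt_left  <- hist(5, breaks = brks, right = FALSE, plot = FALSE)$counts
cnt_right  # 0 1 0
cnt_left   # 0 0 1
```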

Essentially, the "right" parameter tells R what to do when a data point falls exactly on a breakpoint.


R uses half-open intervals for histograms. This option determines which endpoint, left or right, is included in each half-open interval.
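One nuance worth checking: hist() also has an include.lowest argument (TRUE by default) that closes the outermost interval at both ends, so a value exactly on the lowest break (with right = TRUE) or the highest break (with right = FALSE) is still counted. A small sketch using the breakpoints 3, 4, 5, 6 from the previous answer, with cut() used only to display the resulting labels:

```r
# Data sitting exactly on the outermost breakpoints 3 and 6.
x <- c(3, 6)
brks <- 3:6

# cut() takes the same right/include.lowest arguments; its labels show the closure.
lv_right <- levels(cut(x, brks, right = TRUE,  include.lowest = TRUE))
lv_left  <- levels(cut(x, brks, right = FALSE, include.lowest = TRUE))
lv_right  # "[3,4]" "(4,5]" "(5,6]"
lv_left   # "[3,4)" "[4,5)" "[5,6]"

# hist() uses include.lowest = TRUE by default, so both values are counted.
cnt_r <- hist(x, breaks = brks, right = TRUE,  plot = FALSE)$counts
cnt_l <- hist(x, breaks = brks, right = FALSE, plot = FALSE)$counts
cnt_r  # 1 0 1
cnt_l  # 1 0 1
```

So apart from that one outer interval, every cell is half-open, with right choosing which end is closed.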


Source: https://habr.com/ru/post/1388184/

