What is the difference between geom_point and geom_jitter in R, in simple terms?

I was told to use geom_jitter instead of geom_point, and the reason given in the help is that it handles overplotting better in smaller datasets. I'm confused about what overplotting means, and why does it occur in smaller datasets?

1 answer

Overplotting is when two or more points are in the same place (or close enough to the same place) that you cannot look at the plot and tell how many points are there.
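As a rough way to see overplotting in numbers rather than on a plot, the sketch below counts how many rows of the mpg dataset (shipped with ggplot2) land on each distinct (cyl, hwy) position. The variable names are only illustrative.

```r
library(ggplot2)  # used here only for the mpg dataset

# Tabulate how many rows fall on each (cyl, hwy) position
position_counts <- as.data.frame(table(cyl = mpg$cyl, hwy = mpg$hwy))
position_counts <- position_counts[position_counts$Freq > 0, ]

n_points    <- nrow(mpg)              # points that geom_point() would draw
n_positions <- nrow(position_counts)  # distinct places they are drawn at

# Fewer positions than points means some points sit exactly on top of each other
c(points = n_points, positions = n_positions)
```

Whenever the second number is smaller than the first, a plain scatter plot hides some of the points.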

Two (not mutually exclusive) cases that often lead to overplotting:

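  • Non-continuous data - for example, if x or y are integers, many points land on exactly the same spot and it is hard to tell how many are there.

  • Lots of data - if your data is dense (or has regions of high density), points will often overlap even if x and y are continuous.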

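Jittering is adding a small amount of random noise to the data. It is often used to spread out points that would otherwise be overplotted. It is only effective in the non-continuous case, where overplotted points are typically surrounded by whitespace - jittering the data into that whitespace lets the individual points be seen. In effect, it un-discretizes the discrete data.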
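The idea of jittering can be sketched in base R without any plotting. This is an illustration of the concept, not a claim about how ggplot2 implements it internally. The sample values and the noise range of 0.25 (chosen to echo the width used with geom_jitter in the example further down) are assumptions.

```r
set.seed(1)  # make the random noise reproducible

x <- c(4, 4, 4, 6, 6, 8)  # discrete values with ties, like cylinder counts

# Add a small uniform random offset to every value
noise      <- runif(length(x), min = -0.25, max = 0.25)
x_jittered <- x + noise

length(unique(x))           # tied values collapse onto a few positions
length(unique(x_jittered))  # after jittering, the ties are broken
```

The offsets stay within 0.25 of the original values, so the points still read as belonging to 4, 6, or 8 while no longer hiding each other.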

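With high-density data, jittering doesn't help, because there is no reliable area of whitespace around the overlapping points. Other common techniques for mitigating overplotting include:

  • using smaller points

  • using transparency

  • binning the data (as in a heat map)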
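For dense data, binning is one alternative to jittering. A minimal sketch using ggplot2's geom_bin2d on the diamonds dataset follows. The choice of 50 bins and the name p_binned are arbitrary assumptions made for illustration.

```r
library(ggplot2)

# Divide the plotting area into a grid of rectangles and colour each one
# by the number of observations that fall inside it (a heat map)
p_binned <- ggplot(diamonds, aes(carat, price)) +
    geom_bin2d(bins = 50)

p_binned
```

Because each cell shows a count instead of individual marks, overlapping points add to the count rather than hiding one another.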

An example of jitter working on small data (adapted from ?geom_jitter):

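library(ggplot2)

# cyl takes only a few integer values, so many points coincide
p <- ggplot(mpg, aes(cyl, hwy))
gridExtra::grid.arrange(
    p + geom_point(),                             # plain points: overplotted
    p + geom_jitter(width = 0.25, height = 0.5)   # jittered into the whitespace
)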


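With jitter, moving the points just a little spreads them out. Now we can see how many points are "really there", without changing the data so much that we no longer understand it.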

And an example of jitter not working on bigger data:

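# assumes library(ggplot2) is already loaded
# diamonds has 53,940 rows, so points overlap even though carat and price are continuous
p2 <- ggplot(diamonds, aes(carat, price))
gridExtra::grid.arrange(
    p2 + geom_point(),                         # top: heavily overplotted
    p2 + geom_jitter(),                        # middle: jitter does not help
    p2 + geom_point(alpha = 0.1, shape = 16)   # bottom: transparent solid points
)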

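In the resulting plot, the jittered points (middle) are just as overplotted as the regular points (top). There is no open space around the points to spread them into. However, with a smaller point mark and transparency (bottom) we can get a feel for the density of the data.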



Source: https://habr.com/ru/post/1653201/

