Labeling outliers with ggplot

I am trying to label outliers in ggplot. I have two questions about my code:

  • Why doesn't it label outliers below 1.5 * IQR?

  • Why doesn't it label outliers based on the group they are in, but instead seems to refer to the overall mean of the data? I would like to label the outliers for each individual boxplot, i.e. the outliers for Country A in wave 1 (of a survey), etc.

Sample of my code:

PERCENT <- rnorm(50, sd = 3)
WAVE <- sample(6, 50, replace = TRUE)
AGE_GROUP <- rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10)
COUNTRY <- rep(c("Country A", "Country B"), 25)
N <- rnorm(50, mean = 200, sd = 2)

df <- data.frame(PERCENT, WAVE, AGE_GROUP, COUNTRY, N)

ggplot(df, aes(x = factor(WAVE), y = PERCENT, fill = factor(COUNTRY))) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = factor(COUNTRY)), position = position_dodge(width=0.75)) +
  geom_text(aes(label = ifelse(PERCENT > 1.5*IQR(PERCENT)|PERCENT < -1.5*IQR(PERCENT), paste(AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),'')), hjust = -.3, size = 3)

Image of what I have so far: Outlier label

I appreciate your help!

+4
2 answers

If you want the IQR to be calculated per country, you need to group the data. You could do this globally (i.e. before passing the data to ggplot) or locally in the layer.

library(dplyr)
library(ggplot2)

ggplot(df, aes(x = as.factor(WAVE), y = PERCENT, fill = COUNTRY)) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = COUNTRY), position = position_dodge(width=0.75)) +
  geom_text(aes(group = COUNTRY, label = ifelse(!between(PERCENT,-1.3*IQR(PERCENT), 1.3*IQR(PERCENT)), 
                                                paste(" ",COUNTRY, ",", AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),'')), 
            position = position_dodge(width=0.75),
            hjust = "left", size = 3)
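
For the "globally" option, a minimal sketch might look like the following. It assumes each box is defined by WAVE and COUNTRY, and it uses the usual boxplot fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) instead of the 1.3 multiplier above. The column names lower, upper and OUTLIER_LABEL and the seed are made up for illustration. As far as I can tell, expressions inside aes() are evaluated on the layer's whole data frame, so precomputing the label per group is the safer route.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  PERCENT   = rnorm(50, sd = 3),
  WAVE      = sample(6, 50, replace = TRUE),
  AGE_GROUP = rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10),
  COUNTRY   = rep(c("Country A", "Country B"), 25),
  N         = rnorm(50, mean = 200, sd = 2)
)

# Compute the fences and the label text within each WAVE x COUNTRY group,
# i.e. once per box, before the data ever reaches ggplot.
df_labeled <- df %>%
  group_by(WAVE, COUNTRY) %>%
  mutate(
    lower = quantile(PERCENT, 0.25, names = FALSE) - 1.5 * IQR(PERCENT),
    upper = quantile(PERCENT, 0.75, names = FALSE) + 1.5 * IQR(PERCENT),
    OUTLIER_LABEL = ifelse(
      PERCENT < lower | PERCENT > upper,
      paste0(AGE_GROUP, ", ", round(PERCENT, 1), "%, n = ", round(N, 0)),
      ""  # empty label for points inside the fences
    )
  ) %>%
  ungroup()

# The text layer only maps the precomputed column; group = COUNTRY is kept
# so that the labels are dodged together with the points and boxes.
p <- ggplot(df_labeled, aes(x = factor(WAVE), y = PERCENT, fill = COUNTRY)) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = COUNTRY),
             position = position_dodge(width = 0.75)) +
  geom_text(aes(label = OUTLIER_LABEL, group = COUNTRY),
            position = position_dodge(width = 0.75),
            hjust = -0.2, size = 3)
```

Printing p draws the plot. With only 50 rows spread over up to 12 boxes, many groups contain just a few points, so it would not be surprising if few or no labels appear for this sample.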
+2

You need to add the group aesthetic to geom_text and fix the condition in ifelse.

Use group = interaction(WAVE, COUNTRY), and define an outlier relative to median(PERCENT).

library(ggplot2)
set.seed(42)

PERCENT   <- rnorm(50, sd = 3)
WAVE      <- sample(6, 50, replace = TRUE)
AGE_GROUP <- rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10)
COUNTRY   <- rep(c("Country A", "Country B"), 25)
N         <- rnorm(50, mean = 200, sd = 2)

df <- data.frame(PERCENT, WAVE, AGE_GROUP, COUNTRY, N)

ggplot(df) +
  aes(x = factor(WAVE),
      y = PERCENT,
      fill = factor(COUNTRY)) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = factor(COUNTRY)), position = position_dodge(width = 0.75)) +
  geom_text(aes(group = interaction(WAVE, COUNTRY),
                label = ifelse(test = PERCENT > median(PERCENT) + 1.5 * IQR(PERCENT) | PERCENT < median(PERCENT) - 1.5 * IQR(PERCENT),
                               yes  = paste(AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),
                               no   = '')),
            position = position_dodge(width = 0.75),
            hjust = -.2,
            size = 3)
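
One thing worth checking with the median-based condition: as far as I know, geom_boxplot by default draws as outliers the points beyond the hinges plus or minus 1.5 * IQR, not beyond the median plus or minus 1.5 * IQR, so the two rules can disagree. A small sketch in base R, with made-up values and a hypothetical helper is_boxplot_outlier() (not a ggplot2 function):

```r
# Hypothetical helper: TRUE where x lies outside the fences
# [Q1 - coef * IQR, Q3 + coef * IQR].
is_boxplot_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 13, 30)  # made-up values

# Rule based on the quartiles (the fences of a boxplot).
fence_rule <- is_boxplot_outlier(x)

# Rule based on the median, as in the ifelse() condition above.
median_rule <- x > median(x) + 1.5 * IQR(x) | x < median(x) - 1.5 * IQR(x)

x[fence_rule]   # 30
x[median_rule]  # 13 30
```

Here Q1 = 3.25, Q3 = 7.75 and the IQR is 4.5, so the upper fence is 14.5 while the median-based limit is 12.25. The value 13 would get a label under the median rule even though a boxplot would draw it inside the whisker.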

+1

Source: https://habr.com/ru/post/1690884/

