dplyr issue with type override when using group_by()

I have the following problem:

When I use dplyr to modify a numeric column with mutate() after group_by(), it fails if a group contains only a single value and that value is NaN.

As long as every group in the column contains a numeric value, the result is correctly typed as dbl. As soon as one group contains only NaN, it fails, because dplyr types the result for that group as lgl while all other groups are dbl.

My first (and more general) question: is there a way to tell dplyr, when using group_by(), to always give a column a specific type?

Secondly, can someone help me solve the problem shown in the MWE below:

# ERROR: This produces the column type error described above:

library(dplyr)

df <- data_frame(a = c(rep(LETTERS[1:2], 4), "C"), g = c(rep(LETTERS[5:7], 3)), x = c(7, 8, 3, 5, 9, 2, 4, 7, 8)) %>% tbl_df()
df <- df %>% group_by(a) %>% mutate_each(funs(sd(., na.rm=TRUE)),x)

df <- df %>% mutate(Winsorise = ifelse(x>2,2,x))

# NO ERROR (as no groups have single entry with NaN):
df2 <- data_frame(a = c(rep(LETTERS[1:2],4),"C"),g = c(rep(LETTERS[5:7],3)), x = c(7, 8,3, 5, 9, 2, 4, 7,8)) %>% tbl_df()
df2 <- df2 %>% group_by(a) %>% mutate_each(funs(sd(., na.rm=TRUE)),x)

# Update the Group for the row with an NA - Works
df2[9,1] <- "A"
df2 <- df2 %>% mutate(Winsorise = ifelse(x>3,3,x))


# REASON FOR ERROR: for a group whose only member is NaN, the Winsorise column comes out as lgl, although we want dbl:
df3 <- data_frame(g = "A",x = NaN)
df3 <- df3 %>% mutate(Winsorise = ifelse(x>3,3,x))
1 answer

The reason is that, as you correctly pointed out with df3, the result of the mutate() call is logical when the source column is NaN / NA.
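The type change can be reproduced with base R alone, without dplyr. The sketch below uses hypothetical variable names (x_single, x_mixed) and shows that ifelse() starts from the logical test vector and only fills in positions where the test is TRUE or FALSE, so a test that is entirely NA leaves the result logical:

```r
# A one-member "group" whose only value is NaN (hypothetical example data)
x_single <- NaN
res_single <- ifelse(x_single > 3, 3, x_single)
typeof(res_single)   # "logical": the NA test result is returned unchanged

# A "group" with at least one non-missing value
x_mixed <- c(1, 5, NaN)
res_mixed <- ifelse(x_mixed > 3, 3, x_mixed)
typeof(res_mixed)    # "double": filling in numeric values coerces the result
```

In the grouped mutate() this is presumably evaluated once per group, which would explain why one group yields lgl and the others dbl.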

To get around this, cast the result to numeric:

df <- df %>% mutate(Winsorise = as.numeric(ifelse(x>2,2,x)))
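If the goal is only to cap values at an upper bound, as in the example, base R's pmin() may be an alternative that avoids ifelse() altogether. This is a sketch under that assumption, with hypothetical variable names, not a general replacement for the cast above:

```r
# Cap values at 2; pmin() works on the numeric vectors directly,
# so the result stays double even when the input is only NaN.
x_single <- NaN
capped_single <- pmin(x_single, 2)
typeof(capped_single)   # "double"

x_mixed <- c(7, 1, NaN)
capped_mixed <- pmin(x_mixed, 2)
typeof(capped_mixed)    # "double"; the first two elements are 2 and 1
```

Inside the pipeline this would read mutate(Winsorise = pmin(x, 2)).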

Perhaps @hadley could shed some light on why the mutate() result comes out as lgl in this case?


Source: https://habr.com/ru/post/1626156/