Reorder NA position when using group_by

Question

I would like to reorder the position of NA in a column, within each level of another categorical variable. For example, with this data frame:

df <- data.frame(fact=c(1,1,1,2,2,2), id=rep(1:6), value=c(NA,44,23,NA,NA,76))

I would like to create a new column with the NAs moved to the end of each group, for example:

df$newvar <- c(44,23,NA,76,NA,NA)

I would think the following would work, but it does not:

dfb <- df %>% group_by(fact) %>% mutate(newvar = df$value[order(is.na(df$value))])

Any ideas on how I can do this?

r dataframe dplyr na tibble

steve zissou Jan 31 '18 at 14:28

4 answers

You do not even need to use dplyr, you can do it with base R:

df$newvar <- ave(df$value, df$fact, FUN = function(x) x[order(-x)])

df
#  fact id value newvar
#1    1  1    NA     44
#2    1  2    44     23
#3    1  3    23     NA
#4    2  4    NA     76
#5    2  5    NA     NA
#6    2  6    76     NA
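A minimal base-R sketch of a variation on this answer, assuming the goal is only to move the NAs: `order(-x)` also sorts the non-NA values in descending order, which happens to match the desired output here. Ordering by `is.na(x)` instead keeps the non-NA values in their original order, because `order()` leaves ties in their original order. The helper name `na_last` is hypothetical.

```r
# Same data as in the question
df <- data.frame(fact = c(1, 1, 1, 2, 2, 2), id = 1:6,
                 value = c(NA, 44, 23, NA, NA, 76))

# Hypothetical helper: move NAs to the end of a vector. order() keeps
# ties in their original order, so non-NA values are not re-sorted.
na_last <- function(x) x[order(is.na(x))]

# ave() applies na_last() separately within each level of fact
df$newvar <- ave(df$value, df$fact, FUN = na_last)
df$newvar
# [1] 44 23 NA 76 NA NA

# Where the two approaches differ: non-NA values not already descending
x <- c(NA, 23, 44)
x[order(-x)]  # 44 23 NA (non-NA values re-sorted descending)
na_last(x)    # 23 44 NA (original order kept)
```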

Mike H. Jan 31 '18 at 14:36

You can use lead() to shift each group's values up by the number of NAs in that group:

library(dplyr)

df %>% 
 group_by(fact) %>% 
 mutate(new = lead(value, sum(is.na(value))))

# A tibble: 6 x 4
# Groups:   fact [2]
   fact    id value   new
  <dbl> <int> <dbl> <dbl>
1  1.00     1  NA    44.0
2  1.00     2  44.0  23.0
3  1.00     3  23.0  NA  
4  2.00     4  NA    76.0
5  2.00     5  NA    NA  
6  2.00     6  76.0  NA

NOTE: This only works if the NAs are at the top of each group and you want them at the bottom.
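To make the note concrete, here is a base-R sketch of the same shift. `lead_by()` is a hypothetical stand-in for `dplyr::lead()`, written so the example runs without dplyr; it is assumed to behave like `lead(x, n)` for `0 <= n <= length(x)`.

```r
df <- data.frame(fact = c(1, 1, 1, 2, 2, 2), id = 1:6,
                 value = c(NA, 44, 23, NA, NA, 76))

# Hypothetical stand-in for dplyr::lead(x, n): drop the first n elements
# and pad the end with n NAs (n = 0 returns x unchanged)
lead_by <- function(x, n) c(x[seq_along(x) > n], rep(NA, n))

# Shift each group up by its own number of NAs, as in the answer above
shifted <- ave(df$value, df$fact,
               FUN = function(x) lead_by(x, sum(is.na(x))))
shifted
# [1] 44 23 NA 76 NA NA

# Counterexample: the NA is in the middle, so the shift loses 44
lead_by(c(44, NA, 23), 1)
# [1] NA 23 NA
```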

Sotos Jan 31 '18 at 14:36

Another suggestion, using arrange() to stick with dplyr verbs:

df %>%
  mutate(newvar = 
    arrange(df, fact, is.na(value), id) %>% pull(value)
  )
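For reference, a base-R sketch of what this pipeline computes, assuming `arrange()` with these keys sorts like `order()` with the same keys (FALSE before TRUE for `is.na(value)`). Because the sorted column is pasted back onto the rows of `df` unchanged, the result only lines up with the right groups when `df` is already sorted by `fact`, as it is here.

```r
df <- data.frame(fact = c(1, 1, 1, 2, 2, 2), id = 1:6,
                 value = c(NA, 44, 23, NA, NA, 76))

# Same sort keys as arrange(df, fact, is.na(value), id):
# FALSE sorts before TRUE, so non-NA values come first within each fact
sorted_value <- df$value[order(df$fact, is.na(df$value), df$id)]
sorted_value
# [1] 44 23 NA 76 NA NA

# The sorted values are attached to the existing rows as-is, so this is
# only correct when df is already ordered by fact
stopifnot(!is.unsorted(df$fact))
df$newvar <- sorted_value
```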

David Klotz Jan 31 '18 at 15:06

Florian · Accepted Answer · 2018-01-31T14:32:13+0000

You must remove the df$ part in the mutate statement; otherwise you refer to the full column instead of the column within each group. So this should work fine:

df %>% group_by(fact) %>% mutate(newvar = value[order(is.na(value))])

Output:

# A tibble: 6 x 4
# Groups: fact [2]
   fact    id value newvar
  <dbl> <int> <dbl>  <dbl>
1  1.00     1  NA     44.0
2  1.00     2  44.0   23.0
3  1.00     3  23.0   NA  
4  2.00     4  NA     76.0
5  2.00     5  NA     NA  
6  2.00     6  76.0   NA
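A base-R sketch of the difference between `df$value` and `value` inside the grouped `mutate()`, using `split()` and `unsplit()` to mimic per-group evaluation (an approximation of what `group_by()` does, not dplyr's implementation). Inside each group, `df$value` is still the full 6-element column, which cannot fit a 3-row group; `value` refers only to that group's rows.

```r
df <- data.frame(fact = c(1, 1, 1, 2, 2, 2), id = 1:6,
                 value = c(NA, 44, 23, NA, NA, 76))

groups <- split(df$value, df$fact)
lengths(groups)  # each group has 3 rows

# What the question's df$value[...] evaluates to in every group:
# the whole column (6 values), which does not fit a 3-row group
length(df$value[order(is.na(df$value))])

# What value[order(is.na(value))] computes: each group's own values
per_group <- lapply(groups, function(v) v[order(is.na(v))])

# unsplit() puts the pieces back in the original row order
df$newvar <- unsplit(per_group, df$fact)
df$newvar
# [1] 44 23 NA 76 NA NA
```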