Strange behavior in a dplyr slice for R

Question


When calling slice(df, i) from the dplyr package for R, if the row index I ask for does not exist (nrows < i), it returns all rows except the first one, as if I had called slice(df, -1).

For instance:

 library(dplyr)
 c1 <- c("a","b","c")
 c2 <- 1:3
 df <- data.frame(c1,c2)
 slice(df,2)

The result will be as expected:

b 2

But if I call

 slice(df, 5)

the result is every row except the first:

 b 2
 c 3

This is especially unpleasant when using group_by() and THEN calling slice() on groups. Is there a logical reason why slice() does this?

It seems that returning rows filled with NA, for row indices larger than the number of rows in groups that are not large enough to produce the requested slice, might be a more useful result.
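For comparison, plain base R indexing already behaves this way. A minimal sketch, rebuilding the same df as above so it runs on its own (the name padded is only for illustration):

```r
# Same example data as above
c1 <- c("a", "b", "c")
c2 <- 1:3
df <- data.frame(c1, c2)

# Base R data-frame indexing pads an out-of-range row index with NA:
# asking for row 5 of a 3-row data frame gives one row where every column is NA
padded <- df[5, ]
padded
```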

This came up when I tried to extract a ranked result from each group, but some groups did not have enough rows. For example: "List the top 10 best-selling salespeople from each region," when one of the regions has only 8 sellers.


r dplyr

huff May 27 '15 at 19:42


2 answers

hackR · Answer 1 · 2016-02-26T16:28:24+0000

I'm a little late to this party, but here goes. There is a very simple solution for the error message "Error: incompatible types, expecting a character vector":

Just insert ungroup() before your mutate() call and you should be fine.

But I think this is some kind of bug in slice(). I will file a bug report.

Matthew Plourde · Answer 2 · 2015-05-27T20:25:32+0000

I agree: this behavior seems to be wrong. You can use the following as an alternative:

 library(dplyr)

 # note: data_frame() is deprecated in newer dplyr versions; tibble() is its replacement
 df <- data_frame(c1=c('a', 'a', 'b', 'c'), c2=c(1,2,3,4))
 #   c1 c2
 # 1  a  1
 # 2  a  2
 # 3  b  3
 # 4  c  4

 # get the second row of each group, or the last row for
 # groups with fewer than 2 rows
 df %>%
   group_by(c1) %>%
   filter(row_number() == min(2, n()))
 #   c1 c2
 # 1  a  2
 # 2  b  3
 # 3  c  4
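The same row_number() idea should extend to the "top 10 per region" case from the question. A minimal sketch, assuming a made-up sales table: the names sales, region, seller, amount, and top_n_rows are hypothetical and not from the original post, and the cutoff is 2 instead of 10 to keep the data small.

```r
library(dplyr)

# Hypothetical data: region "east" has 3 sellers, region "west" has only 1
sales <- data.frame(
  region = c("east", "east", "east", "west"),
  seller = c("s1", "s2", "s3", "s4"),
  amount = c(50, 70, 60, 40),
  stringsAsFactors = FALSE
)

top_n_rows <- 2  # would be 10 in the question's example

top_sellers <- sales %>%
  arrange(region, desc(amount)) %>%      # best sellers first within each region
  group_by(region) %>%
  filter(row_number() <= top_n_rows) %>% # keep at most top_n_rows rows per group
  ungroup()

top_sellers
```

Groups with fewer rows than the cutoff simply return everything they have, so no out-of-range index is ever passed to slice(). Note that this does not pad short groups with NA rows.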
