summarise with the first value

When using summarise, I encountered unusual behavior.

df <- data.frame(id = c(1, 2, 3, 3, 4),
                 color = c(NA, "blue", "red", "blue", NA),
                 stringsAsFactors = FALSE)
df
#   id color
# 1  1  <NA>
# 2  2  blue
# 3  3   red
# 4  3  blue
# 5  4  <NA>

First part

Let's choose the first value of color for each id:

df %>% 
  group_by(id) %>% 
  summarise(result = color[1])
# # A tibble: 4 × 2
#      id result
#   <dbl>  <chr>
# 1     1       
# 2     2   blue
# 3     3    red
# 4     4   <NA>

I expected <NA> instead of an empty string for id 1. Did I do something wrong? first(color) produces the correct result, but I thought it was equivalent to color[1].

In addition, color %>% first produces the same result as color[1], which confuses me even more.
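
For comparison, here is a minimal base-R sketch of what I expected color[1] to return for each group. It does not use dplyr at all, so it only shows what plain indexing does and says nothing about what happens inside summarise. The names groups and first_by_id are my own, introduced only for illustration.

```r
# Base R only; dplyr is deliberately not loaded here.
df <- data.frame(id = c(1, 2, 3, 3, 4),
                 color = c(NA, "blue", "red", "blue", NA),
                 stringsAsFactors = FALSE)

# Split color by id, then take the first element of each group with `[`.
groups <- split(df$color, df$id)
first_by_id <- vapply(groups, function(x) x[1], character(1))

# The first value is missing (NA) for ids 1 and 4, not an empty string.
is.na(first_by_id)

# Comparing NA to "" yields NA rather than TRUE, so neither is an empty string.
first_by_id == ""
```

With plain indexing, ids 1 and 4 both give NA, which is why the empty string for id 1 in the summarise output surprised me.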

Second part

Consider the following meaningless code:

df %>% 
  group_by(id) %>% 
  summarise(color = color[1],
            color2 = first(color))

I get a segfault. Is this a known bug, or should I report it? I found some old SO questions and GitHub threads that look very similar, but they appear to be resolved.

Note: I am using dplyr 0.5.0 in R 3.3.3.

: , dplyr 0.7.0

Source: https://habr.com/ru/post/1674866/