Group by a variable, taking the first row's value for one column and the last row's value for another

Question

I have the following table:

id  origin destination price
 1     A      B          2
 1     C      D          2
 2     A      B          3
 3     B      E          6
 3     E      C          6
 3     C      F          6

Basically, I want to group it by id, take the first item from origin and the last item from destination, resulting in this table:

id  origin destination price
 1     A      D          2
 2     A      B          3
 3     B      F          6

I know how to select the first and last rows of each group, but that is not what I want:

library(dplyr)

# Returns the first and last row of each id as separate rows
df %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()

Is it possible to do this with dplyr or even with data.table?

r dataframe data.table dplyr

FilipeTeixeira May 23 '17 at 14:28

2 answers

Base R, using split:

do.call(rbind, lapply(split(df, df$id), function(a) {
  with(a, data.frame(origin      = head(origin, 1),
                     destination = tail(destination, 1),
                     price       = head(price, 1)))
}))

#  origin destination price
#1      A           D     2
#2      A           B     3
#3      B           F     6

989 May 23 '17 at 15:00

BigDataScientist · Accepted Answer · 2017-05-23T14:33:25+0000

Solution with library(data.table):

unique(
  setDT(df)[, "origin" := origin[1], by = id
          ][, "destination" := destination[.N], by = id
          ][, "price" := price[1], by = id][]
)

Shorter version suggested by Imo:

setDT(df)[, .(origin=origin[1], destination=destination[.N], price=price[1]), by=id]
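
The question also asks about dplyr, which neither answer above covers. A minimal sketch of the same idea, assuming the rows are already in the desired order within each id: it uses dplyr's first() and last() inside summarise(). The df built here is just the sample table from the question, and the name result is only for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id          = c(1, 1, 2, 3, 3, 3),
  origin      = c("A", "C", "A", "B", "E", "C"),
  destination = c("B", "D", "B", "E", "C", "F"),
  price       = c(2, 2, 3, 6, 6, 6),
  stringsAsFactors = FALSE
)

# One row per id: first origin, last destination, first price
result <- df %>%
  group_by(id) %>%
  summarise(
    origin      = first(origin),
    destination = last(destination),
    price       = first(price)
  )

print(result)
```

If the row order within each id is not guaranteed, sort first (for example with arrange()) before summarising.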
