Aggregate a data frame on a given column and display another column

Question


I have a dataframe in R of the following form:

    > head(data)
      Group Score Info
    1     1     1    a
    2     1     2    b
    3     1     3    c
    4     2     4    d
    5     2     3    e
    6     2     1    f
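To make the question reproducible, here is a sketch that rebuilds the data frame shown above. The column types (numeric Group and Score, character Info) are an assumption, since the head() output does not show them.

```r
# Rebuild the example data printed by head(data) above.
# Assumption: Group and Score are numeric, Info is character.
data <- data.frame(Group = c(1, 1, 1, 2, 2, 2),
                   Score = c(1, 2, 3, 4, 3, 1),
                   Info  = c("a", "b", "c", "d", "e", "f"),
                   stringsAsFactors = FALSE)
head(data)
```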

I would like to aggregate it on the Score column using the max function:

    > aggregate(data$Score, list(data$Group), max)
      Group.1 x
    1       1 3
    2       2 4

But I would also like to display the Info column associated with the maximum value of the Score column for each group. I do not know how to do that. My desired result:

      Group.1 x y
    1       1 3 c
    2       2 4 d

Any clues?


r greatest-n-per-group aggregate plyr

jul635 Jun 09 '11


7 answers

A base R solution is to combine the output of aggregate() with a merge() step. I find the formula interface to aggregate() a little more useful than the standard interface, partly because the names on the output are nicer, so I will use that:

The aggregate() step is

 maxs <- aggregate(Score ~ Group, data = dat, FUN = max)

and the merge() step is just

 merge(maxs, dat)

This gives us the desired result:

    R> maxs <- aggregate(Score ~ Group, data = dat, FUN = max)
    R> merge(maxs, dat)
      Group Score Info
    1     1     3    c
    2     2     4    d

You could, of course, cram this into a single line (the intermediate step was more for exposition):

 merge(aggregate(Score ~ Group, data = dat, FUN = max), dat)

The main reason I used the formula interface is that it returns a data frame with the correct names for the merge step; these are the column names from the original dat data set. We need the correct names in the output of aggregate() so that merge() knows which columns in the original and aggregated data frames match.
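A small sketch of that point, assuming dat holds the question's data: merge() joins on the columns both data frames share by default, which with the formula interface are exactly Group and Score, so spelling out `by` gives the same result as the shorter call.

```r
dat <- data.frame(Group = c(1, 1, 1, 2, 2, 2),
                  Score = c(1, 2, 3, 4, 3, 1),
                  Info  = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

# Formula interface: output columns are named after the variables
maxs <- aggregate(Score ~ Group, data = dat, FUN = max)

# merge() joins on the shared column names by default
shared <- intersect(names(maxs), names(dat))
print(shared)

# Passing `by` explicitly reproduces merge(maxs, dat)
res <- merge(maxs, dat, by = shared)
print(res)
```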

The standard interface gives odd names, whichever way you call it:

    R> aggregate(dat$Score, list(dat$Group), max)
      Group.1 x
    1       1 3
    2       2 4
    R> with(dat, aggregate(Score, list(Group), max))
      Group.1 x
    1       1 3
    2       2 4

We can use merge() on these outputs, but we need to do more work telling R which columns correspond to each other.
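One way to do that extra work, sketched with the question's data assumed to be in dat, is to pass merge() the matching column names through by.x and by.y. The key columns in the result keep the names from the first data frame.

```r
dat <- data.frame(Group = c(1, 1, 1, 2, 2, 2),
                  Score = c(1, 2, 3, 4, 3, 1),
                  Info  = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

# Standard interface: output columns are called "Group.1" and "x"
maxs <- aggregate(dat$Score, list(dat$Group), max)

# Map Group.1 -> Group and x -> Score explicitly
res <- merge(maxs, dat, by.x = c("Group.1", "x"), by.y = c("Group", "Score"))
print(res)
```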


Gavin Simpson Jun 09 '11 at 8:16


Here is a solution using the plyr package.

The next line of code essentially tells ddply to first group your data by Group, and then, within each group, return the subset of rows where Score equals the maximum score in that group.

    library(plyr)
    ddply(data, .(Group), function(x) x[x$Score == max(x$Score), ])
      Group Score Info
    1     1     3    c
    2     2     4    d

And, as @SachaEpskamp points out, this can be further simplified:

    ddply(data, .(Group), function(x) x[which.max(x$Score), ])

(Note, though, that which.max returns only the first row with the maximum score in each group, whereas the x$Score == max(x$Score) version above returns every tied row.)
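The difference between the two versions only matters when a group has ties for the maximum. A quick base R check on a made-up score vector (the values are hypothetical) shows it: which.max() gives just the first maximal index, while which(x == max(x)) gives all of them.

```r
# A hypothetical group whose maximum score is shared by two rows
s <- c(3, 1, 3)

which.max(s)        # 1: index of the first maximum only
which(s == max(s))  # 1 3: indices of every row equal to the maximum
```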


Andrie Jun 09 '11


The plyr package can be used for this. The ddply() function splits a data frame by one or more columns, applies a function to each piece and returns a data frame; with summarize() as that function, you can use the columns of each split piece as variables to build a new data frame:

    dat <- read.table(textConnection('Group Score Info
    1 1 1 a
    2 1 2 b
    3 1 3 c
    4 2 4 d
    5 2 3 e
    6 2 1 f'))
    library("plyr")
    ddply(dat, .(Group), summarize,
          Max = max(Score),
          Info = Info[which.max(Score)])
      Group Max Info
    1     1   3    c
    2     2   4    d
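For comparison, here is a rough base R sketch of the same summary, without plyr: split() plays the role of ddply's grouping, and vapply() computes one value per group, much like the two expressions inside summarize(). This is a swapped-in technique for illustration, not how plyr works internally.

```r
dat <- data.frame(Group = c(1, 1, 1, 2, 2, 2),
                  Score = c(1, 2, 3, 4, 3, 1),
                  Info  = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

# One data frame per group
pieces <- split(dat, dat$Group)

# Per group: the maximum score, and Info from the first row attaining it
out <- data.frame(
  Group = as.numeric(names(pieces)),
  Max   = vapply(pieces, function(d) max(d$Score), numeric(1)),
  Info  = vapply(pieces, function(d) d$Info[which.max(d$Score)], character(1)),
  stringsAsFactors = FALSE
)
print(out)
```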


Sacha Epskamp Jun 09 '11


Late answer, but here is one using data.table:

    library(data.table)
    DT <- data.table(dat)
    DT[, .SD[which.max(Score), ], by = Group]

Or, if more than one row in a group can share the highest score:

 DT[, .SD[which(Score == max(Score)),], by = Group]

Note that (from ?data.table):

.SD is a data.table containing the subset of x's data for each group, excluding the group column(s).
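To picture what .SD holds, here is a loose base R analogue (an illustration only, not how data.table implements it): each group's rows with the grouping column dropped. The variable name sd_like is hypothetical.

```r
dat <- data.frame(Group = c(1, 1, 1, 2, 2, 2),
                  Score = c(1, 2, 3, 4, 3, 1),
                  Info  = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

# Per group: that group's rows, minus the grouping column (roughly what .SD holds)
sd_like <- split(dat[, setdiff(names(dat), "Group"), drop = FALSE], dat$Group)

# Group 1's "subset of data": columns Score and Info, three rows
print(sd_like[["1"]])
```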


mnel Oct 31 '12 at 22:55


To add to Gavin's answer: if you do not use the formula interface, you can still get aggregate() to use proper names before merging:

    aggregate(data[, "Score", drop = FALSE], list(Group = data$Group), max)
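Why this works, sketched on the question's data (assumed to be in dat): drop = FALSE keeps a one-column data frame, so aggregate() keeps the column name Score, and naming the list element supplies Group. With matching names, a plain merge() then finds the right columns.

```r
dat <- data.frame(Group = c(1, 1, 1, 2, 2, 2),
                  Score = c(1, 2, 3, 4, 3, 1),
                  Info  = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

is.data.frame(dat[, "Score"])                # FALSE: a bare vector, name lost
is.data.frame(dat[, "Score", drop = FALSE])  # TRUE: column still called "Score"

agg <- aggregate(dat[, "Score", drop = FALSE], list(Group = dat$Group), max)
names(agg)                                   # "Group" "Score"

# Names now match dat, so no by.x / by.y is needed
merge(agg, dat)
```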


Dan Jan 28 '13 at 4:39


This is how I thought about the problem in base R.

    my.df <- data.frame(group = rep(c(1, 2), each = 3), score = runif(6), info = letters[1:6])
    my.agg <- with(my.df, aggregate(score, list(group), max))
    my.df.split <- with(my.df, split(x = my.df, f = group))
    my.agg$info <- unlist(lapply(my.df.split, FUN = function(x) {
        x[which(x$score == max(x$score)), "info"]
    }))
    > my.agg
      Group.1         x info
    1       1 0.9344336    a
    2       2 0.7699763    e


Roman Luštrik Jun 09 '11 at 8:17


mbq · Accepted Answer · 2011-06-09 08:30

First, split the data using split():

 split(z,z$Group)

Then, for each chunk, select the row with the maximum score:

 lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),])

Finally, turn the result back into a data.frame by do.call-ing rbind:

 do.call(rbind,lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),]))

Result:

      Group Score Info
    1     1     3    c
    2     2     4    d

One line, no magic spells, fast, the result has good names =)
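If you need this for several data frames, the same split/lapply/rbind one-liner can be wrapped in a small function. The name top_by_group() and its arguments are hypothetical, not part of base R; the column names are passed as strings and looked up with [[ ]].

```r
# Hypothetical helper wrapping the split / which.max / rbind idiom above
top_by_group <- function(df, group, score) {
  chunks <- split(df, df[[group]])
  do.call(rbind, lapply(chunks, function(chunk) chunk[which.max(chunk[[score]]), ]))
}

dat <- data.frame(Group = c(1, 1, 1, 2, 2, 2),
                  Score = c(1, 2, 3, 4, 3, 1),
                  Info  = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

res <- top_by_group(dat, "Group", "Score")
print(res)
```

Like any which.max() approach, this keeps only the first row when a group's maximum is tied.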

