Return columns containing maximum values for all variables in the data frame

Question

 zone_id=1:6
 v1=c(12,22,31,12,5,17)
 v2=c(15,22,28,16,18,21)
 v3=c(18,10,14,9,10,17)
 v4=c(20,3,2,5,12,21)
 mydata=data.frame(zone_id,v1,v2,v3,v4)

I have a data frame, a rough mock-up of which can be created with the code above. It consists of a series of geographic zones. I have variables (4 in this example, but 69 in my actual dataset) that contain integer observations for these zones. For each zone_id, I want to identify which of the variables v1 to v4 contains the maximum value. Where there is a tie, I want to return the names of every variable that holds the tied maximum. So for zone 1 I want to return v4, for zone 2 I want to return v1 and v2, etc.

I am very new to R and could not get to first base with this. I searched the R help files and thought there might be a solution using sweep? Any help appreciated.

+4

r dataframe

gavinr Jan 20 '12 at 13:02

2 answers

Gavin Simpson · Answer 1 · 2012-01-20T13:26:44+0000

Here you can use the idiom which(x == max(x)) and use apply() to run it on each row:

 apply(mydata[, -1], 1, function(x) which(x == max(x)))

which gives:

 > apply(mydata[, -1], 1, function(x) which(x == max(x)))
 [[1]]
 v4 
  4 

 [[2]]
 v1 v2 
  1  2 

 [[3]]
 v1 
  1 

 [[4]]
 v2 
  2 

 [[5]]
 v2 
  2 

 [[6]]
 v2 v4 
  2  4 

Each list element is a vector of the column indices of the variable(s) that hold the maximum for that row. These vectors are named, so names() can be used to extract the actual variable names:

 > out <- apply(mydata[, -1], 1, function(x) which(x == max(x)))
 > names(out[[2]])
 [1] "v1" "v2"
 > lapply(out, names)
 [[1]]
 [1] "v4"

 [[2]]
 [1] "v1" "v2"

 [[3]]
 [1] "v1"

 [[4]]
 [1] "v2"

 [[5]]
 [1] "v2"

 [[6]]
 [1] "v2" "v4"

If your data may contain NA, then we need to be a little smarter, for example:
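If a single character column is more convenient than a list, the names can be collapsed into one string per zone. This is only a minimal sketch, assuming the mydata object from the question; the column name max_vars and the comma separator are made up for illustration.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  zone_id = 1:6,
  v1 = c(12, 22, 31, 12, 5, 17),
  v2 = c(15, 22, 28, 16, 18, 21),
  v3 = c(18, 10, 14, 9, 10, 17),
  v4 = c(20, 3, 2, 5, 12, 21)
)

vars <- mydata[, -1]  # drop zone_id, keep only the value columns

# For each row, collect the names of the column(s) equal to the row maximum
max_names <- lapply(seq_len(nrow(vars)), function(i) {
  row <- unlist(vars[i, ])
  names(row)[row == max(row)]
})

# Collapse each set of names into one string (max_vars is a hypothetical name)
mydata$max_vars <- vapply(max_names, paste, character(1), collapse = ",")
mydata
```

lapply() over the row indices is used here because apply() only returns a list when the rows differ in their number of maxima; if there were no ties at all, it would simplify the result to a plain vector.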

 apply(mydata[, -1], 1, function(x, na.rm = FALSE) which(x == max(x, na.rm = na.rm)), na.rm = TRUE)

Here the na.rm argument passed through apply() controls whether NA values are ignored when computing the maximum.
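To see the difference na.rm makes, here is a small sketch that puts a missing value into the example data. The copy mydata_na and the choice of which cell to blank out are assumptions made purely for illustration.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  zone_id = 1:6,
  v1 = c(12, 22, 31, 12, 5, 17),
  v2 = c(15, 22, 28, 16, 18, 21),
  v3 = c(18, 10, 14, 9, 10, 17),
  v4 = c(20, 3, 2, 5, 12, 21)
)

# Hypothetical copy with one missing value: zone 1 loses its v4 observation
mydata_na <- mydata
mydata_na$v4[1] <- NA

# Without na.rm, max() is NA for zone 1, so which() selects nothing there
no_rm <- apply(mydata_na[, -1], 1, function(x) which(x == max(x)))

# With na.rm = TRUE, the NA is skipped and the maximum of the remaining values is used
with_rm <- apply(mydata_na[, -1], 1,
                 function(x, na.rm = FALSE) which(x == max(x, na.rm = na.rm)),
                 na.rm = TRUE)

length(no_rm[[1]])    # no variable is returned for zone 1
names(with_rm[[1]])   # the largest non-missing variable for zone 1
```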

James · Answer 2 · 2012-01-20T13:18:52+0000

One method is to use rank. Note the minus sign in front of the data: rank orders from low to high by default, so negating the values gives the maximum rank 1, and with ties.method="min" all tied maxima receive rank 1.

 x <- apply(-mydata[,-1],1,rank,ties.method="min")
 x
    [,1] [,2] [,3] [,4] [,5] [,6]
 v1    4    1    1    2    4    3
 v2    3    1    2    1    1    1
 v3    2    3    3    3    3    3
 v4    1    4    4    4    2    1

You can then extract the names using sapply:

 sapply(mydata$zone_id,function(y) rownames(x)[x[,y]==1])
 [[1]]
 [1] "v4"

 [[2]]
 [1] "v1" "v2"

 [[3]]
 [1] "v1"

 [[4]]
 [1] "v2"

 [[5]]
 [1] "v2"

 [[6]]
 [1] "v2" "v4"
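Note that the sapply call uses zone_id itself as the column index into x, which works here because the zone ids happen to be 1 to 6 in row order. A possible variant, offered only as a sketch, indexes by column position instead and labels the result with the zone ids; the name res is made up for illustration.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  zone_id = 1:6,
  v1 = c(12, 22, 31, 12, 5, 17),
  v2 = c(15, 22, 28, 16, 18, 21),
  v3 = c(18, 10, 14, 9, 10, 17),
  v4 = c(20, 3, 2, 5, 12, 21)
)

# Rank the negated values so the row maximum gets rank 1 (ties share rank 1)
x <- apply(-mydata[, -1], 1, rank, ties.method = "min")

# Index by column position rather than by zone_id, then label with zone_id
res <- lapply(seq_len(ncol(x)), function(j) rownames(x)[x[, j] == 1])
names(res) <- mydata$zone_id

res[["2"]]  # variables holding the maximum for zone 2
```

With this form the lookup would still line up if the zone ids were not consecutive integers starting at 1, since each column of x corresponds to a row of mydata by position.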
