Select a value based on the highest value in another column

I don't understand why I can't find a solution for this, since it feels like a fairly simple question, so I need to ask for help. I want to summarize the data set by month, giving the maximum temperature for each month. I also want to find the day on which each monthly maximum occurred. What is the laziest way (in terms of code) to do this?

I tried the following without success:

    require(reshape2)
    names(airquality) <- tolower(names(airquality))
    mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"))
    dcast(mm, month + day ~ variable, max)
    aggregate(formula = temp ~ month + day, data = airquality, FUN = max)

I'm after something like this:

    month day temp
        5   7   89
      ...
+6
4 answers

There was some discussion a while ago about whether being lazy is a good thing or not. Either way, this is short and natural to write and read (and fast on large data, so you won't need to change or optimize it later):

    require(data.table)
    DT = as.data.table(airquality)
    DT[, .SD[which.max(Temp)], by = Month]

         Month Ozone Solar.R Wind Temp Day
    [1,]     5    45     252 14.9   81  29
    [2,]     6    NA     259 10.9   93  11
    [3,]     7    97     267  6.3   92   8
    [4,]     8    76     203  9.7   97  28
    [5,]     9    73     183  2.8   93   3

.SD is the subset of the data for each group, and, if I understand correctly, you just want the row from it with the largest Temp. If you need the row number, that can be added.

Or, to get all the rows where the maximum is tied:
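
As a rough sketch of how the row number could be added: this assumes data.table's special symbol .I, which holds row positions within the full table. The names `idx` and `row_id` are made up for illustration and are not part of the original answer.

```r
library(data.table)

DT <- as.data.table(airquality)

# .I holds the row numbers (in the full table) of the current group, so
# .I[which.max(Temp)] is the position of the first hottest row in each month.
idx <- DT[, .I[which.max(Temp)], by = Month]$V1

# Pull those rows and keep the row number as an extra column.
# `row_id` is a hypothetical column name chosen for this example.
res <- DT[idx][, row_id := idx]
res[, .(Month, Day, Temp, row_id)]
```

Like which.max itself, this keeps only the first row when a month's maximum is tied.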

    DT[, .SD[Temp == max(Temp)], by = Month]

         Month Ozone Solar.R Wind Temp Day
    [1,]     5    45     252 14.9   81  29
    [2,]     6    NA     259 10.9   93  11
    [3,]     7    97     267  6.3   92   8
    [4,]     7    97     272  5.7   92   9
    [5,]     8    76     203  9.7   97  28
    [6,]     9    73     183  2.8   93   3
    [7,]     9    91     189  4.6   93   4
+5

Another approach with plyr

    require(reshape2)
    names(airquality) <- tolower(names(airquality))
    mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"), value.name = 'temp')

    library(plyr)
    ddply(mm, .(month), subset, subset = temp == max(temp), select = -variable)

gives

      month day temp
    1     5  29   81
    2     6  11   93
    3     7   8   92
    4     7   9   92
    5     8  28   97
    6     9   3   93
    7     9   4   93

Or, even easier

    require(reshape2)
    require(plyr)
    names(airquality) <- tolower(names(airquality))
    ddply(airquality, .(month), subset,
          subset = temp == max(temp), select = c(month, day, temp))
+3

How about with plyr?

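    library(plyr)

    # For one month's rows, return every day on which the maximum temperature occurred
    max.func <- function(df) {
      max.temp <- max(df$Temp)  # column is `Temp`, matching df$Day and df$Temp below
      return(data.frame(day = df$Day[df$Temp == max.temp],
                        temp = max.temp))
    }

    ddply(airquality, .(Month), max.func)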

Note that a month's maximum temperature can occur on more than one day. If you want different behavior, this function is quite easy to adapt.
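
As one possible adaptation, here is a minimal sketch that keeps only the first day on which each month's maximum occurs. The function name `first.max.func` is hypothetical, and the code assumes the original capitalized column names of airquality.

```r
library(plyr)

# Variant of max.func (hypothetical name): which.max() returns the index of
# the first maximum, so tied days after the first one are dropped.
first.max.func <- function(df) {
  i <- which.max(df$Temp)
  data.frame(day = df$Day[i], temp = df$Temp[i])
}

first_max <- ddply(airquality, .(Month), first.max.func)
first_max
```

This yields exactly one row per month; to report the last tied day instead, the index could be taken from the reversed vector.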

+2

Or, if you want to use the data.table package (for example, if speed is an issue and the data set is large, or if you prefer the syntax):

    library(data.table)
    DT <- data.table(airquality)
    DT[, list(maxTemp = max(Temp),
              dayMaxTemp = .SD[max(Temp) == Temp, Day]),
       by = "Month"]

If you want to know what .SD means, look here: fooobar.com/questions/50491 / ...

+2

Source: https://habr.com/ru/post/916364/

