R: Calculate the mean for a subset of a group

I want to calculate the average value for each "Day", but only for part of the day (Time = 12 to 14). This code works for me, but I have to enter every day as a new line of code, which will come to hundreds of lines.

It seems like this should be easy to do. I've done it easily when the grouping variables are the same, but I don't know how to do it when I don't want to include all the values throughout the day. Is there a better way to do this?

 sapply(sap[sap$Day==165 & sap$Time %in% c(12,12.1,12.2,12.3,12.4,12.5,13,13.1,13.2,13.3,13.4,13.5,14), ], mean)
 sapply(sap[sap$Day==166 & sap$Time %in% c(12,12.1,12.2,12.3,12.4,12.5,13,13.1,13.2,13.3,13.4,13.5,14), ], mean)

Here's what the data looks like:

 Day Time StomCond_Trunc
 165 12   33.57189926
 165 12.1 50.29437636
 165 12.2 35.59876214
 165 12.3 24.39879768
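To make the question reproducible, the four rows shown can be entered directly as a data frame. This is only a sketch: it contains just the rows above, whereas the real `sap` presumably has many more days and times.

```r
# The four rows shown in the question, entered as a data frame.
# Only these rows come from the question; the full dataset is larger.
sap <- data.frame(
  Day            = c(165, 165, 165, 165),
  Time           = c(12, 12.1, 12.2, 12.3),
  StomCond_Trunc = c(33.57189926, 50.29437636, 35.59876214, 24.39879768)
)

# Check the column types and values
str(sap)
```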
3 answers

Try the following:

 aggregate(StomCond_Trunc~Day,data=subset(sap,Time>=12 & Time<=14),mean) 
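Here `subset()` first keeps only the rows inside the 12 to 14 window, and `aggregate()` then computes one mean per `Day`, so no per-day lines are needed. A minimal sketch on hypothetical toy data (two days, with readings outside the window set to 100 and 200 so it is obvious they are excluded):

```r
# Hypothetical toy data: two days, readings at Time 11, 12, 13, 14 and 15.
# Only rows with 12 <= Time <= 14 should contribute to each day's mean.
sap <- data.frame(
  Day            = rep(c(165, 166), each = 5),
  Time           = rep(c(11, 12, 13, 14, 15), times = 2),
  StomCond_Trunc = c(100, 10, 20, 30, 100,
                     200, 40, 50, 60, 200)
)

# subset() keeps the 12-14 window; aggregate() then takes the mean per Day
res <- aggregate(StomCond_Trunc ~ Day,
                 data = subset(sap, Time >= 12 & Time <= 14),
                 FUN = mean)

# Day 165 -> mean(10, 20, 30) = 20; Day 166 -> mean(40, 50, 60) = 50
print(res)
```

Because the window is expressed as a range rather than a list of exact times, it keeps working whatever the spacing of the `Time` values.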

If you have a large dataset, you may also want to look at the data.table package. Converting a data.frame to a data.table is pretty simple.

Example:

Large(ish) dataset:

 df <- data.frame(Day=1:1000000, Time=sample(1:14,1000000,replace=TRUE), StomCond_Trunc=rnorm(1000000)*20)

Using aggregate on the data.frame:

 > system.time(aggregate(StomCond_Trunc~Day,data=subset(df,Time>=12 & Time<=14),mean))
    user  system elapsed 
  16.255   0.377  24.263 

Convert it to a data.table:

 library(data.table)
 dt <- data.table(df, key="Time")
 > system.time(dt[Time>=12 & Time<=14, mean(StomCond_Trunc), by=Day])
    user  system elapsed 
   9.534   0.178  15.270 

Update from Matthew: this timing has improved substantially since the answer was originally posted, thanks to a new optimization in data.table 1.8.2.

Rerunning the comparison of the two approaches using data.table 1.8.2 in R 2.15.1:

 library(data.table)
 df <- data.frame(Day=1:1000000, Time=sample(1:14,1000000,replace=TRUE), StomCond_Trunc=rnorm(1000000)*20)
 system.time(aggregate(StomCond_Trunc~Day,data=subset(df,Time>=12 & Time<=14),mean))
 #   user  system elapsed
 #  10.19    0.27   10.47
 dt <- data.table(df,key="Time")
 system.time(dt[Time>=12 & Time<=14,mean(StomCond_Trunc),by=Day])
 #   user  system elapsed
 #   0.31    0.00    0.31
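If installing data.table is not an option, base R's `tapply()` (a swapped-in technique, not part of either answer above) computes the same per-Day mean over the 12 to 14 window without the formula interface. A minimal sketch on hypothetical toy data; it has not been benchmarked against the timings above, so no speed claim is made for it.

```r
# Hypothetical toy data with the same column layout as the question's `sap`
sap <- data.frame(
  Day            = rep(c(165, 166), each = 5),
  Time           = rep(c(11, 12, 13, 14, 15), times = 2),
  StomCond_Trunc = c(100, 10, 20, 30, 100,
                     200, 40, 50, 60, 200)
)

# Logical index for the 12-14 window, computed once
in_window <- sap$Time >= 12 & sap$Time <= 14

# tapply() splits the selected values by Day and applies mean() to each group;
# the result is a one-dimensional array named by the Day values
day_means <- tapply(sap$StomCond_Trunc[in_window], sap$Day[in_window], mean)
print(day_means)
```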

Using your original method, but with less typing:

 sapply(sap[sap$Day==165 & sap$Time %in% seq(12, 14, 0.1), ],mean) 

However, this is only a slight improvement on your original. It is not as flexible as the other answers, since it depends on your time values being in 0.1 increments. The other methods do not care about the size of the increment, which makes them more general. I would recommend @Maiasaura's data.table answer.
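The dependence on the 0.1 increment can be seen directly: an enumerated set built with `seq()` only matches values that fall on its grid, while a range comparison matches anything between the bounds. A small sketch with hypothetical `Time` values, one of which (12.25) lies between the 0.1 steps:

```r
# Hypothetical Time values; 12.25 does not lie on the 0.1 grid
times <- c(11.5, 12, 12.25, 14, 14.5)

# Enumerated set: only matches values that are elements of seq(12, 14, 0.1)
by_set <- times %in% seq(12, 14, 0.1)

# Range comparison: matches anything from 12 to 14, whatever the increment
by_range <- times >= 12 & times <= 14

# 12.25 is dropped by the set lookup but kept by the range comparison
print(data.frame(times, by_set, by_range))
```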


Source: https://habr.com/ru/post/1397208/

