How can I get the value for each n rows and keep the date index?

Question

I have a data frame with a year column and a val column.

I would like to average val over every n rows and keep the year of the first row in each block.

In principle, the output would be (for n = 2):

    year val
    1990 Mean(row1,row2)
    1992 Mean(row3,row4)
    1994 Mean(row5,row6)
    1996 Mean(row7,row8)

How can I do this?

    structure(list(year = c(1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
                            1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
                            2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013),
                   val = c(84L, 67L, 72L, 138L, 111L, 100L, 221L, 108L,
                           204L, 125L, 82L, 157L, 175L, 252L, 261L, 185L,
                           146L, 183L, 245L, 172L, 98L, 216L, 89L, 144L)),
              .Names = c("year", "val"), row.names = 13:36,
              class = "data.frame")

+5

r

maximusdooku Oct 27 '15 at 17:09

7 answers

A dplyr solution: add a grouping variable (1, 1, 2, 2, 3, 3, etc.), compute the mean of val within each group, take the smallest year within each group, and then drop the grouping variable:

    > require(dplyr)
    > d %>% group_by(G=trunc(2:(n()+1)/2)) %>% summarise(mean=mean(val),year=min(year)) %>% select(-G)
    Source: local data frame [12 x 2]

        mean year
    1   75.5 1990
    2  105.0 1992
    3  105.5 1994
    4  164.5 1996
    5  164.5 1998
    6  119.5 2000
    7  213.5 2002
    8  223.0 2004
    9  164.5 2006
    10 208.5 2008
    11 157.0 2010
    12 116.5 2012

Generalized to a function of n, using integer division as a more robust way to compute the grouping variable:

    meanN = function(df, n){
        df %>%
            group_by(G=(0:(n()-1))%/%n) %>%
            summarise(mean=mean(val),year=min(year)) %>%
            select(-G)
    }

    > meanN(d, 2)
    Source: local data table [12 x 2]

        mean year
    1   75.5 1990
    2  105.0 1992
    3  105.5 1994
    4  164.5 1996
    5  164.5 1998
    6  119.5 2000
    7  213.5 2002
    8  223.0 2004
    9  164.5 2006
    10 208.5 2008
    11 157.0 2010
    12 116.5 2012

    > meanN(d, 12)
    Source: local data table [2 x 2]

          mean year
    1 122.4167 1990
    2 180.5000 2002
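The same integer-division grouping can also be sketched in base R, for readers who do not have dplyr loaded. This is only an illustration: `mean_n_base` is a hypothetical helper name, and the data are rebuilt from the question with `year` written as `1990:2013` for brevity (the original stores it as doubles and has row names 13:36).

```r
# Data from the question (row names omitted; year written as an integer range).
d <- data.frame(
  year = 1990:2013,
  val  = c(84, 67, 72, 138, 111, 100, 221, 108, 204, 125, 82, 157,
           175, 252, 261, 185, 146, 183, 245, 172, 98, 216, 89, 144)
)

# Hypothetical helper, not part of any package.
mean_n_base <- function(df, n) {
  # Block index: 0 for the first n rows, 1 for the next n rows, and so on.
  # A final, shorter block is kept when nrow(df) is not a multiple of n.
  g <- (seq_len(nrow(df)) - 1) %/% n
  data.frame(
    year = as.vector(tapply(df$year, g, min)),   # smallest year per block
    val  = as.vector(tapply(df$val, g, mean))    # block mean of val
  )
}

res <- mean_n_base(d, 5)
res
```

With n = 5 the 24 rows do not divide evenly, so the last block contains only the four rows from 2010 to 2013.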

+6

Spacedman Oct 27 '15 at 17:16

You can create a grouping variable with rep:

    n = 2
    dd$group <- rep(1:(nrow(dd)/n), each = n)

Then you can use the library of your choice to perform group-wise operations. I used data.table.

    library(data.table)
    setDT(dd)
    # Getting the result is then trivial
    res <- dd[, .(year = min(year), mean_val = mean(val)), by = group]
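One caveat about the `rep` call: as written it produces a vector of the right length only when `nrow(dd)` is a multiple of `n`. A minimal sketch of one way around that, using the `length.out` argument of `rep`, is shown below. It uses base R only, rebuilds the data from the question, and assumes the rows are already sorted by year, so that the first row of each block holds the smallest year.

```r
# Data from the question (row names omitted; year written as an integer range).
dd <- data.frame(
  year = 1990:2013,
  val  = c(84, 67, 72, 138, 111, 100, 221, 108, 204, 125, 82, 157,
           175, 252, 261, 185, 146, 183, 245, 172, 98, 216, 89, 144)
)

n <- 5
n_groups <- ceiling(nrow(dd) / n)

# each = n repeats every group id n times; length.out trims the result
# to the number of rows, so the last group may be shorter than n.
dd$group <- rep(seq_len(n_groups), each = n, length.out = nrow(dd))

res <- data.frame(
  year     = dd$year[!duplicated(dd$group)],            # first year per block
  mean_val = as.vector(tapply(dd$val, dd$group, mean))  # block mean of val
)
res
```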

+5

Heroka Oct 27 '15 at 17:18

Using rollapply from the zoo package:

    > library(zoo)
    > res <- rollapply(df, width=2, by=2, FUN=mean)
    > res[,1] <- floor(res[,1])
    > res
          year   val
     [1,] 1990  75.5
     [2,] 1992 105.0
     [3,] 1994 105.5
     [4,] 1996 164.5
     [5,] 1998 164.5
     [6,] 2000 119.5
     [7,] 2002 213.5
     [8,] 2004 223.0
     [9,] 2006 164.5
    [10,] 2008 208.5
    [11,] 2010 157.0
    [12,] 2012 116.5

As an alternative:

 rollapply(df, width=2, by=2, FUN=function(x) c(min(x), mean(x)))[, c(1,4)]

+5

Jilber urbina Oct 27 '15 at 17:31

You can use aggregate by grouping rounded year values:

    setNames(aggregate(val~I(2*floor((year-min(year))/2)+min(year)), data=dat, mean),
             c("year", "val"))
    #    year   val
    # 1  1990  75.5
    # 2  1992 105.0
    # 3  1994 105.5
    # 4  1996 164.5
    # 5  1998 164.5
    # 6  2000 119.5
    # 7  2002 213.5
    # 8  2004 223.0
    # 9  2006 164.5
    # 10 2008 208.5
    # 11 2010 157.0
    # 12 2012 116.5

+4

josliber Oct 27 '15 at 17:16

You can use seq along with the colMeans function:

    data.frame(Year = df[seq(1, length(df$year), 2), ]$year,
               Mean = colMeans(matrix(df$val, nrow=2)))
    #    Year  Mean
    # 1  1990  75.5
    # 2  1992 105.0
    # 3  1994 105.5
    # 4  1996 164.5
    # 5  1998 164.5
    # 6  2000 119.5
    # 7  2002 213.5
    # 8  2004 223.0
    # 9  2006 164.5
    # 10 2008 208.5
    # 11 2010 157.0
    # 12 2012 116.5
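The matrix trick works because `matrix()` fills column by column, so each column holds one block of consecutive rows. A hedged sketch of the same idea for an arbitrary n follows. It assumes the number of rows is a multiple of n (otherwise `matrix()` would recycle values and the last mean would be wrong), and the data are rebuilt from the question.

```r
# Data from the question (row names omitted; year written as an integer range).
df <- data.frame(
  year = 1990:2013,
  val  = c(84, 67, 72, 138, 111, 100, 221, 108, 204, 125, 82, 157,
           175, 252, 261, 185, 146, 183, 245, 172, 98, 216, 89, 144)
)

n <- 3

# Guard against silent recycling when the rows do not divide evenly.
stopifnot(nrow(df) %% n == 0)

res <- data.frame(
  Year = df$year[seq(1, nrow(df), by = n)],    # first year of each block
  Mean = colMeans(matrix(df$val, nrow = n))    # one column per block of n rows
)
res
```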

+4

Ronak shah Oct 27 '15 at 17:16

Try this one-liner:

    > t(sapply(split(dat,rep(seq(1,nrow(dat),2),each=2)),colMeans))
         year   val
    1  1990.5  75.5
    3  1992.5 105.0
    5  1994.5 105.5
    7  1996.5 164.5
    9  1998.5 164.5
    11 2000.5 119.5
    13 2002.5 213.5
    15 2004.5 223.0
    17 2006.5 164.5
    19 2008.5 208.5
    21 2010.5 157.0
    23 2012.5 116.5

Then you can round the year down (for example with floor) if necessary.

+2

fishtank Oct 27 '15 at 18:27

Colonel beauvel · Accepted Answer · 2015-10-27T17:22:22+0000

A short one-line solution with data.table:

    library(data.table)
    setDT(df)[,.(val=mean(val)), year-0:1]
    #     year   val
    #  1: 1990  75.5
    #  2: 1992 105.0
    #  3: 1994 105.5
    #  4: 1996 164.5
    #  5: 1998 164.5
    #  6: 2000 119.5
    #  7: 2002 213.5
    #  8: 2004 223.0
    #  9: 2006 164.5
    # 10: 2008 208.5
    # 11: 2010 157.0
    # 12: 2012 116.5
