Weighted funds by group and column

Question

Weighted funds by group and column

I want to get weighted funds by groups for each of several (actually about 60) columns. This question is very similar to: repeatedly applying ave to compute group facilities in a data frame that just asked.

I came up with two ways to get weighted funds:

use separate sapply for each column
place the sapply inside the for-loop

However, I believe that there should be a way to insert the apply statement inside the sapply or vice versa, thereby eliminating the for-loop . I tried many permutations without success. I also looked at the sweep function.

Here is the code I have.

 df <- read.table(text= " region state county weights y1980 y1990 y2000 1 1 1 10 100 200 50 1 1 2 5 50 100 200 1 1 3 120 1000 500 250 1 1 4 2 25 100 400 1 1 4 15 125 150 200 2 2 1 1 10 50 150 2 2 2 10 10 10 200 2 2 2 40 40 100 30 2 2 3 20 100 100 10 ", header=TRUE, na.strings=NA) # add a group variable to the data set group <- paste(df$region, '_', df$state, '_', df$county, sep = "") df <- data.frame(group, df) # obtain weighted averages for y1980, y1990 and y2000 # one column at a time using one sapply per column sapply(split(df, df$group), function(x) weighted.mean(x$y1980, w = x$weights)) sapply(split(df, df$group), function(x) weighted.mean(x$y1990, w = x$weights)) sapply(split(df, df$group), function(x) weighted.mean(x$y2000, w = x$weights)) # obtain weighted average for y1980, y1990 and y2000 # one column at a time using a for-loop y <- matrix(NA, nrow=7, ncol=3) group.b <- df[!duplicated(df$group), 1] for(i in 6:8) { y[,(i-5)] <- sapply(split(df[,c(1:5,i)], df$group), function(x) weighted.mean(x[,6], w = x$weights)) } # add weighted averages to the original data set y2 <- data.frame(group.b, y) colnames(y2) <- c('group','ave1980','ave1990','ave2000') y2 y3 <- merge(df, y2, by=c('group'), all = TRUE) y3

Sorry for all my questions lately and thank you for any advice.

EDITED shows y3

  group region state county weights y1980 y1990 y2000 ave1980 ave1990 ave2000 1 1_1_1 1 1 1 10 100 200 50 100.0000 200.0000 50.0000 2 1_1_2 1 1 2 5 50 100 200 50.0000 100.0000 200.0000 3 1_1_3 1 1 3 120 1000 500 250 1000.0000 500.0000 250.0000 4 1_1_4 1 1 4 2 25 100 400 113.2353 144.1176 223.5294 5 1_1_4 1 1 4 15 125 150 200 113.2353 144.1176 223.5294 6 2_2_1 2 2 1 1 10 50 150 10.0000 50.0000 150.0000 7 2_2_2 2 2 2 10 10 10 200 34.0000 82.0000 64.0000 8 2_2_2 2 2 2 40 40 100 30 34.0000 82.0000 64.0000 9 2_2_3 2 2 3 20 100 100 10 100.0000 100.0000 10.0000

+4

for-loop r apply sapply

Mark miller Jan 03 '13 at 19:33

source share

2 answers

Rolling · Answer 1 · 2013-01-03T21:17:51+0000

I suggest using package data.table:

 library(data.table) dt <- as.data.table(df) dt2 <- dt[,lapply(.SD,weighted.mean,w=weights),by=list(region,state,county)] print(dt2) region state county weights y1980 y1990 y2000 1: 1 1 1 10.00000 100.0000 200.0000 50.0000 2: 1 1 2 5.00000 50.0000 100.0000 200.0000 3: 1 1 3 120.00000 1000.0000 500.0000 250.0000 4: 1 1 4 13.47059 113.2353 144.1176 223.5294 5: 2 2 1 1.00000 10.0000 50.0000 150.0000 6: 2 2 2 34.00000 34.0000 82.0000 64.0000 7: 2 2 3 20.00000 100.0000 100.0000 10.0000

If you want, you can merge with the source data. table subsequently:

 merge(dt,dt2,by=c("region","state","county")) region state county weights.x y1980.x y1990.x y2000.x weights.y y1980.y y1990.y y2000.y 1: 1 1 1 10 100 200 50 10.00000 100.0000 200.0000 50.0000 2: 1 1 2 5 50 100 200 5.00000 50.0000 100.0000 200.0000 3: 1 1 3 120 1000 500 250 120.00000 1000.0000 500.0000 250.0000 4: 1 1 4 2 25 100 400 13.47059 113.2353 144.1176 223.5294 5: 1 1 4 15 125 150 200 13.47059 113.2353 144.1176 223.5294 6: 2 2 1 1 10 50 150 1.00000 10.0000 50.0000 150.0000 7: 2 2 2 10 10 10 200 34.00000 34.0000 82.0000 64.0000 8: 2 2 2 40 40 100 30 34.00000 34.0000 82.0000 64.0000 9: 2 2 3 20 100 100 10 20.00000 100.0000 100.0000 10.0000

Mark miller · Answer 2 · 2013-01-04T17:14:59+0000

I figured out how to sapply inside apply to get weighted group and column averages without using an explicit for-loop . Below I provide a dataset, an apply statement, and an explanation of how the apply statement works.

Here is the dataset from the original message:

 df <- read.table(text= " region state county weights y1980 y1990 y2000 1 1 1 10 100 200 50 1 1 2 5 50 100 200 1 1 3 120 1000 500 250 1 1 4 2 25 100 400 1 1 4 15 125 150 200 2 2 1 1 10 50 150 2 2 2 10 10 10 200 2 2 2 40 40 100 30 2 2 3 20 100 100 10 ", header=TRUE, na.strings=NA) # add a group variable to the data set group <- paste(df$region, '_', df$state, '_', df$county, sep = "") df <- data.frame(group, df)

Here is the apply / sapply code to get the desired weighted funds.

 apply(df[,6:ncol(df)], 2, function(x) {sapply(split(data.frame(df[,1:5], x), df$group), function(y) weighted.mean(y[,6], w = y$weights))})

Here is an explanation of the apply / sapply statement above:

Note that the apply statement selects columns 6 through 8 of df one at a time.
For each of these three columns, I create a new data frame combining this separate column with the first five df columns.
I then broke each of these new six-column frames into pieces using the df$group grouping variable.
Once the data frame of six columns has been split into separate pieces, I calculate the weighted average for the last column (6th column) of each fragment.

Here is the result:

  y1980 y1990 y2000 1_1_1 100.0000 200.0000 50.0000 1_1_2 50.0000 100.0000 200.0000 1_1_3 1000.0000 500.0000 250.0000 1_1_4 113.2353 144.1176 223.5294 2_2_1 10.0000 50.0000 150.0000 2_2_2 34.0000 82.0000 64.0000 2_2_3 100.0000 100.0000 10.0000

Using the data.table package is nice, but until I know its syntax and how this syntax differs from the data.frame syntax, I thought it would be nice to know how to use apply and sapply to do the same. Now I can use both approaches, as well as approaches in the original post, to test them against others and learn more about all of them.

Weighted funds by group and column

More articles: