I figured out how to nest sapply inside apply to get weighted group averages for each column without using an explicit for-loop. Below I provide a dataset, an apply statement, and an explanation of how the apply statement works.
Here is the dataset from the original message:
df <- read.table(text= "
  region state county weights y1980 y1990 y2000
       1     1      1      10   100   200    50
       1     1      2       5    50   100   200
       1     1      3     120  1000   500   250
       1     1      4       2    25   100   400
       1     1      4      15   125   150   200
       2     2      1       1    10    50   150
       2     2      2      10    10    10   200
       2     2      2      40    40   100    30
       2     2      3      20   100   100    10
", header=TRUE, na.strings=NA)

# add a group variable to the data set
group <- paste(df$region, '_', df$state, '_', df$county, sep = "")
df <- data.frame(group, df)
Here is the apply / sapply code to get the desired weighted means.
apply(df[,6:ncol(df)], 2, function(x) {
  sapply(split(data.frame(df[,1:5], x), df$group),
         function(y) weighted.mean(y[,6], w = y$weights))
})
Here is an explanation of the apply / sapply statement above:
Note that the apply statement selects columns 6 through 8 of df one at a time.
For each of these three columns, I create a new data frame combining this separate column with the first five df columns.
I then split each of these new six-column data frames into pieces using the df$group grouping variable.
Once the data frame of six columns has been split into separate pieces, I calculate the weighted average for the last column (6th column) of each fragment.
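To make the steps above concrete, here is a minimal sketch that unrolls the inner sapply for a single column (y1980). The intermediate names x, six, pieces and y1980_means are mine, introduced only for illustration, and the data set is rebuilt so the block runs on its own; it should behave like one pass of the apply statement, but treat it as a sketch rather than a replacement.

```r
# Rebuild the example data so this block is self-contained
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1 10 100 200 50
1 1 2 5 50 100 200
1 1 3 120 1000 500 250
1 1 4 2 25 100 400
1 1 4 15 125 150 200
2 2 1 1 10 50 150
2 2 2 10 10 10 200
2 2 2 40 40 100 30
2 2 3 20 100 100 10
", header = TRUE)
df <- data.frame(group = paste(df$region, df$state, df$county, sep = "_"), df)

# Step 1: apply hands over one year column at a time; take y1980 (column 6)
x <- df[, 6]

# Step 2: combine that column with the first five columns of df
six <- data.frame(df[, 1:5], x)

# Step 3: split the six-column data frame into one piece per group
pieces <- split(six, df$group)

# Step 4: weighted mean of the sixth column within each piece
y1980_means <- sapply(pieces, function(y) weighted.mean(y[, 6], w = y$weights))
print(y1980_means)
```

Repeating steps 1 through 4 for columns 7 and 8 is what apply does for me, and it binds the three resulting vectors into the columns of one matrix.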
Here is the result:
          y1980    y1990    y2000
1_1_1  100.0000 200.0000  50.0000
1_1_2   50.0000 100.0000 200.0000
1_1_3 1000.0000 500.0000 250.0000
1_1_4  113.2353 144.1176 223.5294
2_2_1   10.0000  50.0000 150.0000
2_2_2   34.0000  82.0000  64.0000
2_2_3  100.0000 100.0000  10.0000
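As a sanity check on one entry, group 1_1_4 has two rows (weights 2 and 15, y1980 values 25 and 125), so its weighted mean can be worked out by hand. The short sketch below uses variable names of my own choosing (w, y, by_hand, built_in) and compares the hand calculation with weighted.mean; both should come to about 113.2353.

```r
# Weights and y1980 values for the two rows in group 1_1_4
w <- c(2, 15)
y <- c(25, 125)

# Weighted mean by hand: sum of weight * value, divided by the sum of weights
by_hand <- sum(w * y) / sum(w)

# The same quantity from the built-in function used in the apply statement
built_in <- weighted.mean(y, w = w)

print(c(by_hand = by_hand, built_in = built_in))
```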
Using the data.table package is nice, but until I know its syntax and how it differs from the data.frame syntax, I thought it would be nice to know how to use apply and sapply to do the same thing. Now I can use both approaches, as well as the approaches in the original post, to test them against each other and learn more about all of them.