Weighted funds by group and column

I want to get weighted funds by groups for each of several (actually about 60) columns. This question is very similar to: repeatedly applying ave to compute group facilities in a data frame that just asked.

I came up with two ways to get weighted funds:

  • use separate sapply for each column
  • place the sapply inside the for-loop

However, I believe that there should be a way to insert the apply statement inside the sapply or vice versa, thereby eliminating the for-loop . I tried many permutations without success. I also looked at the sweep function.

Here is the code I have.

 df <- read.table(text= " region state county weights y1980 y1990 y2000 1 1 1 10 100 200 50 1 1 2 5 50 100 200 1 1 3 120 1000 500 250 1 1 4 2 25 100 400 1 1 4 15 125 150 200 2 2 1 1 10 50 150 2 2 2 10 10 10 200 2 2 2 40 40 100 30 2 2 3 20 100 100 10 ", header=TRUE, na.strings=NA) # add a group variable to the data set group <- paste(df$region, '_', df$state, '_', df$county, sep = "") df <- data.frame(group, df) # obtain weighted averages for y1980, y1990 and y2000 # one column at a time using one sapply per column sapply(split(df, df$group), function(x) weighted.mean(x$y1980, w = x$weights)) sapply(split(df, df$group), function(x) weighted.mean(x$y1990, w = x$weights)) sapply(split(df, df$group), function(x) weighted.mean(x$y2000, w = x$weights)) # obtain weighted average for y1980, y1990 and y2000 # one column at a time using a for-loop y <- matrix(NA, nrow=7, ncol=3) group.b <- df[!duplicated(df$group), 1] for(i in 6:8) { y[,(i-5)] <- sapply(split(df[,c(1:5,i)], df$group), function(x) weighted.mean(x[,6], w = x$weights)) } # add weighted averages to the original data set y2 <- data.frame(group.b, y) colnames(y2) <- c('group','ave1980','ave1990','ave2000') y2 y3 <- merge(df, y2, by=c('group'), all = TRUE) y3 

Sorry for all my questions lately and thank you for any advice.

EDITED shows y3

  group region state county weights y1980 y1990 y2000 ave1980 ave1990 ave2000 1 1_1_1 1 1 1 10 100 200 50 100.0000 200.0000 50.0000 2 1_1_2 1 1 2 5 50 100 200 50.0000 100.0000 200.0000 3 1_1_3 1 1 3 120 1000 500 250 1000.0000 500.0000 250.0000 4 1_1_4 1 1 4 2 25 100 400 113.2353 144.1176 223.5294 5 1_1_4 1 1 4 15 125 150 200 113.2353 144.1176 223.5294 6 2_2_1 2 2 1 1 10 50 150 10.0000 50.0000 150.0000 7 2_2_2 2 2 2 10 10 10 200 34.0000 82.0000 64.0000 8 2_2_2 2 2 2 40 40 100 30 34.0000 82.0000 64.0000 9 2_2_3 2 2 3 20 100 100 10 100.0000 100.0000 10.0000 
+4
source share
2 answers

I suggest using package data.table:

 library(data.table) dt <- as.data.table(df) dt2 <- dt[,lapply(.SD,weighted.mean,w=weights),by=list(region,state,county)] print(dt2) region state county weights y1980 y1990 y2000 1: 1 1 1 10.00000 100.0000 200.0000 50.0000 2: 1 1 2 5.00000 50.0000 100.0000 200.0000 3: 1 1 3 120.00000 1000.0000 500.0000 250.0000 4: 1 1 4 13.47059 113.2353 144.1176 223.5294 5: 2 2 1 1.00000 10.0000 50.0000 150.0000 6: 2 2 2 34.00000 34.0000 82.0000 64.0000 7: 2 2 3 20.00000 100.0000 100.0000 10.0000 

If you want, you can merge with the source data. table subsequently:

 merge(dt,dt2,by=c("region","state","county")) region state county weights.x y1980.x y1990.x y2000.x weights.y y1980.y y1990.y y2000.y 1: 1 1 1 10 100 200 50 10.00000 100.0000 200.0000 50.0000 2: 1 1 2 5 50 100 200 5.00000 50.0000 100.0000 200.0000 3: 1 1 3 120 1000 500 250 120.00000 1000.0000 500.0000 250.0000 4: 1 1 4 2 25 100 400 13.47059 113.2353 144.1176 223.5294 5: 1 1 4 15 125 150 200 13.47059 113.2353 144.1176 223.5294 6: 2 2 1 1 10 50 150 1.00000 10.0000 50.0000 150.0000 7: 2 2 2 10 10 10 200 34.00000 34.0000 82.0000 64.0000 8: 2 2 2 40 40 100 30 34.00000 34.0000 82.0000 64.0000 9: 2 2 3 20 100 100 10 20.00000 100.0000 100.0000 10.0000 
+5
source

I figured out how to sapply inside apply to get weighted group and column averages without using an explicit for-loop . Below I provide a dataset, an apply statement, and an explanation of how the apply statement works.

Here is the dataset from the original message:

 df <- read.table(text= " region state county weights y1980 y1990 y2000 1 1 1 10 100 200 50 1 1 2 5 50 100 200 1 1 3 120 1000 500 250 1 1 4 2 25 100 400 1 1 4 15 125 150 200 2 2 1 1 10 50 150 2 2 2 10 10 10 200 2 2 2 40 40 100 30 2 2 3 20 100 100 10 ", header=TRUE, na.strings=NA) # add a group variable to the data set group <- paste(df$region, '_', df$state, '_', df$county, sep = "") df <- data.frame(group, df) 

Here is the apply / sapply code to get the desired weighted funds.

 apply(df[,6:ncol(df)], 2, function(x) {sapply(split(data.frame(df[,1:5], x), df$group), function(y) weighted.mean(y[,6], w = y$weights))}) 

Here is an explanation of the apply / sapply statement above:

  • Note that the apply statement selects columns 6 through 8 of df one at a time.

  • For each of these three columns, I create a new data frame combining this separate column with the first five df columns.

  • I then broke each of these new six-column frames into pieces using the df$group grouping variable.

  • Once the data frame of six columns has been split into separate pieces, I calculate the weighted average for the last column (6th column) of each fragment.

Here is the result:

  y1980 y1990 y2000 1_1_1 100.0000 200.0000 50.0000 1_1_2 50.0000 100.0000 200.0000 1_1_3 1000.0000 500.0000 250.0000 1_1_4 113.2353 144.1176 223.5294 2_2_1 10.0000 50.0000 150.0000 2_2_2 34.0000 82.0000 64.0000 2_2_3 100.0000 100.0000 10.0000 

Using the data.table package is nice, but until I know its syntax and how this syntax differs from the data.frame syntax, I thought it would be nice to know how to use apply and sapply to do the same. Now I can use both approaches, as well as approaches in the original post, to test them against others and learn more about all of them.

+1
source

Source: https://habr.com/ru/post/975787/


All Articles