Weighted means for multiple columns by group (in data.table)

This question follows on from another about weighted means by group: I would like to compute weighted within-group means using data.table. The difference from the original question is that the names of the variables to be averaged are given in a character vector.

Data:

    df <- read.table(text = "
    region state county weights y1980 y1990 y2000
    1 1 1  10  100 200  50
    1 1 2   5   50 100 200
    1 1 3 120 1000 500 250
    1 1 4   2   25 100 400
    1 1 4  15  125 150 200
    2 2 1   1   10  50 150
    2 2 2  10   10  10 200
    2 2 2  40   40 100  30
    2 2 3  20  100 100  10
    ", header = TRUE, na.strings = NA)

Using the answer Roland suggested to the question above:

    library(data.table)
    dt <- as.data.table(df)
    dt2 <- dt[, lapply(.SD, weighted.mean, w = weights), by = list(region, state, county)]

I have a character vector that dynamically defines the columns for which I want weighted within-group means:

    colsToKeep = c("y1980", "y1990")

But I do not know how to pass it as an argument to data.table's magic.

I tried

    dt[, lapply(as.list(colsToKeep), weighted.mean, w = weights), by = list(region, state, county)]

but I get:

    Error in x * w : non-numeric argument to binary operator

I don’t know how to achieve what I want.

Bonus question: I would like to keep the names of the original columns, instead of getting V1 and V2.

NB I am using version 1.9.3 of the data.table package.

2 answers

Usually you should be able to:

    dt2 <- dt[, lapply(.SD, weighted.mean, w = weights),
              by = list(region, state, county), .SDcols = colsToKeep]

i.e. simply supply just those columns in .SDcols. At the moment, however, this does not work because of a bug: the weights column is not available in j, since it is not listed in .SDcols.
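For comparison, here is the `.SDcols` form as a minimal sketch, assuming a newer data.table release in which the bug described above has been fixed (it is not expected to work on 1.9.3). Only the columns named in `colsToKeep` are averaged, and because `lapply()` over the named `.SD` returns a named list, the result keeps the original column names.

```r
library(data.table)

# Same data as in the question
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1  10  100 200  50
1 1 2   5   50 100 200
1 1 3 120 1000 500 250
1 1 4   2   25 100 400
1 1 4  15  125 150 200
2 2 1   1   10  50 150
2 2 2  10   10  10 200
2 2 2  40   40 100  30
2 2 3  20  100 100  10
", header = TRUE)
dt <- as.data.table(df)
colsToKeep <- c("y1980", "y1990")

# .SD holds only the colsToKeep columns; `weights` is referenced by name in j
# and (assuming the bug is fixed in your version) is still visible there
dt2 <- dt[, lapply(.SD, weighted.mean, w = weights),
          by = list(region, state, county), .SDcols = colsToKeep]
print(dt2)
```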

Until it is fixed, we can do this as follows:

    dt2 <- dt[, lapply(mget(colsToKeep), weighted.mean, w = weights),
              by = list(region, state, county)]
    #    region state county     y1980    y1990
    # 1:      1     1      1  100.0000 200.0000
    # 2:      1     1      2   50.0000 100.0000
    # 3:      1     1      3 1000.0000 500.0000
    # 4:      1     1      4  113.2353 144.1176
    # 5:      2     2      1   10.0000  50.0000
    # 6:      2     2      2   34.0000  82.0000
    # 7:      2     2      3  100.0000 100.0000
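A small sketch of why the original attempt failed and why `mget()` works, assuming the same data as in the question. `as.list(colsToKeep)` holds only the column *names*, so `weighted.mean()` receives a string and fails at `x * w`; `mget()` looks the names up and returns the columns themselves as a named list, which is also why the result keeps `y1980`/`y1990` as names (the bonus question). The names `msg` and `grp4` are just illustrative.

```r
library(data.table)

dt <- as.data.table(read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1  10  100 200  50
1 1 2   5   50 100 200
1 1 3 120 1000 500 250
1 1 4   2   25 100 400
1 1 4  15  125 150 200
2 2 1   1   10  50 150
2 2 2  10   10  10 200
2 2 2  40   40 100  30
2 2 3  20  100 100  10
", header = TRUE))
colsToKeep <- c("y1980", "y1990")

# What lapply(as.list(colsToKeep), ...) effectively does: pass the string
# "y1980" (not the column) to weighted.mean(), which then fails at x * w
msg <- tryCatch(weighted.mean("y1980", w = 10), error = conditionMessage)
print(msg)

# mget() resolves the names to the actual numeric columns, returned as a
# named list, so the output columns keep their original names
grp4 <- dt[county == 4, mget(colsToKeep)]
print(grp4)
```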

I don't know data.table, but have you considered using dplyr? I think it is almost as fast as data.table.

    library(dplyr)
    df %>%
      group_by(region, state, county) %>%
      summarise(mean_80 = weighted.mean(y1980, weights),
                mean_90 = weighted.mean(y1990, weights))
    # Source: local data frame [7 x 5]
    # Groups: region, state
    #
    #   region state county   mean_80  mean_90
    # 1      1     1      1  100.0000 200.0000
    # 2      1     1      2   50.0000 100.0000
    # 3      1     1      3 1000.0000 500.0000
    # 4      1     1      4  113.2353 144.1176
    # 5      2     2      1   10.0000  50.0000
    # 6      2     2      2   34.0000  82.0000
    # 7      2     2      3  100.0000 100.0000
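With dplyr the column names can also come from the character vector rather than being typed out. A minimal sketch, assuming dplyr 1.0.0 or later (for `across()` and the `.groups` argument); it keeps the original column names instead of `mean_80`/`mean_90`.

```r
library(dplyr)

df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1  10  100 200  50
1 1 2   5   50 100 200
1 1 3 120 1000 500 250
1 1 4   2   25 100 400
1 1 4  15  125 150 200
2 2 1   1   10  50 150
2 2 2  10   10  10 200
2 2 2  40   40 100  30
2 2 3  20  100 100  10
", header = TRUE)
colsToKeep <- c("y1980", "y1990")

# all_of() selects the columns named in the character vector; inside the
# lambda, `weights` refers to the current group's slice of that column
res <- df %>%
  group_by(region, state, county) %>%
  summarise(across(all_of(colsToKeep), ~ weighted.mean(.x, w = weights)),
            .groups = "drop")
print(res)
```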

Source: https://habr.com/ru/post/975783/