How to calculate the weighted means of the vector within the levels of factors?

Question

I can successfully get a simple mean of a given vector within the levels of a factor, but when I try to take the next step and weight the observations, I cannot get it to work. This works:

> tapply(exp.f,part.f.p.d,mean)
        1         2         3         4         5         6         7         8         9        10 
0.8535996 1.1256058 0.6968142 1.4346451 0.8136110 1.2006801 1.6112160 1.9168835 1.5135006 3.0312460 

But this does not:

> tapply(exp.f,part.f.p.d,weighted.mean,b.pct)
Error in weighted.mean.default(X[[1L]], ...) : 
  'x' and 'w' must have the same length
>

In the code below, I am trying to find the weighted mean of exp.f within the levels of the factor part.f.p.d, weighted by the corresponding observations in b.pct at each level.

b.exp <- tapply(exp.f,part.f.p.d,weighted.mean,b.pct)

Error in weighted.mean.default(X[[1L]], ...) : 
  'x' and 'w' must have the same length

I think I must be using the wrong syntax, since all three of these vectors are the same length:

> length(b.pct)
[1] 978
> length(exp.f)
[1] 978
> length(part.f.p.d)
[1] 978

What is the right way to do this? Thank you in advance.

+3

r

user297400 Feb 01 '11 at 18:29

3 answers

Here is a small reproducible example, with part.f.p.d as a factor.

b.pct <- sample(1:100, 10) / 100
exp.f <- sample(1:1000, 10)
part.f.p.d <- factor(rep(letters[1:5], 2))

tapply(exp.f, part.f.p.d, mean) # this works
tapply(exp.f, part.f.p.d, weighted.mean, w = b.pct) # this doesn't

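traceback() helps show what is going on. The values in X (i.e. exp.f) are split by INDEX (i.e. part.f.p.d) inside tapply(), but the w argument passed on to weighted.mean() (i.e. b.pct) is not, so the lengths no longer match.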

EDIT: this should work instead.

sapply(levels(part.f.p.d), 
       function(whichpart) weighted.mean(x = exp.f[part.f.p.d == whichpart], 
                                         w = b.pct[part.f.p.d == whichpart]))

+2

J. Win. Feb 01 '11 at 18:40

Your problem is that tapply does not split the additional arguments passed to FUN (through its ... argument) the way it splits the main argument X. See the "Note" on the help page for tapply (?tapply).

Optional arguments to FUN supplied by the ... argument are not divided into cells. It is therefore inappropriate for FUN to expect additional arguments with the same length as X.
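One common way around this note is to let tapply split the positions rather than the values, and then subset both the values and the weights inside FUN. The following is a minimal sketch on made-up toy data (the numbers and the name res_idx are assumptions for illustration, not the asker's data):

```r
# Toy data, assumed for illustration only
exp.f      <- c(10, 20, 30, 40, 50, 60)
part.f.p.d <- factor(c("a", "a", "b", "b", "c", "c"))
b.pct      <- c(1, 3, 1, 1, 2, 0)

# tapply only splits its first argument, so hand it the indices
# and use them to pick matching values and weights for each level
res_idx <- tapply(seq_along(exp.f), part.f.p.d,
                  function(i) weighted.mean(exp.f[i], b.pct[i]))
res_idx
```

Because the same index vector i subsets both exp.f and b.pct, x and w always have the same length within each level.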

Here is a hackish solution.

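exp.f <- rnorm(10)
part.f.p.d <- factor(sample(1:5, size = 10, replace = TRUE))
b.pct <- runif(10)  # non-negative weights
# split the values and the weights by the same factor ...
a <- split(exp.f, part.f.p.d)
b <- split(b.pct, part.f.p.d)
# ... then pair them up level by level
lapply(seq_along(a), function(i){
  weighted.mean(a[[i]], b[[i]])
})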
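The same split-then-pair idea can probably be written more compactly with mapply, which walks over both split lists in parallel and keeps the level names on the result. This is a sketch on assumed toy data (the values and the name res_map are illustrative only):

```r
# Toy data, assumed for illustration only
exp.f      <- c(10, 20, 30, 40, 50, 60)
part.f.p.d <- factor(c("a", "a", "b", "b", "c", "c"))
b.pct      <- c(1, 3, 1, 1, 2, 0)

# Split values and weights by the same factor, then call
# weighted.mean(x, w) once per level with the matching pieces
res_map <- mapply(weighted.mean,
                  split(exp.f, part.f.p.d),
                  split(b.pct, part.f.p.d))
res_map
```

Unlike the lapply over seq_along, the result here is a named numeric vector, so each weighted mean stays labelled with its factor level.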

+2

rbtgde Feb 01 '11 at 18:47

Joshua Ulrich · Accepted Answer · 2011-02-01T18:53:08+0000

I would now do it this way (thanks to Gavin):

# Data is assumed to be a data frame holding exp.f, part.f.p.d and b.pct as columns
sapply(split(Data, Data$part.f.p.d), function(x) weighted.mean(x$exp.f, x$b.pct))

Or, alternatively, with ddply from the plyr package:

ddply(Data, "part.f.p.d", function(x) weighted.mean(x$exp.f, x$b.pct))
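As a sanity check on any of these approaches, the weighted mean within a level is just sum(w * x) / sum(w), and both sums can be computed with plain tapply because each call takes a single full-length vector. The sketch below uses assumed toy data (the values and the names Data, res_split and res_ratio are illustrative) and compares the ratio with the split-based result:

```r
# Toy data frame, assumed for illustration only
Data <- data.frame(
  exp.f      = c(10, 20, 30, 40, 50, 60),
  part.f.p.d = factor(c("a", "a", "b", "b", "c", "c")),
  b.pct      = c(1, 3, 1, 1, 2, 0)
)

# Split the data frame by level and take the weighted mean of each piece
res_split <- sapply(split(Data, Data$part.f.p.d),
                    function(x) weighted.mean(x$exp.f, x$b.pct))

# Same quantity from the definition: sum(w * x) / sum(w) per level
num <- tapply(Data$exp.f * Data$b.pct, Data$part.f.p.d, sum)
den <- tapply(Data$b.pct, Data$part.f.p.d, sum)
res_ratio <- num / den

# Should be TRUE if the two routes agree
all.equal(as.vector(res_ratio), unname(res_split))
```

The ratio form avoids passing weights through FUN at all, though it does not handle missing values the way weighted.mean(..., na.rm = TRUE) would.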
