I wrote a function (weighted.sd) that returns some weighted statistics (mean, SD, standard error and 95% confidence interval). I want to apply this function to each level of a factor variable (regions) and then use the weighted statistics for each region in a ggplot2 graph with error bars (i.e. the 95% confidence interval).
I also tried tapply and a for loop, but I could not get either to work. In addition, I would like to use dplyr as much as I can, because it is easy to read and understand.
Here is my best attempt:
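library(dplyr)  # provides %>%, group_by() and do()
# Toy data: a numeric index, a region factor with three levels, and positive weights
data <- data.frame(index_var = rnorm(50),
                   factor_var = factor(sample(c("A", "B", "C"), 50, replace = TRUE)),
                   weight_var = runif(50))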
weighted.sd <- function(x, weight) {
  # Drop observations where either the value or its weight is missing
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]
  sum.w <- sum(weight)
  sum.w2 <- sum(weight^2)
  # Weighted mean, variance and standard deviation
  mean.w <- sum(x * weight) / sum.w
  x.var.w <- (sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2)
  x.sd.w <- sqrt(x.var.w)
  # Standard error and half-width of the 95% confidence interval
  SE <- x.sd.w / sqrt(sum.w)
  error <- qnorm(0.975) * SE
  left <- mean.w - error
  right <- mean.w + error
  return(cbind(mean.w, x.sd.w, SE, error, left, right))
}
test<- data %>%
group_by(factor_var) %>%
do(as.data.frame(weighted.sd(x=index_var,weight=weight_var)))
test
As a result, I get the following error message (you can reproduce it using the code):
Error in as.data.frame(weighted.sd(x = index_var, weight = weight_var)) :
error in evaluating the argument 'x' in selecting a method
for function 'as.data.frame': Error in weighted.sd(x = index_var, weight = weight_var) :
object 'index_var' not found
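A likely cause, offered as a minimal sketch rather than a definitive fix: inside do(), the columns of the current group are not visible by bare name and are usually reached through the `.` placeholder, as in `.$index_var`. The example below assumes toy data with made-up region labels and uniform weights (neither is from the real data set), repeats the formulas of weighted.sd but returns a one-row data frame, and then draws the group means with 95% error bars using geom_errorbar().

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumed toy data, only for illustration
data <- data.frame(
  index_var  = rnorm(50),
  factor_var = factor(sample(c("north", "south", "east"), 50, replace = TRUE)),
  weight_var = runif(50, min = 0.5, max = 2)
)

# Same formulas as weighted.sd() above, returned as a one-row data frame
weighted.sd <- function(x, weight) {
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]
  sum.w <- sum(weight)
  sum.w2 <- sum(weight^2)
  mean.w <- sum(x * weight) / sum.w
  x.sd.w <- sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  SE <- x.sd.w / sqrt(sum.w)
  error <- qnorm(0.975) * SE
  data.frame(mean.w, x.sd.w, SE, error,
             left = mean.w - error, right = mean.w + error)
}

# `.` stands for the data of the current group inside do()
test <- data %>%
  group_by(factor_var) %>%
  do(weighted.sd(x = .$index_var, weight = .$weight_var))

# One point per region, with the 95% confidence interval as error bars
p <- ggplot(test, aes(x = factor_var, y = mean.w)) +
  geom_point() +
  geom_errorbar(aes(ymin = left, ymax = right), width = 0.2)
```

Printing `test` should give one row per region, and `print(p)` draws the plot. Returning a data frame from the function avoids the extra as.data.frame() call, but that is a design choice, not a requirement.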