ddply in a for loop; multiple factors

I want to use ddply to summarize data from several variables using several factors.

I have the following test data:

    site     block  plot  rep  name  weight  height  dtf
    Alberta      1     2    1  A         43     139   54
    Alberta      2     5    2  A         46     139   46
    Alberta      4    10    3  A         49     136   54
    Nunavut      1     1    1  A         49     136   59
    Nunavut      2     4    2  A         51     135   50
    Nunavut      3     8    3  A         52     133   56
    Alberta      5    13    1  B         55     132   50
    Alberta      4    12    2  B         55     125   46
    Alberta      5    15    3  B         56     120   46
    Nunavut      5    14    1  B         57     119   54
    Nunavut      5    13    2  B         58     119   55
    Nunavut      4    11    3  B         59     118   51
    ...

etc.

I want to take the variables "weight", "height", "dtf" and summarize them according to the factors "site" and "name".

I started by building vectors of column names:

    data.factors <- NULL
    data.variables <- NULL
    for (n in 1:length(data)) {
      if (is.factor(data[[n]])) {
        data.factors <- c(data.factors, colnames(data[n]))
      } else next
    }
    for (n in 1:length(data)) {
      if (is.numeric(data[[n]]) || is.integer(data[[n]])) {
        data.variables <- c(data.variables, colnames(data[n]))
      } else next
    }

This worked for running several one-way ANOVAs:

    for (variables in data.variables) {
      for (factors in data.factors) {
        output1 <- aov(lm(data[[variables]] ~ data[[factors]]))
        cat(variables)
        cat(" by ")
        cat(factors)
        cat("\n")
        print(summary(output1))
      }
    }

But I cannot get it to work with ddply:

    for (x in data.variables) {
      variable.summary <- ddply(data, .(site, name), summarise,
        N = sum(!is.na(x[1])),
        min = min(x[1], na.rm = TRUE),
        max = max(x[1], na.rm = TRUE),
        mean = mean(x[1], na.rm = TRUE),
        sd = sd(x[1], na.rm = TRUE),
        se = sd / sqrt(N)
      )
      print(variable.summary)
    }

All I get is the following:

          site name N    min    max mean sd se
    1  Alberta    A 1 weight weight   NA NA NA
    2  Alberta    B 1 weight weight   NA NA NA
    3  Alberta    C 1 weight weight   NA NA NA
    4  Alberta    D 1 weight weight   NA NA NA
    5  Alberta    E 1 weight weight   NA NA NA
    6  Nunavut    A 1 weight weight   NA NA NA
    7  Nunavut    B 1 weight weight   NA NA NA
    8  Nunavut    C 1 weight weight   NA NA NA
    9  Nunavut    D 1 weight weight   NA NA NA
    10 Nunavut    E 1 weight weight   NA NA NA
    ....

If I test ddply with a single variable (typed directly, rather than referenced through "x"), it works fine.

Is there a trick to getting a function to recognize a column referenced by name? I am used to Perl, whose $scalars can be referenced anywhere, and I hoped something similar would be available in R.
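A likely explanation for the output above, offered as a sketch rather than a definitive fix: inside the loop, `x` holds the string "weight", so `min(x[1])` returns "weight" and `mean(x[1])` returns NA. Passing ddply an anonymous function and indexing each group's data frame with `d[[x]]` looks the column up by name instead. The data frame name `dat` and the list `summaries` are hypothetical, and only the columns needed here are reproduced from the sample data.

```r
library(plyr)

# Sample rows from the question; `dat` is a stand-in name for the asker's data frame.
dat <- data.frame(
  site   = rep(rep(c("Alberta", "Nunavut"), each = 3), times = 2),
  name   = rep(c("A", "B"), each = 6),
  weight = c(43, 46, 49, 49, 51, 52, 55, 55, 56, 57, 58, 59),
  height = c(139, 139, 136, 136, 135, 133, 132, 125, 120, 119, 119, 118),
  dtf    = c(54, 46, 54, 59, 50, 56, 50, 46, 46, 54, 55, 51)
)

data.variables <- c("weight", "height", "dtf")

summaries <- list()
for (x in data.variables) {
  summaries[[x]] <- ddply(dat, .(site, name), function(d) {
    v <- d[[x]]                 # look the column up by its name (a string)
    N <- sum(!is.na(v))         # number of non-missing values in this group
    s <- sd(v, na.rm = TRUE)
    data.frame(
      N    = N,
      min  = min(v, na.rm = TRUE),
      max  = max(v, na.rm = TRUE),
      mean = mean(v, na.rm = TRUE),
      sd   = s,
      se   = s / sqrt(N)
    )
  })
}

print(summaries[["weight"]])
```

Each element of `summaries` is one data frame per variable, with one row per site/name combination.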

+5
2 answers

The successor to plyr's ddply, dplyr, can do this very easily using group_by() and summarise_each(), without having to loop over anything:

    df <- data.frame(
      site = c("Alberta", "Alberta", "Alberta", "Nunavut", "Nunavut", "Nunavut",
               "Alberta", "Alberta", "Alberta", "Nunavut", "Nunavut", "Nunavut"),
      block = c(1, 2, 4, 1, 2, 3, 5, 4, 5, 5, 5, 4),
      plot = c(2, 5, 10, 1, 4, 8, 13, 12, 15, 14, 13, 11),
      rep = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
      name = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
      weight = c(43, 46, 49, 49, 51, 52, 55, 55, 56, 57, 58, 59),
      height = c(139, 139, 136, 136, 135, 133, 132, 125, 120, 119, 119, 118),
      dtf = c(54, 46, 54, 59, 50, 56, 50, 46, 46, 54, 55, 51)
    )

    library(dplyr)

    # length (rather than sum) matches the *_length columns in the output
    df.summary <- df %>%
      group_by(site, name) %>%
      summarise_each(funs(length, min, max, mean, sd), weight, height, dtf)

The result is a data frame:

    > df.summary
    Source: local data frame [4 x 17]
    Groups: site

         site name weight_length height_length dtf_length weight_min height_min dtf_min
    1 Alberta    A             3             3          3         43        136      46
    2 Alberta    B             3             3          3         55        120      46
    3 Nunavut    A             3             3          3         49        133      50
    4 Nunavut    B             3             3          3         57        118      51
    Variables not shown: weight_max (dbl), height_max (dbl), dtf_max (dbl),
      weight_mean (dbl), height_mean (dbl), dtf_mean (dbl), weight_sd (dbl),
      height_sd (dbl), dtf_sd (dbl)

You can pass any function you like to funs() inside summarise_each(), so if you need a column of standard errors, first define the function:

    se <- function(x) {
      N <- sum(!is.na(x))                 # count the non-missing values
      return(sd(x, na.rm = TRUE) / sqrt(N))
    }

And pass it in: summarise_each(funs(length, min, max, mean, sd, se), ...)
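In recent dplyr releases, summarise_each() and funs() are deprecated in favour of across(). A minimal sketch of the same summary in that style, assuming dplyr 1.0.0 or later and an `se` helper that drops NA values (one possible definition, not necessarily the one intended above):

```r
library(dplyr)

# Sample data from the question (only the columns used here).
df <- data.frame(
  site   = rep(rep(c("Alberta", "Nunavut"), each = 3), times = 2),
  name   = rep(c("A", "B"), each = 6),
  weight = c(43, 46, 49, 49, 51, 52, 55, 55, 56, 57, 58, 59),
  height = c(139, 139, 136, 136, 135, 133, 132, 125, 120, 119, 119, 118),
  dtf    = c(54, 46, 54, 59, 50, 56, 50, 46, 46, 54, 55, 51)
)

# Standard error of the mean, ignoring missing values (assumed definition).
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

df.summary <- df %>%
  group_by(site, name) %>%
  summarise(
    across(
      c(weight, height, dtf),
      list(length = length, min = min, max = max, mean = mean, sd = sd, se = se)
    ),
    .groups = "drop"
  )

print(df.summary)
```

With the default naming, across() produces columns such as weight_mean and dtf_se, grouped by variable rather than by function.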

+3

Try using data.table:

    > library(data.table)
    > testdt = data.table(testdf)
    > testdt[, list(meanwt = mean(weight), meanht = mean(height)), by = list(site, name)]
          site name   meanwt   meanht
    1: Alberta    A 46.00000 138.0000
    2: Nunavut    A 50.66667 134.6667
    3: Alberta    B 55.33333 125.6667
    4: Nunavut    B 58.00000 118.6667

Max, min, etc. can be added to the list of functions.
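As a rough illustration of adding more functions, the sketch below extends the list in `j` with min, max and sd, and then uses `.SD` with `.SDcols` to apply one function to several columns at once. The column names `minwt`, `maxwt`, `sdwt` and the objects `vars`, `wt.summary`, `means` are made up for this example, and the table is rebuilt from the question's sample data since `testdf` is not shown.

```r
library(data.table)

# Sample data from the question (only the columns used here).
testdt <- data.table(
  site   = rep(rep(c("Alberta", "Nunavut"), each = 3), times = 2),
  name   = rep(c("A", "B"), each = 6),
  weight = c(43, 46, 49, 49, 51, 52, 55, 55, 56, 57, 58, 59),
  height = c(139, 139, 136, 136, 135, 133, 132, 125, 120, 119, 119, 118),
  dtf    = c(54, 46, 54, 59, 50, 56, 50, 46, 46, 54, 55, 51)
)

# Several summaries of one column: add entries to the list in j.
wt.summary <- testdt[, list(meanwt = mean(weight),
                            minwt  = min(weight),
                            maxwt  = max(weight),
                            sdwt   = sd(weight)),
                     by = list(site, name)]

# One summary of several columns: .SD holds the columns named in .SDcols.
vars  <- c("weight", "height", "dtf")
means <- testdt[, lapply(.SD, mean), by = list(site, name), .SDcols = vars]

print(wt.summary)
print(means)
```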

0

Source: https://habr.com/ru/post/1208774/

