Two arguments ANOVA Errorbar in R

Question

Two arguments ANOVA Errorbar in R

We are studying the statistics class for biology students and are trying to use R as a platform for data processing and data visualization. As much as possible, we would like to avoid the use of additional packages and make something awful "bizarre" in R; The focus of the course is on statistics, not programming. However, we did not find a very good way to create an error graph in R for a two-factor ANOVA design. We use the ggplot2 package to create the graph, and although it has a built-in stat_summary method to generate 95% CI errors, the calculation method may not always be correct. Below I look through the code for ANOVA manually and calculate 95% CI manually (with a standard error estimated from the total residual variance, and not just for the ggplot group dispersion method). In the end, there really is a plot.

So the question is ... is there an easier / faster / easier way to do all this?

# LIZARD LENGTH DATA island.1 <- c(0.2, 5.9, 6.1, 6.5) island.2 <- c(5.6, 14.8, 15.5, 16.4) island.3 <- c(0.8, 3.9, 4.3, 4.9) sex.codes <- c("Male", "Female", "Male", "Female") # PUTTING DATA TOGETHER IN A DATA FRAME df.1 <- data.frame(island.1, island.2, island.3, sex.codes) # MELTING THE DATA FRAME INTO LONG FORM library(reshape) df.2 <- melt(df.1) # MEAN BY CELL mean.island1.male <- with(df.2, mean(value[variable == "island.1" & sex.codes == "Male"])) mean.island1.female <- with(df.2, mean(value[variable == "island.1" & sex.codes == "Female"])) mean.island2.male <- with(df.2, mean(value[variable == "island.2" & sex.codes == "Male"])) mean.island2.female <- with(df.2, mean(value[variable == "island.2" & sex.codes == "Female"])) mean.island3.male <- with(df.2, mean(value[variable == "island.3" & sex.codes == "Male"])) mean.island3.female <- with(df.2, mean(value[variable == "island.3" & sex.codes == "Female"])) # ADDING CELL MEANS TO DATA FRAME df.2$means[df.2$variable == "island.1" & df.2$sex.codes == "Male"] <- mean.island1.male df.2$means[df.2$variable == "island.1" & df.2$sex.codes == "Female"] <- mean.island1.female df.2$means[df.2$variable == "island.2" & df.2$sex.codes == "Male"] <- mean.island2.male df.2$means[df.2$variable == "island.2" & df.2$sex.codes == "Female"] <- mean.island2.female df.2$means[df.2$variable == "island.3" & df.2$sex.codes == "Male"] <- mean.island3.male df.2$means[df.2$variable == "island.3" & df.2$sex.codes == "Female"] <- mean.island3.female # LINEAR MODEL lizard.model <- lm(value ~ variable*sex.codes, data=df.2) # CALCULATING RESIDUALS BY HAND: df.2$residuals.1 <- df.2$value - df.2$means # CONFIRMING RESIDUALS FROM LINEAR MODEL: df.2$residuals.2 <- residuals(lizard.model) # TWO FACTOR MAIN EFFECT ANOVA lizard.anova <- anova(lizard.model) # INTERACTION PLOT interaction.plot(df.2$variable, df.2$sex.codes, df.2$value) # SAMPLE SIZE IN EACH CELL n <- length(df.2$value[df.2$variable == "island.1" & df.2$sex.codes == "Male"]) # > n # [1] 2 # NOTE: JUST FOR CLARITY, PRETEND n=10 n <- 10 # CALCULATING STANDARD ERROR island.se <- sqrt(lizard.anova$M[4]/n) # HALF CONFIDENCE INTERVAL island.ci.half <- qt(0.95, lizard.anova$D[4]) * island.se # MAKING SUMMARY DATA FRAME summary.df <- data.frame( Means = c(mean.island1.male, mean.island1.female, mean.island2.male, mean.island2.female, mean.island3.male, mean.island3.female), Location = c("island1", "island1", "island2", "island2", "island3", "island3"), Sex = c("male", "female", "male", "female", "male", "female"), CI.half = rep(island.ci.half, 6) ) # > summary.df # Means Location Sex CI.half # 1 3.15 island1 male 2.165215 # 2 6.20 island1 female 2.165215 # 3 10.55 island2 male 2.165215 # 4 15.60 island2 female 2.165215 # 5 2.55 island3 male 2.165215 # 6 4.40 island3 female 2.165215 # GENERATING THE ERRORBAR PLOT library(ggplot2) qplot(data=summary.df, y=Means, x=Location, group=Sex, ymin=Means-CI.half, ymax=Means+CI.half, geom=c("point", "errorbar", "line"), color=Sex, shape=Sex, width=0.25) + theme_bw()

ggplot2 errobar plot of Two-way Main Effects Anova

+4

r statistics ggplot2 confidence-interval anova

James waters Nov 02 '11 at 8:56

source share

3 answers

I have to admit I'm pretty puzzled by your code. Do not take this as personal criticism, but I highly recommend that you teach your students how to use the R power as much as possible. They can benefit from it, and my experience is that they quickly understand what happens if I don't throw lines and lines of code clutter in their heads.

First of all, you do not need to calculate funds manually. Just do:

 df.2$mean <- with(df.2,ave(value,sex.codes,variable,FUN=mean))

See also ?ave This is more understandable than the mess of code in your example. If you have lizard.model you can just use

 fitted(lizard.model)

and compare these values with the means.

Then I strongly disagree with you. What you calculate is not a standard error in your prediction. To do this correctly, use the predict() function

 outcome <- predict(lizard.model,se.fit=TRUE) df.2$CI.half <- outcome$se / 2

To get the confidence interval for predicted funds, you must use the correct formulas if you want your students to get it right. Take a look at section 3.5 of this incredibly large Practical Regression and Anova using R from Faraway. It contains many code examples, where everything is calculated manually in a convenient and concise way. He will serve you and your students. I learned a lot from him and often used it as a guide when explaining these things to students.

Now, to get a summary data file, you have several options, but this one works and is understandable.

 summary.df <- unique(df.2[,-c(3,5,6)]) names(summary.df) <- c('Sex','Location','Means','CI.half')

And now you can just run your plot code while it is standing.

Alternatively, if you want to get a prediction error for your values, you can use the following:

 lizard.predict <- predict(lizard.model,interval='prediction') df.2$lower <- lizard.predict[,2] df.2$upper <- lizard.predict[,3] summary.df <- unique(df.2[,-3]) names(summary.df)[1:3] <- c('Sex','Location','Means') qplot(data=summary.df, y=Means, x=Location, group=Sex, ymin=lower, ymax=upper, geom=c("point", "errorbar", "line"), color=Sex, shape=Sex, width=0.25) + theme_bw()

PS: If I feel harsh here and there, this is not intended. English is not my native language, and I'm still not familiar with the intricacies of the language.

+5

Joris meys Nov 02 '11 at 9:50

source share

[Potential shameless promotion] You should consider the compareCats and rxnNorm functions in the HandyStuff package, available at www.github.com/bryanhanson/HandyStuff Warning: I'm not sure that it works smoothly with R 2.14. In particular, rxnNorm looks like the plot you are trying to create, plus it gives you a lot of options in terms of summary statistics and plot decorations. But this requires that your students install a separate package, so you may exclude it (but this allows students to focus on presenting and analyzing data). The plot from the example given here? RxnNorm.

With rxnNorm, you have a choice of several ways to calculate CI controlled by the method argument. Here are the actual features (from the ChemoSpec package).

 > seX <- function (x) sd(x, na.rm = TRUE)/sqrt(length(na.omit(x))) > <environment: namespace:ChemoSpec> > > seXy <- function (x) { > m <- mean(na.omit(x)) > se <- seX(x) > u <- m + se > l <- m - se > c(y = m, ymin = l, ymax = u) } <environment: namespace:ChemoSpec> > > > seXy95 <- function (x) { > m <- mean(na.omit(x)) > se <- seX(x) > u <- m + 1.96 * se > l <- m - 1.96 * se > c(y = m, ymin = l, ymax = u) } <environment: namespace:ChemoSpec> > > > seXyIqr <- function (x) { > i <- fivenum(x) > c(y = i[3], ymin = i[2], ymax = i[4]) } <environment: namespace:ChemoSpec> > > seXyMad <- function (x) { > m <- median(na.omit(x)) > d <- mad(na.omit(x)) > u <- m + d > l <- m - d > c(y = m, ymin = l, ymax = u) } <environment: namespace:ChemoSpec>

+4

Bryan hanson Nov 02 '11 at 10:40

source share

George dontas · Accepted Answer · 2011-11-02T13:11:58+0000

Here is another attempt to use the sciplot package. Alternative methods for calculating confidence intervals can be passed in the ci.fun parameter.

 lineplot.CI(variable,value, group =sex.codes , data = df.2, cex = 1.5, xlab = "Location", ylab = "means", cex.lab = 1.2, x.leg = 1, col = c("blue","red"), pch = c(16,16))

Two arguments ANOVA Errorbar in R

More articles: