ANOVA: degrees of freedom almost all equal 1

I have a dataset that starts like this:

    > d.weight
       R  N P  C D.weight
    1  1  0 0 GO     45.3
    2  2  0 0 GO     34.0
    3  3  0 0 GO     19.1
    4  4  0 0 GO     26.6
    5  5  0 0 GO     23.5
    6  1 45 0 GO     22.1
    7  2 45 0 GO     15.5
    8  3 45 0 GO     23.4
    9  4 45 0 GO     15.8
    10 5 45 0 GO     42.9
    ...

etc.

  • R is the replicate; there are 5 of them (1-5).
  • N is the nitrogen level, also with 5 values (0, 45, 90, 180, 360).
  • P is the phosphorus level, also with 5 values (0, 35, 70, 140, 280).
  • C is the plant combination; there are 4 of them (GO, GB, LO, LB).
  • D.weight is the dry weight in grams.

However, when I run an ANOVA, I get the wrong degrees of freedom. I usually run my ANOVA on subsets of this complete dataset, but let's run an analysis I would not really do, just so you can see that almost all Df (degrees of freedom) are incorrect.

    > example.aov=aov(D.weight ~ R+N+P+C, data=d.weight)
    > summary(example.aov)
                 Df Sum Sq Mean Sq F value  Pr(>F)
    R             1   1158    1158   9.484 0.00226 **
    N             1    202     202   1.657 0.19900
    P             1  11040   11040  90.408 < 2e-16 ***
    C             3  41032   13677 112.010 < 2e-16 ***
    Residuals   313  38220     122

So the only term with the correct Df is factor C. Is that because it contains letters instead of numbers?

I found somewhere that if I wrap each term in interaction(), I get the correct Df, but I don't know whether that is the right thing to do in general. For instance:

    > example.aov2=aov(D.weight ~ interaction(R)+interaction(N)+interaction(P)+interaction(C), data=d.weight)
    > summary(example.aov2)
                    Df Sum Sq Mean Sq F value   Pr(>F)
    interaction(R)   4   7423    1856  19.544 2.51e-14 ***
    interaction(N)   4    543     136   1.429    0.224
    interaction(P)   4  13788    3447  36.301  < 2e-16 ***
    interaction(C)   3  41032   13677 144.042  < 2e-16 ***
    Residuals      304  28866      95

I tried this with factor C alone, just to see whether it messed anything up:

    > example.aov3=aov(D.weight ~ C, data=d.weight)
    > summary(example.aov3)
                 Df Sum Sq Mean Sq F value Pr(>F)
    C             3  41032   13677   85.38 <2e-16 ***
    Residuals   316  50620     160
    >
    > example.aov4=aov(D.weight ~ interaction(C), data=d.weight)
    > summary(example.aov4)
                    Df Sum Sq Mean Sq F value Pr(>F)
    interaction(C)   3  41032   13677   85.38 <2e-16 ***
    Residuals      316  50620     160

And it looks the same. Should I add interaction() everywhere?

1 answer

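R decides whether to treat a predictor as categorical (ANOVA-type analysis) or continuous (regression-type analysis) by checking whether it is stored as a factor or as a numeric variable. Your R, N and P are numeric, so each is fitted as a single slope and gets 1 Df; C is stored as text, so it is treated as a factor and gets 4 - 1 = 3 Df. Wrapping a term in interaction() happens to work because interaction() returns a factor, but the direct fix is to convert your independent variables into factors with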

    facs <- c("R", "N", "P")
    d.weight[facs] <- lapply(d.weight[facs], factor)
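As a quick check, the effect of this conversion on the degrees of freedom can be reproduced on simulated data. This is only a sketch: the data frame `sim` and its random `D.weight` values are made up for illustration and merely mimic the layout described in the question (5 replicates, 5 N levels, 5 P levels, 4 plant combinations); the real dataset has fewer rows, so the residual Df will differ.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for d.weight: full factorial layout, random weights
sim <- expand.grid(R = 1:5,
                   N = c(0, 45, 90, 180, 360),
                   P = c(0, 35, 70, 140, 280),
                   C = c("GO", "GB", "LO", "LB"))
sim$D.weight <- rnorm(nrow(sim), mean = 30, sd = 10)

# Numeric predictors: each is fitted as a single slope, so Df = 1
fit_num <- aov(D.weight ~ R + N + P + C, data = sim)
df_num <- summary(fit_num)[[1]]$Df

# Same conversion as above: turn R, N and P into factors
facs <- c("R", "N", "P")
sim[facs] <- lapply(sim[facs], factor)

# Factor predictors: each gets (number of levels - 1) Df
fit_fac <- aov(D.weight ~ R + N + P + C, data = sim)
df_fac <- summary(fit_fac)[[1]]$Df

print(df_num[1:4])  # Df for R, N, P, C before conversion: 1 1 1 3
print(df_fac[1:4])  # Df for R, N, P, C after conversion:  4 4 4 3
```

Running `sapply(d.weight, class)` on the real data before and after the conversion is a simple way to confirm which columns are still numeric.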

If you want to create helper variables instead of overwriting, you can do something like

    for (varname in facs) {
      d.weight[[paste0("f", varname)]] <- factor(d.weight[[varname]])
    }

There may be a more compact way to do this, but it should serve ...
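As for the interaction() workaround from the question: interaction() always returns a factor, so wrapping a single numeric variable in it groups the data the same way factor() does. A minimal sketch, using a made-up vector `n_level` rather than the real data:

```r
# Hypothetical nitrogen levels, stored as numbers as in the question
n_level <- rep(c(0, 45, 90, 180, 360), times = 2)

f1 <- interaction(n_level)  # the workaround from the question
f2 <- factor(n_level)       # the direct conversion

is.numeric(n_level)  # TRUE: aov() would fit a single slope (Df = 1)
is.factor(f1)        # TRUE: interaction() returned a factor
nlevels(f1)          # 5 levels, so the term gets 5 - 1 = 4 Df

# Both versions assign every observation to the same group
same_grouping <- all(as.character(f1) == as.character(f2))
print(same_grouping)  # TRUE
```

So interaction() is not wrong here, but factor() states the intent more directly and keeps the term names in the ANOVA table readable.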


Source: https://habr.com/ru/post/1204577/

