Chi-square test with two samples in R

I am really new to R, so please bear with me. I use the chi-squared test to compare nucleotide frequencies at a given position, and I counted the number of A, C, G, and T in two different datasets:

x1 <- c(272003,310418,201601,237168)
x2 <- c(239614,316515,182070,198025)

I can think of two ways to ask for a test of two samples:

> chisq.test(x1,x2)

    Pearson Chi-squared test

data:  x1 and x2
X-squared = 12, df = 9, p-value = 0.2133

Warning message:
In chisq.test(x1, x2) : Chi-squared approximation may be incorrect

or

> chisq.test(cbind(x1,x2))

    Pearson Chi-squared test

data:  cbind(x1, x2)
X-squared = 2942.065, df = 3, p-value < 2.2e-16

I suspect the second version is correct, because I can also do this:

> chisq.test(x1,x1)

    Pearson Chi-squared test

data:  x1 and x1
X-squared = 12, df = 9, p-value = 0.2133

Warning message:
In chisq.test(x1, x1) : Chi-squared approximation may be incorrect

with an identical and clearly wrong result.

What is actually calculated in this case?

Thanks!

1 answer

chisq.test(x1,x1)$expected shows the following:

        x1
x1       201601 237168 272003 310418
  201601   0.25   0.25   0.25   0.25
  237168   0.25   0.25   0.25   0.25
  272003   0.25   0.25   0.25   0.25
  310418   0.25   0.25   0.25   0.25

Observed counts (chisq.test(x1,x1)$observed):

        x1
x1       201601 237168 272003 310418
  201601      1      0      0      0
  237168      0      1      0      0
  272003      0      0      1      0
  310418      0      0      0      1

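So when you pass two vectors, chisq.test treats each of them as a factor and cross-tabulates them. Every distinct count becomes its own category, and the table holds one observation per pair of values. The statistic therefore does not depend on the counts themselves, which is why x1 with x2 and x1 with x1 give the same answer. The "right" way (as you suspected) is to pass a matrix of counts: chisq.test(cbind(x1,x1)) gives what you would expect for two identical samples (X-squared = 0, df = 3, p-value = 1).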
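To make the two-vector behaviour concrete, here is a minimal sketch that rebuilds the table by hand. It assumes only base R; the names tab, expected, stat, res_vec and res_tab are illustrative and not part of the original post.

```r
x1 <- c(272003, 310418, 201601, 237168)

# Cross-tabulating the vector with itself treats each distinct count as a
# category, so the table is a 4 x 4 grid with a single 1 in each row.
tab <- table(x1, x1)
print(tab)

# Expected cell values under independence: row total * column total / n.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic computed by hand from that table.
stat <- sum((tab - expected)^2 / expected)
print(stat)

# The two-vector call and the call on the explicit table should agree.
# Warnings about the chi-squared approximation are suppressed here.
res_vec <- suppressWarnings(chisq.test(x1, x1))
res_tab <- suppressWarnings(chisq.test(tab))
print(res_vec$statistic)
print(res_tab$statistic)
```

With four distinct values, every expected cell is 0.25, so the statistic is 12 on 9 degrees of freedom regardless of how large the counts are.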

For the two actual datasets, the observed and expected counts are:

> chisq.test(cbind(x1,x2))$observed
         x1     x2
[1,] 272003 239614
[2,] 310418 316515
[3,] 201601 182070
[4,] 237168 198025
> chisq.test(cbind(x1,x2))$expected
           x1       x2
[1,] 266912.4 244704.6
[2,] 327073.2 299859.8
[3,] 200162.6 183508.4
[4,] 227041.8 208151.2
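As a check on the matrix form, the statistic can be recomputed by hand from the observed and expected counts shown above. This is a sketch in base R; obs, expected, stat, df and p_value are names chosen here for illustration.

```r
x1 <- c(272003, 310418, 201601, 237168)
x2 <- c(239614, 316515, 182070, 198025)

# One row per nucleotide, one column per dataset.
obs <- cbind(x1, x2)

# Expected counts under homogeneity: row total * column total / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-squared statistic and its degrees of freedom.
stat <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(stat, df, lower.tail = FALSE)

# Compare with the built-in test on the same matrix.
res <- chisq.test(obs)
print(c(by_hand = stat, built_in = unname(res$statistic)))
print(df)
print(p_value)
```

The hand-computed value should match the X-squared = 2942.065 with df = 3 reported in the question, which confirms that the matrix call compares the two nucleotide distributions as intended.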

Source: https://habr.com/ru/post/1523848/

