Automating a chi-square test across categories and columns

I have a survey data frame containing several questions (columns) coded as 1 = agree / 0 = disagree. Respondents (rows) are classified by grouping variables such as "age" ("young", "middle", "old"), "region" ("East", "Middle", "West"), etc. There are about 30 categories in total (3 ages, 3 regions, 2 genders, 11 classes, etc.). Within each grouping variable, the categories do not overlap and have different sizes.

This simulates a cut-down version of the dataset:

n <- 400
set.seed(1)
data <- data.frame(age = sample(c('young', 'middle', 'old'), n, replace = TRUE),
                   region = sample(c('East', 'Mid', 'West'), n, replace = TRUE),
                   gender = sample(c('M', 'F'), n, replace = TRUE),
                   Q15a = sample(c(0, 1), n, replace = TRUE),
                   Q15b = sample(c(0, 1), n, replace = TRUE))

For Q15a, I can use a chi-square test to check whether the answers in, say, the West differ significantly from the overall sample:

# no attach() needed: refer to the column explicitly as data$Q15a
chisq.test(table(subset(data, region == 'West')$Q15a), p = table(data$Q15a), rescale.p = TRUE)
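
For reference, here is a minimal sketch of what that call computes: a goodness-of-fit statistic comparing the observed counts in the West subgroup with the counts expected under the overall Q15a proportions. The names obs, p0, expected, stat and pval are illustrative only, and the sketch assumes both answer values (0 and 1) occur in the subgroup so that the two tables line up.

```r
# Simulated data, same construction as above
n <- 400
set.seed(1)
data <- data.frame(
  age    = sample(c('young', 'middle', 'old'), n, replace = TRUE),
  region = sample(c('East', 'Mid', 'West'), n, replace = TRUE),
  gender = sample(c('M', 'F'), n, replace = TRUE),
  Q15a   = sample(c(0, 1), n, replace = TRUE),
  Q15b   = sample(c(0, 1), n, replace = TRUE)
)

obs      <- table(data$Q15a[data$region == 'West'])  # observed counts in the subgroup
p0       <- prop.table(table(data$Q15a))             # overall proportions of 0 and 1
expected <- sum(obs) * p0                            # counts expected under p0

# Chi-square statistic and its p-value, computed by hand
stat <- sum((obs - expected)^2 / expected)
pval <- pchisq(stat, df = length(obs) - 1, lower.tail = FALSE)

# The built-in test should agree with the manual computation
ref <- chisq.test(obs, p = p0)
stopifnot(isTRUE(all.equal(unname(ref$statistic), as.numeric(stat))))
stopifnot(isTRUE(all.equal(unname(ref$p.value), as.numeric(pval))))
```

Note that the subgroup is itself part of the overall sample, so the test treats the overall proportions as fixed reference values.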

I want to check every category against the overall sample for Q15a, and then for about 20 other questions. Since that amounts to about 30 tests per question, I want to find a way (efficient or otherwise) to automate this, but I'm struggling to figure out how to get R to do it by itself, or how to write a loop that cycles through the categories. I searched [1] and got sidetracked by pairwise comparison testing with pairwise.prop.test(), but so far I have not found anything that really answers this.

[1] Similar but non-duplicate questions (both run the tests across columns):

Using Loops to Run a Chi-Square Test in R

Chi-Square Analysis using a for loop in R

2 answers

How about this?

# find all question columns containing Q, your "subset" may differ
nms <- names(data)
nms <- nms[grepl("Q", nms)]

result <- sapply(nms, FUN = function(x, data) {
  qinq <- data[, c("region", x)]
  by(data = qinq, INDICES = data$region, FUN = function(y, qinq) {
    chisq.test(table(y[, x]), p =  table(qinq[, x]), rescale.p = TRUE)
  }, qinq = qinq)
}, data = data, simplify = FALSE)

$Q15a
data$region: East

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.7494, df = 1, p-value = 0.3867

--------------------------------------------------------------------------------------------- 
data$region: Mid

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.2249, df = 1, p-value = 0.6353

--------------------------------------------------------------------------------------------- 
data$region: West

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 1.5877, df = 1, p-value = 0.2077


$Q15b
data$region: East

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.0697, df = 1, p-value = 0.7918

--------------------------------------------------------------------------------------------- 
data$region: Mid

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0, df = 1, p-value = 0.9987

--------------------------------------------------------------------------------------------- 
data$region: West

    Chi-squared test for given probabilities

data:  table(y[, x])
X-squared = 0.056, df = 1, p-value = 0.8129

For example, to extract only the p.value from each test:

lapply(result, FUN = function(x) lapply(x, "[", "p.value"))

$Q15a
$Q15a$East
$Q15a$East$p.value
[1] 0.3866613


$Q15a$Mid
$Q15a$Mid$p.value
[1] 0.6353457


$Q15a$West
$Q15a$West$p.value
[1] 0.2076507



$Q15b
$Q15b$East
$Q15b$East$p.value
[1] 0.7918426


$Q15b$Mid
$Q15b$Mid$p.value
[1] 0.9986924


$Q15b$West
$Q15b$West$p.value
[1] 0.8128969
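
To cover every grouping variable rather than only region, and to get one row per test instead of a nested list, the same idea can be sketched as below. This is only a sketch under stated assumptions: the helper test_one() and the result column names are hypothetical, the vector groups must be adapted to the real grouping columns, and questions are assumed to be the columns whose names start with "Q" and to be coded 0/1. The Holm adjustment via p.adjust() is an optional extra suggested here because of the large number of tests.

```r
# Simulated data, same construction as in the question
n <- 400
set.seed(1)
data <- data.frame(
  age    = sample(c('young', 'middle', 'old'), n, replace = TRUE),
  region = sample(c('East', 'Mid', 'West'), n, replace = TRUE),
  gender = sample(c('M', 'F'), n, replace = TRUE),
  Q15a   = sample(c(0, 1), n, replace = TRUE),
  Q15b   = sample(c(0, 1), n, replace = TRUE)
)

groups    <- c('age', 'region', 'gender')           # grouping columns (assumed names)
questions <- grep('^Q', names(data), value = TRUE)  # question columns (assumed to start with "Q")

# Hypothetical helper: test every level of grouping column g against the
# overall distribution of question q; returns one row per level.
test_one <- function(q, g, data) {
  answers  <- factor(data[[q]], levels = c(0, 1))  # keep both cells even if one count is 0
  p0       <- prop.table(table(answers))           # overall proportions
  by_level <- split(answers, data[[g]])
  rows <- lapply(names(by_level), function(lv) {
    tab <- table(by_level[[lv]])
    tst <- chisq.test(tab, p = p0)
    data.frame(question = q, variable = g, level = lv,
               n = sum(tab),
               statistic = unname(tst$statistic),
               p.value = tst$p.value,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Run the helper for every question / grouping-variable combination
combos <- expand.grid(q = questions, g = groups, stringsAsFactors = FALSE)
res <- do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
  test_one(combos$q[i], combos$g[i], data)
}))
rownames(res) <- NULL

# Optional: adjust for multiple testing across all rows (Holm's method)
res$p.holm <- p.adjust(res$p.value, method = 'holm')
res
```

Something like subset(res, p.holm < 0.05) would then list the categories that still differ from the overall sample after adjustment.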


Take a look at the chisq.desc() function in the EnQuireR package.


Source: https://habr.com/ru/post/1530419/

