How to get grouped (BY-group) results in R

I have the dataset shown below. In SAS, to get grouped results from a procedure such as GLM or REG, I can write something like:

    proc sort; by B;
    proc glm;
      class A;
      model C=A;
      by B;
    run;

Then I get the GLM results separately for each level of B. But I do NOT know how to do something like 'by' grouping in R. You might suggest subset(), but that code becomes really complicated if B has, for example, 10 levels. I need this not only for ANOVA, but also for regression and means. Can anybody help me?

    raw data
    A B C
    a a 0.47
    a b 0.88
    a c 2.32
    a d 3.26
    a a 0.93
    a b 1.86
    a c 3.22
    a d 0.92
    a a 0.45
    a b 0.92
    a c 2.31
    a d 3.24
    b a 0.91
    b b 1.84
    b c 3.27
    b d 0.86
    b a 0.47
    b b 0.90
    b c 2.33
    b d 3.19
    b a 0.92
    b b 1.84
    b c 3.25
    b d 0.93
    c a 0.45
    c b 0.92
    c c 2.33
    c d 3.08
    c a 0.93
    c b 1.86
    c c 3.25
    c d 0.93
    c a 0.47
    c b 0.90
    c c 2.26
    c d 3.09
3 answers

Based on @Brian Diggs's answer, but using only base R functions. I also used Brian's dataset (dat).

    models <- lapply(split(dat, dat$B), function(x) glm(C~A, data=x))
    do.call(rbind, lapply(models, function(y) y$coef))
      (Intercept)         Ab            Ac
    a   0.6166667  0.1500000  1.802585e-17
    b   1.2200000  0.3066667  6.666667e-03
    c   2.6166667  0.3333333 -3.333333e-03
    d   2.4733333 -0.8133333 -1.066667e-01
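Base R also has a `by()` function, which may feel closest to the SAS `by` statement. The following is only a sketch under some assumptions: it rebuilds the question's data compactly with `rep()` instead of using Brian's `structure()` call, and the names `fits` and `coef_by_B` are made up for illustration.

```r
# Question's data, rebuilt compactly (A in blocks of 12, B cycling a-d).
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# by() splits dat on the levels of B and fits one model per piece.
fits <- by(dat, dat$B, function(x) glm(C ~ A, data = x))

# Stack the coefficient vectors: one row per level of B.
coef_by_B <- do.call(rbind, lapply(fits, coef))
coef_by_B
```

Printing `fits` itself shows each model under a header naming its level of B, which is roughly how SAS lays out BY-group output.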

Edit 1: Results with p-values (alternative 1)

    models <- lapply(split(dat, dat$B), function(x) glm(C~A, data=x))  # the same as above
    Coef <- lapply(models, function(y) y$coef)                         # the same as above
    Pval <- lapply(models, function(z) summary(z)$coefficients[, 'Pr(>|t|)'])
    Result <- cbind(do.call(rbind, Coef), do.call(rbind, Pval))
    colnames(Result)[4:6] <- paste('P.val', colnames(Result)[4:6])
    Result
      (Intercept)         Ab            Ac P.val (Intercept)  P.val Ab  P.val Ac
    a   0.6166667  0.1500000  1.802585e-17       0.007088207 0.5167724 1.0000000
    b   1.2200000  0.3066667  6.666667e-03       0.008446002 0.5191737 0.9886090
    c   2.6166667  0.3333333 -3.333333e-03       0.000151772 0.4762936 0.9941859
    d   2.4733333 -0.8133333 -1.066667e-01       0.016803317 0.4744404 0.9235623

Edit 2: Results with p-values (alternative 2)

    models <- lapply(split(dat, dat$B), function(x) glm(C~A, data=x))  # the same as above
    do.call(rbind, lapply(models, function(z) summary(z)$coefficients))
                     Estimate Std. Error       t value    Pr(>|t|)
    (Intercept)  6.166667e-01  0.1540202  4.003804e+00 0.007088207
    Ab           1.500000e-01  0.2178175  6.886500e-01 0.516772358
    Ac           1.802585e-17  0.2178175  8.275668e-17 1.000000000
    (Intercept)  1.220000e+00  0.3167661  3.851423e+00 0.008446002
    Ab           3.066667e-01  0.4479749  6.845622e-01 0.519173660
    Ac           6.666667e-03  0.4479749  1.488179e-02 0.988608995
    (Intercept)  2.616667e+00  0.3103164  8.432253e+00 0.000151772
    Ab           3.333333e-01  0.4388537  7.595545e-01 0.476293582
    Ac          -3.333333e-03  0.4388537 -7.595545e-03 0.994185937
    (Intercept)  2.473333e+00  0.7538543  3.280917e+00 0.016803317
    Ab          -8.133333e-01  1.0661110 -7.628974e-01 0.474440418
    Ac          -1.066667e-01  1.0661110 -1.000521e-01 0.923562285
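Since the question also asks about ANOVA, the same split/lapply pattern can produce one ANOVA table per level of B. This is a sketch rather than part of the original answer: it assumes `lm()` plus `anova()` is an acceptable stand-in for the one-way analysis that `proc glm` performs, and the names `anovas` and `p_A` are illustrative.

```r
# Question's data, rebuilt compactly (A in blocks of 12, B cycling a-d).
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# One ANOVA table (rows "A" and "Residuals") per level of B.
anovas <- lapply(split(dat, dat$B), function(x) anova(lm(C ~ A, data = x)))

# Pull the F-test p-value for A out of each table.
p_A <- sapply(anovas, function(tab) tab["A", "Pr(>F)"])
p_A
```

Replacing the function body with another model call (for example a regression on a numeric predictor) should be enough to reuse the pattern for other analyses.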

First, here is your dataset in a form that is easy to import into R:

    dat <- structure(list(
      A = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
                      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
                    .Label = c("a", "b", "c"), class = "factor"),
      B = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
                      1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
                      1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
                    .Label = c("a", "b", "c", "d"), class = "factor"),
      C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
            0.91, 1.84, 3.27, 0.86, 0.47, 0.9, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
            0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.9, 2.26, 3.09)),
      .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, -36L))

Most of what SAS does with BY-group processing maps onto the split-apply-combine approach (split the data into pieces, do something to each piece, then combine the pieces back together). In this case, the results of the models are objects (lists), and the natural way to "combine" several models is to put them in a list.

    library("plyr")
    models <- dlply(dat, .(B), function(DF) glm(C~A, data=DF))

models is now a list, each element of which is the result of a glm on a subset of dat.

    > models
    $a

    Call:  glm(formula = C ~ A, data = DF)

    Coefficients:
    (Intercept)           Ab           Ac
      6.167e-01    1.500e-01    6.799e-17

    Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
    Null Deviance:      0.472
    Residual Deviance: 0.427    AIC: 6.107

    $b

    Call:  glm(formula = C ~ A, data = DF)

    Coefficients:
    (Intercept)           Ab           Ac
       1.220000     0.306667     0.006667

    Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
    Null Deviance:      1.99
    Residual Deviance: 1.806    AIC: 19.09

    $c

    Call:  glm(formula = C ~ A, data = DF)

    Coefficients:
    (Intercept)           Ab           Ac
       2.616667     0.333333    -0.003333

    Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
    Null Deviance:      1.958
    Residual Deviance: 1.733    AIC: 18.72

    $d

    Call:  glm(formula = C ~ A, data = DF)

    Coefficients:
    (Intercept)           Ab           Ac
         2.4733      -0.8133      -0.1067

    Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
    Null Deviance:      11.4
    Residual Deviance: 10.23    AIC: 34.69

    attr(,"split_type")
    [1] "data.frame"
    attr(,"split_labels")
      B
    1 a
    2 b
    3 c
    4 d

Extracting information from all of the models at once follows the same paradigm:

    > ldply(models, coefficients)
      B (Intercept)         Ab            Ac
    1 a   0.6166667  0.1500000  6.798700e-17
    2 b   1.2200000  0.3066667  6.666667e-03
    3 c   2.6166667  0.3333333 -3.333333e-03
    4 d   2.4733333 -0.8133333 -1.066667e-01
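Other per-model quantities can be pulled out the same way. The sketch below is an illustration, not part of the original answer: to avoid depending on plyr it builds the list of models with base R `split()` and `lapply()` instead of `dlply()`, rebuilds the data compactly, and uses the made-up names `aic_by_B` and `dev_by_B`.

```r
# Question's data, rebuilt compactly (A in blocks of 12, B cycling a-d).
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# Split on B, fit one glm per piece, keep the fits in a named list.
models <- lapply(split(dat, dat$B), function(x) glm(C ~ A, data = x))

# Apply an extractor to every model; sapply combines the results into a vector.
aic_by_B <- sapply(models, AIC)
dev_by_B <- sapply(models, deviance)

# Combine both summaries into one data frame, one row per level of B.
data.frame(B = names(models), AIC = aic_by_B, deviance = dev_by_B)
```

The AIC and residual deviance values should agree with those printed for each model in the `models` output above.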

Perhaps the aggregate function is what you are looking for:

    aggregate(dat$C, by=list(dat$A), FUN=sum)

This groups your data by the first column (A) and collapses column C into its sum for each group.
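For the group means mentioned in the question, `aggregate()` also accepts a formula, which lets you group by B (the question's `by` variable) or by several columns at once. This is a sketch that rebuilds the data compactly; the names `means_by_B` and `means_by_AB` are illustrative only.

```r
# Question's data, rebuilt compactly (A in blocks of 12, B cycling a-d).
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# Mean of C within each level of B.
means_by_B <- aggregate(C ~ B, data = dat, FUN = mean)

# Mean of C within each combination of A and B.
means_by_AB <- aggregate(C ~ A + B, data = dat, FUN = mean)

means_by_B
```

Note that `aggregate()` is aimed at summary statistics; for fitting a model per group, the split/lapply and plyr approaches in the other answers are probably a better fit.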


Source: https://habr.com/ru/post/1439439/

