To get grouped results in R

Question


I have the dataset below. In SAS, to get results by group from procedures such as GLM or REG, I can write something like:

 proc sort;
   by B;
 proc glm;
   class A;
   model C = A;
   by B;
 run;

This gives me separate GLM results for each level of B. But I don't know how to do this kind of 'by' grouping in R. You might suggest subset(), but that code gets really complicated if B has, say, 10 levels. I would like to know how to do this not only for ANOVA but also for regression and for means. Can anybody help me?

Raw data:

 A B    C
 a a 0.47
 a b 0.88
 a c 2.32
 a d 3.26
 a a 0.93
 a b 1.86
 a c 3.22
 a d 0.92
 a a 0.45
 a b 0.92
 a c 2.31
 a d 3.24
 b a 0.91
 b b 1.84
 b c 3.27
 b d 0.86
 b a 0.47
 b b 0.90
 b c 2.33
 b d 3.19
 b a 0.92
 b b 1.84
 b c 3.25
 b d 0.93
 c a 0.45
 c b 0.92
 c c 2.33
 c d 3.08
 c a 0.93
 c b 1.86
 c c 3.25
 c d 0.93
 c a 0.47
 c b 0.90
 c c 2.26
 c d 3.09


r grouping sas anova

Lim hyungwoo Oct 12 '12 at 16:41


3 answers

First, here is your dataset in a form that is easy to import into R (the output of dput()):

 dat <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), B = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L ), .Label = c("a", "b", "c", "d"), class = "factor"), C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24, 0.91, 1.84, 3.27, 0.86, 0.47, 0.9, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93, 0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.9, 2.26, 3.09)), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, -36L))

Most SAS BY-group processing maps naturally onto the split-apply-combine approach (split the data into pieces, do something with each piece, then combine the pieces back together). Here the result of each model is an object (a list), so the natural way to "combine" several models is to put them together in a list.
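To make the three steps concrete before fitting any models, here is a minimal base-R sketch (my own illustration, not part of the original answer) that applies them to a simple per-group summary of C by B. The names pieces, stats and result are illustrative, and the data frame is rebuilt inline so the snippet runs on its own:

```r
# Rebuild the example data (same values as the dput() output)
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# Split: a named list with one data frame per level of B
pieces <- split(dat, dat$B)

# Apply: summarise C within each piece
stats <- lapply(pieces, function(d) c(n = nrow(d), mean = mean(d$C), sd = sd(d$C)))

# Combine: stack the per-group vectors into one matrix (rows named a, b, c, d)
result <- do.call(rbind, stats)
print(result)
```

Swapping the summary function for a model-fitting function gives exactly the "list of models" idea used next.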

 library("plyr")
 models <- dlply(dat, .(B), function(DF) glm(C~A, data=DF))

models is now a list, each element of which is the result of a glm on one subset of dat:

 > models
 $a
 Call:  glm(formula = C ~ A, data = DF)
 Coefficients:
 (Intercept)           Ab           Ac
   6.167e-01    1.500e-01    6.799e-17
 Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
 Null Deviance:      0.472
 Residual Deviance: 0.427   AIC: 6.107

 $b
 Call:  glm(formula = C ~ A, data = DF)
 Coefficients:
 (Intercept)           Ab           Ac
    1.220000     0.306667     0.006667
 Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
 Null Deviance:      1.99
 Residual Deviance: 1.806   AIC: 19.09

 $c
 Call:  glm(formula = C ~ A, data = DF)
 Coefficients:
 (Intercept)           Ab           Ac
    2.616667     0.333333    -0.003333
 Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
 Null Deviance:      1.958
 Residual Deviance: 1.733   AIC: 18.72

 $d
 Call:  glm(formula = C ~ A, data = DF)
 Coefficients:
 (Intercept)           Ab           Ac
      2.4733      -0.8133      -0.1067
 Degrees of Freedom: 8 Total (i.e. Null);  6 Residual
 Null Deviance:      11.4
 Residual Deviance: 10.23   AIC: 34.69

 attr(,"split_type")
 [1] "data.frame"
 attr(,"split_labels")
   B
 1 a
 2 b
 3 c
 4 d

Extracting information from all the models at once follows the same paradigm:

 > ldply(models, coefficients)
   B (Intercept)         Ab            Ac
 1 a   0.6166667  0.1500000  6.798700e-17
 2 b   1.2200000  0.3066667  6.666667e-03
 3 c   2.6166667  0.3333333 -3.333333e-03
 4 d   2.4733333 -0.8133333 -1.066667e-01
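Since the question also asks about ANOVA: base R has a by() function that is probably the closest literal analogue of a SAS BY statement. A minimal sketch, assuming a one-way ANOVA of C on A fitted with lm() within each level of B (anova_by_B is just an illustrative name; the data are rebuilt inline):

```r
# Rebuild the example data (same values as the dput() output)
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# by() splits dat on B and calls the function once per level,
# much like adding "by B;" to a SAS procedure
anova_by_B <- by(dat, dat$B, function(d) anova(lm(C ~ A, data = d)))

# Printing shows one ANOVA table per level of B
print(anova_by_B)

# Each element is an ordinary "anova" table; this one is for the first level of B
anova_by_B[[1]]
```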


Brian Diggs Oct 12 '12 at 17:37


Perhaps the aggregate function is what you are looking for:

 # Sum of C within each level of A (the first column); dat is the data frame defined above
 aggregate(dat$C, by=list(dat$A), FUN=sum)

This groups your data by the first column (A) and collapses column C into its sum within each group.
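For the "average" part of the question, aggregate() also has a formula interface that keeps the grouping columns' real names instead of the generic Group.1 and x labels produced by the call with an unnamed by list. A minimal sketch on the same data (rebuilt inline; means_B and means_AB are illustrative names):

```r
# Rebuild the example data (same values as the dput() output)
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# Mean of C for each level of B (one row per level)
means_B <- aggregate(C ~ B, data = dat, FUN = mean)
print(means_B)

# Mean of C for every combination of A and B (3 x 4 = 12 rows)
means_AB <- aggregate(C ~ A + B, data = dat, FUN = mean)
print(means_AB)
```

Replacing mean with sum, sd, length and so on gives other per-group summaries without any subset() calls.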


by0 Oct 12 '12 at 16:50


Jilber urbina · Accepted Answer · 2012-10-12T18:04:22+0000

Based on @Brian Diggs's answer, but using only base R functions. I also used Brian's dataset:

 models <- lapply(split(dat, dat$B), function(x) glm(C~A, data=x))
 do.call(rbind, lapply(models, function(y) y$coef))
   (Intercept)         Ab            Ac
 a   0.6166667  0.1500000  1.802585e-17
 b   1.2200000  0.3066667  6.666667e-03
 c   2.6166667  0.3333333 -3.333333e-03
 d   2.4733333 -0.8133333 -1.066667e-01
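The coefficients alone don't give the overall ANOVA test for A within each level of B. One way to collect it is sketched below; it uses lm(), which gives the same fit as the Gaussian glm() above, and the names fits and ftests are illustrative (data rebuilt inline):

```r
# Rebuild the example data (same values as the dput() output)
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

# One linear model per level of B
fits <- lapply(split(dat, dat$B), function(d) lm(C ~ A, data = d))

# For each group, pull the F statistic and p-value for A out of its ANOVA table
# and return a one-row data frame; do.call(rbind, ...) stacks the rows
ftests <- do.call(rbind, lapply(names(fits), function(lvl) {
  tab <- anova(fits[[lvl]])
  data.frame(B = lvl, df = tab["A", "Df"], F = tab["A", "F value"],
             p = tab["A", "Pr(>F)"])
}))
print(ftests)
```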

Edit 1: Results with p-values (alternative 1)

 models <- lapply(split(dat, dat$B), function(x) glm(C~A, data=x)) # the same as above
 Coef <- lapply(models, function(y) y$coef) # the same as above
 Pval <- lapply(models, function(z) summary(z)$coefficients[, 'Pr(>|t|)'])
 Result <- cbind(do.call(rbind, Coef), do.call(rbind, Pval))
 colnames(Result)[4:6] <- paste('P.val', colnames(Result)[4:6])
 Result
   (Intercept)         Ab            Ac P.val (Intercept)  P.val Ab  P.val Ac
 a   0.6166667  0.1500000  1.802585e-17       0.007088207 0.5167724 1.0000000
 b   1.2200000  0.3066667  6.666667e-03       0.008446002 0.5191737 0.9886090
 c   2.6166667  0.3333333 -3.333333e-03       0.000151772 0.4762936 0.9941859
 d   2.4733333 -0.8133333 -1.066667e-01       0.016803317 0.4744404 0.9235623
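The question also mentions PROC REG. The same list-of-models pattern extends to other per-group regression summaries. As a sketch (again assuming lm() fits of C on A, with illustrative names fits and fit_stats, data rebuilt inline), vapply() can collect R-squared, the residual standard error and the residual degrees of freedom for each level of B into one matrix:

```r
# Rebuild the example data (same values as the dput() output)
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

fits <- lapply(split(dat, dat$B), function(d) lm(C ~ A, data = d))

# vapply() checks that every group returns exactly three numbers;
# t() turns the result into one row per level of B
fit_stats <- t(vapply(fits, function(f) {
  s <- summary(f)
  c(r.squared = s$r.squared, sigma = s$sigma, df.resid = f$df.residual)
}, FUN.VALUE = numeric(3)))
print(fit_stats)
```

vapply() is used instead of sapply() so that a group returning something of the wrong shape raises an error rather than silently producing a list.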

Edit 2: Results with p-values (alternative 2)

 models <- lapply(split(dat, dat$B), function(x) glm(C~A, data=x)) # the same as above
 do.call(rbind, lapply(models, function(z) summary(z)$coefficients))
                  Estimate Std. Error       t value    Pr(>|t|)
 (Intercept)  6.166667e-01  0.1540202  4.003804e+00 0.007088207
 Ab           1.500000e-01  0.2178175  6.886500e-01 0.516772358
 Ac           1.802585e-17  0.2178175  8.275668e-17 1.000000000
 (Intercept)  1.220000e+00  0.3167661  3.851423e+00 0.008446002
 Ab           3.066667e-01  0.4479749  6.845622e-01 0.519173660
 Ac           6.666667e-03  0.4479749  1.488179e-02 0.988608995
 (Intercept)  2.616667e+00  0.3103164  8.432253e+00 0.000151772
 Ab           3.333333e-01  0.4388537  7.595545e-01 0.476293582
 Ac          -3.333333e-03  0.4388537 -7.595545e-03 0.994185937
 (Intercept)  2.473333e+00  0.7538543  3.280917e+00 0.016803317
 Ab          -8.133333e-01  1.0661110 -7.628974e-01 0.474440418
 Ac          -1.066667e-01  1.0661110 -1.000521e-01 0.923562285
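One drawback of stacking the coefficient matrices this way is that the row names repeat, so the level of B each row belongs to is lost. A sketch of one way to keep it (the columns B and term and the name coef_long are my own choices, not from the original answer; data rebuilt inline):

```r
# Rebuild the example data (same values as the dput() output)
dat <- data.frame(
  A = factor(rep(c("a", "b", "c"), each = 12)),
  B = factor(rep(c("a", "b", "c", "d"), times = 9)),
  C = c(0.47, 0.88, 2.32, 3.26, 0.93, 1.86, 3.22, 0.92, 0.45, 0.92, 2.31, 3.24,
        0.91, 1.84, 3.27, 0.86, 0.47, 0.90, 2.33, 3.19, 0.92, 1.84, 3.25, 0.93,
        0.45, 0.92, 2.33, 3.08, 0.93, 1.86, 3.25, 0.93, 0.47, 0.90, 2.26, 3.09)
)

fits <- lapply(split(dat, dat$B), function(d) glm(C ~ A, data = d))

# For each level of B, turn its coefficient table into a data frame that
# carries the group and the term as ordinary columns, then stack them
coef_long <- do.call(rbind, lapply(names(fits), function(lvl) {
  cf <- summary(fits[[lvl]])$coefficients
  data.frame(B = lvl, term = rownames(cf), cf,
             row.names = NULL, check.names = FALSE)
}))
print(coef_long)
```

check.names = FALSE keeps the original column names such as "Pr(>|t|)" instead of converting them to syntactic names.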
