Translating a Stata collapse command to R

I just stumbled upon a .do file that I need to translate to R because I don't have a Stata license. My Stata is rusty, so can someone confirm that the code does what I think it does?

For reproducibility, I am going to apply it to a dataset that I found on the Internet: a dairy production dataset (p004) from a textbook by Chatterjee, Hadi and Price.

Here's the Stata code:

    collapse (min) min_protein = protein ///
             (mean) avg_protein = protein ///
             (median) median_protein = protein ///
             (sd) sd_protein = protein ///
             if protein > 2.8, by(lactatio)

Here is what I think it does, in data.table syntax:

    library(data.table)
    library(foreign)

    DT = read.dta("p004.dta")
    setDT(DT)

    DT[protein > 2.8,
       .(min_protein = min(protein),
         avg_protein = mean(protein),
         median_protein = median(protein),
         sd_protein = sd(protein)),
       keyby = lactatio]
    #    lactatio min_protein avg_protein median_protein sd_protein
    # 1:        1         2.9    3.162632           3.10  0.2180803
    # 2:        2         2.9    3.304688           3.25  0.2858736
    # 3:        3         2.9    3.371429           3.35  0.4547672
    # 4:        4         2.9    3.231250           3.20  0.3419917
    # 5:        5         2.9    3.855556           3.20  1.9086061
    # 6:        6         3.0    3.200000           3.10  0.2645751
    # 7:        7         3.3    3.650000           3.65  0.4949748
    # 8:        8         3.2    3.300000           3.30  0.1414214

Is it correct?

This would be easy to confirm if I had used Stata in the last 18 months or had a copy installed, so I'm hoping to bend the ear of someone for whom this is true. Thanks.

+6
3 answers

Your intuition is correct. collapse is Stata's counterpart to R's aggregate function: it replaces the dataset in memory with a dataset of summary statistics, applying one or more aggregate functions to the variables you name, optionally within the groups given in by().
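To make the parallel with aggregate concrete, here is a minimal base R sketch. The `toy` data frame is invented for illustration (it is not the p004 dataset); the Stata `if` condition is handled by subsetting first, and `by(lactatio)` becomes the right-hand side of the formula.

```r
# Toy data, made up for this example (NOT the p004 dataset)
toy <- data.frame(
  lactatio = c(1, 1, 1, 2, 2, 2),
  protein  = c(2.7, 3.0, 3.4, 2.9, 3.1, 3.6)
)

# Stata's 'if protein > 2.8': drop the rows before aggregating
kept <- subset(toy, protein > 2.8)

# One aggregate() call; FUN returns one named element per statistic
res <- aggregate(
  protein ~ lactatio,
  data = kept,
  FUN = function(x) c(min = min(x), avg = mean(x),
                      median = median(x), sd = sd(x))
)

# aggregate() stores the statistics as a single matrix column;
# do.call(data.frame, ...) spreads it into ordinary columns
res <- do.call(data.frame, res)
names(res) <- c("lactatio", "min_protein", "avg_protein",
                "median_protein", "sd_protein")
print(res)
```

Unlike collapse, this leaves the original data frame untouched and returns the summary as a new object.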

Here is the output for this Stata command in the sample dataset:

    . list

         +------------------------------------------------------+
         | lactatio   min_pr~n   avg_pr~n   median~n   sd_pro~n |
         |------------------------------------------------------|
      1. |        1        2.9   3.162632        3.1   .2180803 |
      2. |        2        2.9   3.304688       3.25   .2858736 |
      3. |        3        2.9   3.371429       3.35   .4547672 |
      4. |        4        2.9    3.23125        3.2   .3419917 |
      5. |        5        2.9   3.855556        3.2   1.908606 |
         |------------------------------------------------------|
      6. |        6          3        3.2        3.1   .2645752 |
      7. |        7        3.3       3.65       3.65   .4949748 |
      8. |        8        3.2        3.3        3.3   .1414214 |
         +------------------------------------------------------+
+1

Here is the Stata output for your sample data, which matches the data.table result (up to rounding in the last displayed digit):

    collapse (min) min_protein = protein ///
             (mean) avg_protein = protein ///
             (median) median_protein = protein ///
             (sd) sd_protein = protein ///
             if protein > 2.8, by(lactatio)

    lactatio  min_protein  avg_protein  median_protein  sd_protein
           1          2.9     3.162632             3.1   0.2180803
           2          2.9     3.304688            3.25   0.2858736
           3          2.9     3.371429            3.35   0.4547672
           4          2.9      3.23125             3.2   0.3419917
           5          2.9     3.855556             3.2    1.908606
           6            3          3.2             3.1   0.2645752
           7          3.3         3.65            3.65   0.4949748
           8          3.2          3.3             3.3   0.1414214

And here is the data.table output (just to make sure that I am using the right data):

    library(data.table)
    library(foreign)  # for reading Stata data

    data <- read.dta("p004.dta")
    setkey(setDT(data), lactatio)

    data[protein > 2.8,
         .(min_protein = min(protein),
           avg_protein = mean(protein),
           median_protein = median(protein),
           sd_protein = sd(protein)),
         by = lactatio]

       lactatio min_protein avg_protein median_protein sd_protein
    1:        1         2.9    3.162632           3.10  0.2180803
    2:        2         2.9    3.304688           3.25  0.2858736
    3:        3         2.9    3.371429           3.35  0.4547672
    4:        4         2.9    3.231250           3.20  0.3419917
    5:        5         2.9    3.855556           3.20  1.9086061
    6:        6         3.0    3.200000           3.10  0.2645751
    7:        7         3.3    3.650000           3.65  0.4949748
    8:        8         3.2    3.300000           3.30  0.1414214
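The two printouts can also be compared numerically rather than by eye. The sketch below types in the sd_protein column from each output shown in this answer; the tolerance of 1e-6 is an assumption, chosen only to absorb rounding in the last displayed digit. The other columns could be checked the same way.

```r
# sd_protein as printed by Stata (copied from the output above)
stata_sd <- c(0.2180803, 0.2858736, 0.4547672, 0.3419917,
              1.908606, 0.2645752, 0.4949748, 0.1414214)

# sd_protein as printed by data.table (copied from the output above)
dt_sd <- c(0.2180803, 0.2858736, 0.4547672, 0.3419917,
           1.9086061, 0.2645751, 0.4949748, 0.1414214)

# Largest absolute gap between the two columns
max_gap <- max(abs(stata_sd - dt_sd))

# all.equal() compares within a tolerance (assumed here: 1e-6)
same <- isTRUE(all.equal(stata_sd, dt_sd, tolerance = 1e-6))

print(max_gap)
print(same)
```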
+4
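    # Rough R analogue of Stata's collapse, built on aggregate().
    # vars, newnames and stat are parallel vectors: stat[i] is applied to
    # vars[i] and stored as newnames[i]; by names the grouping columns.
    stata.collapse <- function(data, vars, newnames, stat, by) {
      groups <- data[match(by, names(data))]   # grouping columns
      x <- length(by)
      for (i in seq_along(stat)) {
        nn <- aggregate(data[vars[i]], by = groups, stat[i], na.rm = TRUE)
        names(nn)[x + 1] <- newnames[i]        # rename the statistic column
        if (i > 1) {
          x2 <- cbind(x2, nn[-(1:x)])          # append only the new statistic
        } else {
          x2 <- nn                             # first pass keeps the group columns
        }
      }
      return(x2)
    }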

To run it, call the function like this:

    h = stata.collapse(roster,
                       c("idcode1", "age", "age"),
                       c("hhsize", "meanage", "maxage"),
                       c("max", "mean", "max"),
                       c("psu", "hhno"))
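The helper has no counterpart to Stata's `if` qualifier, so for the command in the question the rows would have to be filtered before the call. Below is a self-contained sketch of that idea. `collapse_sketch` is a hypothetical, condensed rewrite (not the function from this answer) that merges the per-statistic results on the grouping columns instead of using cbind, and `toy` is invented data, not the p004 dataset.

```r
# Hypothetical condensed variant: one aggregate() per statistic,
# then merge the pieces on the grouping column(s)
collapse_sketch <- function(data, vars, newnames, stat, by) {
  pieces <- lapply(seq_along(stat), function(i) {
    out <- aggregate(data[vars[i]], by = data[by], FUN = stat[i], na.rm = TRUE)
    names(out)[length(by) + 1] <- newnames[i]  # rename the statistic column
    out
  })
  Reduce(function(a, b) merge(a, b, by = by), pieces)
}

# Toy data, made up for this example
toy <- data.frame(
  lactatio = c(1, 1, 1, 2, 2, 2),
  protein  = c(2.7, 3.0, 3.4, 2.9, 3.1, 3.6)
)

# 'if protein > 2.8' is applied by subsetting before the call
res <- collapse_sketch(
  subset(toy, protein > 2.8),
  vars     = rep("protein", 4),
  newnames = c("min_protein", "avg_protein", "median_protein", "sd_protein"),
  stat     = c("min", "mean", "median", "sd"),
  by       = "lactatio"
)
print(res)
```

Merging on the grouping columns avoids relying on every aggregate() call returning its groups in the same row order, at the cost of a few extra merges.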
0

Source: https://habr.com/ru/post/989548/

