R: avoid summary.plm

I use R to run Monte Carlo simulations studying the performance of panel data estimates. Since I will have a large number of samples, I need to get at least decent performance from my code.

Profiling 10 runs of my simulation with Rprof shows that a significant share of the time is spent in calls to summary.plm . The first few lines of the summaryRprof output are shown below:

 $by.total
                         total.time total.pct self.time self.pct
 "trial"                      54.48     100.0      0.00      0.0
 "coefs"                      53.90      98.9      0.06      0.1
 "model.matrix"               36.72      67.4      0.10      0.2
 "model.matrix.pFormula"      35.98      66.0      0.06      0.1
 "summary"                    33.82      62.1      0.00      0.0
 "summary.plm"                33.80      62.0      0.08      0.1
 "r.squared"                  29.00      53.2      0.02      0.0
 "FUN"                        24.84      45.6      7.52     13.8

I call summary in my code because I need the standard-error estimates of the coefficients, as well as the coefficients themselves (which I could also get directly from the plm object). My call looks like:

 regression <- plm(g ~ y0 + Xit, data = panel_data, model = model,
                   index = c("country", "period"))
 coefficients_estimated <- summary(regression)$coefficients[, "Estimate"]
 ses_estimated <- summary(regression)$coefficients[, "Std. Error"]

I feel this is a huge waste of CPU time, but I don't know enough about R's internals to avoid calling summary. I would appreciate any information on what is going on behind the scenes here, or some way to reduce the time spent on it.

+6
3 answers

You just need to look inside plm:::summary.plm to see what it does. When you do this, you will see that your two lines calling summary() on your model can be replaced by:

 coefficients_estimated <- coef(regression)
 ses_estimated <- sqrt(diag(vcov(regression)))

For instance:

 require(plm)
 data("Produc", package = "plm")
 zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
           data = Produc, index = c("state", "year"))

summary(zz) gives:

 > summary(zz)
 Oneway (individual) effect Within Model
 ....
 Coefficients :
              Estimate  Std. Error t-value  Pr(>|t|)
 log(pcap) -0.02614965  0.02900158 -0.9017    0.3675
 log(pc)    0.29200693  0.02511967 11.6246 < 2.2e-16 ***
 log(emp)   0.76815947  0.03009174 25.5273 < 2.2e-16 ***
 unemp     -0.00529774  0.00098873 -5.3582 1.114e-07 ***
 ---
 Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ....

and here is what the two lines I showed return for zz :

 > coef(zz)
    log(pcap)      log(pc)     log(emp)        unemp
 -0.026149654  0.292006925  0.768159473 -0.005297741
 > sqrt(diag(vcov(zz)))
    log(pcap)      log(pc)     log(emp)        unemp
 0.0290015755 0.0251196728 0.0300917394 0.0009887257

You really don't provide enough information (for example, your simulation code or the full output from Rprof() ) to say whether this will help. It certainly doesn't look like a huge amount of time is being spent in summary() itself; FUN is far more expensive in self time than anything else you show, and of the entries listed, r.squared() is the only one that appears inside plm:::summary.plm() , and it doesn't account for much of the total.

That said, the replacement above should still speed things up somewhat.
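If you want to verify the gain on your own machine, here is a quick timing sketch using the Produc example; the repetition count is arbitrary, and exact timings will vary:

```r
library(plm)
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))

# Time 500 repetitions of each way of extracting the standard errors.
system.time(replicate(500, summary(zz)$coefficients[, "Std. Error"]))
system.time(replicate(500, sqrt(diag(vcov(zz)))))
```

The second call skips everything summary.plm computes that you don't use (R-squared, the full coefficient table, and so on), which is where the savings come from.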

+6

If you want to go further, look at the actual code of plm:::plm . You will notice that it does a fair amount of set-up work before the final call to plm:::plm.fit ; if you really wanted to, you could do that set-up once and then call plm.fit directly in your simulation loop.

One last point. You mentioned that your problem is a Monte Carlo simulation. Can you use parallel computing to speed it up?
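Since Monte Carlo replications are independent, they parallelize trivially. A minimal sketch with the base parallel package; run_trial is a hypothetical stand-in for one of your simulation draws, not code from the question:

```r
library(parallel)

# Hypothetical stand-in for one Monte Carlo replication: simulate a
# panel, fit the model, and return estimates plus standard errors.
run_trial <- function(i) {
  # panel_data <- simulate_panel(i)   # your own data-generating code
  # fit <- plm(g ~ y0 + Xit, data = panel_data, model = "within",
  #            index = c("country", "period"))
  # c(coef(fit), sqrt(diag(vcov(fit))))
  i  # placeholder so the sketch runs as-is
}

# On Linux/macOS, mclapply forks workers; on Windows use
# makeCluster() + parLapply() instead.
results <- mclapply(seq_len(1000), run_trial,
                    mc.cores = max(1, detectCores() - 1))
```

Each replication runs in its own worker, so with k cores you can expect close to a k-fold speed-up as long as each trial is expensive relative to the scheduling overhead.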

+2

Just use coeftest(zz) . coeftest is in the lmtest package; it will give you the coefficients and standard errors from a plm object much faster than summary.plm .
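For example, on the Produc model from the accepted answer (coeftest returns a matrix with "Estimate" and "Std. Error" columns, so the extraction mirrors the original summary-based code):

```r
library(plm)
library(lmtest)  # provides coeftest()

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state", "year"))

ct <- coeftest(zz)              # coefficient table: estimates, SEs, t, p
estimates <- ct[, "Estimate"]
ses       <- ct[, "Std. Error"]
```

A side benefit: coeftest also accepts a vcov. argument, so the same two lines work unchanged if you later switch to robust standard errors.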

+1

Source: https://habr.com/ru/post/885626/

