Panel Data Regression: Reliable Standard Errors

my problem is this: I get NA where I need to get some values ​​when calculating reliable standard errors.

I am trying to do a regression with a fixed effect using standard errors. For this, I follow Arai (2011) , which on p. 3 follows Stock / Watson (2006) (later published in Econometrica , for those with access). I would like to adjust the degrees of freedom by (M/(M-1)*(N-1)/(NK) against the downward shift, since my number of clusters is finite, and I have unbalanced data.

Similar issues were posted before [ 1 , 2 ] on StackOverflow and related issues [ 3 ] on CrossValidated.

Arai (and the answer in the 1st link) uses the following code for functions (below I give my data below):

 gcenter <- function(df1,group) { variables <- paste( rep("C", ncol(df1)), colnames(df1), sep=".") copydf <- df1 for (i in 1:ncol(df1)) { copydf[,i] <- df1[,i] - ave(df1[,i], group,FUN=mean)} colnames(copydf) <- variables return(cbind(df1,copydf))} # 1-way adjusting for clusters clx <- function(fm, dfcw, cluster){ # R-codes (www.r-project.org) for computing # clustered-standard errors. Mahmood Arai, Jan 26, 2008. # The arguments of the function are: # fitted model, cluster1 and cluster2 # You need to install libraries `sandwich' and `lmtest' # reweighting the var-cov matrix for the within model library(sandwich);library(lmtest) M <- length(unique(cluster)) N <- length(cluster) K <- fm$rank dfc <- (M/(M-1))*((N-1)/(NK)) uj <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum)); vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)*dfcw coeftest(fm, vcovCL) } 

where gcenter calculates the deviations from the mean (fixed effect). Then I continue and do the regression with DS_CODE , which is my cluster variable (I named my data data).

 centerdata <- gcenter(data, data$DS_CODE) datalm <- lm(C.L1.retE1M ~ C.MCAP_SEC + C.Impact_change + C.Mom + C.BM + C.PD + C.CashGen + C.NITA + C.PE + C.PEdummy + factor(DS_CODE), data=centerdata) M <- length(unique(data$DS_CODE)) dfcw <- datalm$df / (datalm$df - (M-1)) 

and want to calculate

 clx(datalm, dfcw, data$DS_CODE) 

However, when I want to calculate uj (see clx formula above) for variance, I only get at the beginning some values ​​for my regressors, and then a lot of zeros. If this input uj is used for dispersion, the result is only NAs .

My details

Since my data may have a special structure, and I cannot understand the problem, I post it all as a link Hotmail The reason is that with other data (taken from Arai (2011)) my problem does not arise. Sorry in advance for the mess, but I would really appreciate it if you could take a look at it nonetheless. The file is a 5 MB .txt file containing pure data.

+6
source share
2 answers

After some time, it works for me and gives me:

  Estimate Std. Error t value Pr(>|t|) (Intercept) 4.5099e-16 5.2381e-16 0.8610 0.389254 C.MCAP_SEC -5.9769e-07 1.2677e-07 -4.7149 2.425e-06 *** C.Impact_change -5.3908e-04 7.5601e-05 -7.1306 1.014e-12 *** C.Mom 3.7560e-04 3.3378e-03 0.1125 0.910406 C.BM -1.6438e-04 1.7368e-05 -9.4645 < 2.2e-16 *** C.PD 6.2153e-02 3.8766e-02 1.6033 0.108885 C.CashGen -2.7876e-04 1.4031e-02 -0.0199 0.984149 C.NITA -8.1792e-02 3.2153e-02 -2.5438 0.010969 * C.PE -6.6170e-06 4.0138e-06 -1.6485 0.099248 . C.PEdummy 1.3143e-02 4.8864e-03 2.6897 0.007154 ** factor(DS_CODE)130324 -5.2497e-16 5.2683e-16 -0.9965 0.319028 factor(DS_CODE)130409 -4.0276e-16 5.2384e-16 -0.7689 0.441986 factor(DS_CODE)130775 -4.4113e-16 5.2424e-16 -0.8415 0.400089 ... 

This leaves us with a question why this is not for you. I assume this has something to do with the format of your data. Is everything numerical? I converted the column classes and for me it looks like this:

 str(dat) 'data.frame': 48251 obs. of 12 variables: $ DS_CODE : chr "902172" "902172" "902172" "902172" ... $ DNEW : num 2e+05 2e+05 2e+05 2e+05 2e+05 ... $ MCAP_SEC : num 78122 71421 81907 80010 82462 ... $ NITA : num 0.135 0.135 0.135 0.135 0.135 ... $ CashGen : num 0.198 0.198 0.198 0.198 0.198 ... $ BM : num 0.1074 0.1108 0.097 0.0968 0.0899 ... $ PE : num 57 55.3 63.1 63.2 68 ... $ PEdummy : num 0 0 0 0 0 0 0 0 0 0 ... $ L1.retE1M : num -0.72492 0.13177 0.00122 0.07214 -0.07332 ... $ Mom : num 0 0 0 0 0 ... $ PD : num 5.41e-54 1.51e-66 3.16e-80 2.87e-79 4.39e-89 ... $ Impact_change: num 0 -10.59 -10.43 0.7 -6.97 ... 

What returns str(data) for you?

+2
source

The plm package can evaluate clustered SEs for group regressions. The source data is no longer available, so here is an example of using dummy data.

 require(foreign) require(plm) require(lmtest) test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta") fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year')) ##Arellano clustered by *group* SEs > coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC0")) t test of coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.029680 0.066939 0.4434 0.6575 x 1.034833 0.050540 20.4755 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

If you use lm models (instead of plm ), the multiwayvcov package may help.

 library("lmtest") library("multiwayvcov") data(petersen) m1 <- lm(y ~ x, data = petersen) > coeftest(m1, vcov=function(x) cluster.vcov(x, petersen[ , c("firmid")], df_correction=FALSE)) t test of coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.029680 0.066939 0.4434 0.6575 x 1.034833 0.050540 20.4755 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

For more details see:

See also:

0
source

Source: https://habr.com/ru/post/916293/


All Articles