For people who are considering jumping on this car, here is some code that demonstrates the problem (data here ):
clear set more off capture ssc install bcuse capture ssc install rsource bcuse affairs saveold affairs, version(12) replace rsource, terminator(XXX) library("foreign") library("lmtest") library("sandwich") mydata<-read.dta("affairs.dta") probit1<-glm(affair ~ male + age + yrsmarr + kids + relig + educ + ratemarr, family = binomial (link = "probit"), data = mydata) sandwich1 <- function(object,...) sandwich(object) * nobs(object)/(nobs(object) - 1) coeftest(probit1,vcov = sandwich1) XXX probit affair male age yrsmarr kids relig educ ratemarr, robust cformat(%9.6f) nolog
R gives:
z test of coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.764157 0.546692 1.3978 0.1621780 male 0.188816 0.133260 1.4169 0.1565119 age -0.024400 0.011423 -2.1361 0.0326725 * yrsmarr 0.054608 0.019025 2.8703 0.0041014 ** kids 0.208072 0.168222 1.2369 0.2161261 relig -0.186085 0.053968 -3.4480 0.0005647 *** educ 0.015506 0.026389 0.5876 0.5568012 ratemarr -0.272711 0.053668 -5.0814 3.746e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Stata gives:
Probit regression Number of obs = 601 Wald chi2(7) = 54.93 Prob > chi2 = 0.0000 Log pseudolikelihood = -305.2525 Pseudo R2 = 0.0961 ------------------------------------------------------------------------------ | Robust affair | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- male | 0.188817 0.131927 1.43 0.152 -0.069755 0.447390 age | -0.024400 0.011124 -2.19 0.028 -0.046202 -0.002597 yrsmarr | 0.054608 0.018963 2.88 0.004 0.017441 0.091775 kids | 0.208075 0.166243 1.25 0.211 -0.117754 0.533905 relig | -0.186085 0.053240 -3.50 0.000 -0.290435 -0.081736 educ | 0.015505 0.026355 0.59 0.556 -0.036150 0.067161 ratemarr | -0.272710 0.053392 -5.11 0.000 -0.377356 -0.168064 _cons | 0.764160 0.534335 1.43 0.153 -0.283117 1.811437 ------------------------------------------------------------------------------
Addendum:
The difference in the covariance estimation of the coefficients is due to different fitting algorithms. In R, the glm command uses the iterative least square method, and the Stata probit uses the ML method, based on the Newton-Raphson algorithm. You can map what R does with glm in Stata with the irls option:
glm affair male age yrsmarr kids relig educ ratemarr, irls family(binomial) link(probit) robust
This gives:
Generalized linear models No. of obs = 601 Optimization : MQL Fisher scoring Residual df = 593 (IRLS EIM) Scale parameter = 1 Deviance = 610.5049916 (1/df) Deviance = 1.029519 Pearson = 619.0405832 (1/df) Pearson = 1.043913 Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = invnorm(u) [Probit] BIC = -3183.862 ------------------------------------------------------------------------------ | Semirobust affair | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- male | 0.188817 0.133260 1.42 0.157 -0.072367 0.450002 age | -0.024400 0.011422 -2.14 0.033 -0.046787 -0.002012 yrsmarr | 0.054608 0.019025 2.87 0.004 0.017319 0.091897 kids | 0.208075 0.168222 1.24 0.216 -0.121634 0.537785 relig | -0.186085 0.053968 -3.45 0.001 -0.291862 -0.080309 educ | 0.015505 0.026389 0.59 0.557 -0.036216 0.067226 ratemarr | -0.272710 0.053668 -5.08 0.000 -0.377898 -0.167522 _cons | 0.764160 0.546693 1.40 0.162 -0.307338 1.835657 ------------------------------------------------------------------------------
They will be close, although not identical. I'm not sure how to get R to use something like NR without a lot of work.
source share