Why do inverse t-distributions for small values differ in Matlab and R?

Question

Why do inverse t-distributions for small values differ in Matlab and R?

I would like to evaluate the inverse student t-distribution function for small values, for example, 1e-18, in Matlab. Degrees of freedom 2.

Unfortunately, Matlab returns NaN :

 tinv(1e-18,2) NaN

However, if I use the built-in R function:

 qt(1e-18,2) -707106781

The result is reasonable. Why can't Matlab evaluate a function for this small value? The results of Matlab and R are very similar to 1e-15, but for smaller values the difference is significant:

 tinv(1e-16,2)/qt(1e-16,2) = 1.05

Does anyone know what the difference is in the implemented Matlab and R algorithms, and if R gives the correct results, how can I efficiently calculate the inverse t-distribution in Matlab with smaller values?

+6

r statistics probability matlab distribution

Arpi Jan 12 '15 at 23:16

source share

1 answer

horchler · Accepted Answer · 2015-01-13T21:38:39+0000

It seems that R qt can use a completely different algorithm than Matlab tinv . I think you and others should report this flaw to The MathWorks by submitting a service request . By the way, in R2014b and R2015a, instead of NaN , -Inf returned for small values (about eps/8 or less) of the first argument p . This is more reasonable, but I think they should do better.

At the same time, there are several workarounds.

Special occasions
Firstly, in the case of t-distribution for students there are several simple analytical solutions for the inverse function of the CDF or for certain integer parameters ν. For your example, ν = 2:

 % for v = 2 p = 1e-18; x = (2*p-1)./sqrt(2*p.*(1-p))

which returns -7.071067811865475e+08 . At a minimum, Matlab tinv should include these special cases (they only do this for ν = 1). This is likely to improve the accuracy and speed of these specific solutions.

Numerical inverse
The tinv function tinv based on the betaincinv function. It seems that it is this function that is responsible for the loss of accuracy for small values of the first argument, p . However, as the OP suggested, you can use the CDF function, tcdf and root search methods to quantify the reverse CDF. The tcdf function tcdf based on betainc , which does not look as sensitive. Using fzero :

 p = 1e-18; v = 2 x = fzero(@(x)tcdf(x,v)-p, 0)

This returns -7.071067811865468e+08 . Note that this method is not very stable for p values close to 1 .

Symbolic decisions
For more general cases, you can use symbolic math and variable precision arithmetic . You can use identities in terms of Gaussian hypergeometric functions , ₂ F ₁ , as here for CDF. Thus, using solve and hypergeom :

 % Supposedly valid for or x^2 < v, but appears to work for your example p = sym('1e-18'); v = sym(2); syms x F = 0.5+x*gamma((v+1)/2)*hypergeom([0.5 (v+1)/2],1.5,-x^2/v)/(sqrt(sym('pi')*v)*gamma(v/2)); sol_x = solve(p==F,x); vpa(sol_x)

The tinv function tinv based on betaincinv . The Symbolic Math or MuPAD toolbar does not have an equivalent function or even an incomplete beta function, but a similar ₂ F ₁ ratio for an incomplete beta function :

 p = sym('1e-18'); v = sym(2); syms x a = v/2; F = 1-x^a*hypergeom([a 0.5],a+1,x)/(a*beta(a,0.5)); sol_x = solve(2*abs(p-0.5)==F,x); sol_x = sign(p-0.5).*sqrt(v.*(1-sol_x)./sol_x); vpa(sol_x)

Both symbolic circuits return results that agree with -707106781.186547523340184 , using the default digits .

I have not fully confirmed the two character methods above, so I cannot vouch for their correctness in all cases. The code also needs to be vectorized and will be slower than a fully numerical solution.

Why do inverse t-distributions for small values ​​differ in Matlab and R?

More articles:

Why do inverse t-distributions for small values differ in Matlab and R?