Why Mahout Doesn't Have Linear Regression Yet

I'm just getting started with Mahout, and one thing that puzzled me a lot is the absence of linear regression. Even logistic regression, which is much more complicated, is supported to some extent, judging by what I could find, but nothing at all mentions linear regression!

From what I understand, OLS is one of the easiest problems to solve:

Y = Xb + e

has the least-squares solution b = (X^T X)^(-1) X^T Y, where X^T is the transpose of X. If the matrix X^T X turns out to be singular (i.e. not invertible), then it is fine to show an error message, even though a solution still exists via a generalized inverse.

Computing both X^T X and X^T Y only requires sums of products of the elements, which, as I understand it, is about the easiest thing to express in MapReduce.
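
To make the "sums of products" point concrete, here is a minimal single-process sketch in plain Python. It is not Mahout or Hadoop code: the function names (partial_sums, combine, solve, ols) and the toy data are assumptions made for illustration. Each partition contributes its own partial X^T X and X^T Y (the "map" step), the partial results are added together (the "reduce" step), and the small p x p system is then solved directly, with an error raised if X^T X looks singular.

```python
from functools import reduce


def partial_sums(rows, p):
    """'Map' step: accumulate X^T X and X^T y over one partition of (x, y) rows."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a, b):
    """'Reduce' step: partial sums from two partitions simply add up."""
    xtx_a, xty_a = a
    xtx_b, xty_b = b
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(xtx_a, xtx_b)]
    xty = [u + v for u, v in zip(xty_a, xty_b)]
    return xtx, xty


def solve(a, b, tol=1e-12):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("X^T X is singular (or nearly so); no unique OLS solution")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - s) / m[i][i]
    return coef


def ols(partitions, p):
    """Combine per-partition sums, then solve the normal equations."""
    xtx, xty = reduce(combine, (partial_sums(part, p) for part in partitions))
    return solve(xtx, xty)


# Toy data (assumed): y = 1 + 2*x exactly; the leading 1.0 is the intercept column.
partitions = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
coefficients = ols(partitions, p=2)  # approximately [1.0, 2.0]
```

Only the accumulation over rows grows with the amount of data. The final solve works on a p x p matrix, where p is the number of predictors, so it can stay on a single machine as long as p is moderate.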

(Which makes me wonder... is there any module that supports the matrix operations themselves that are needed to compute the regression coefficients? That would really make a dedicated regression module unnecessary...)

Am I missing something that makes it difficult to compute regression in Mahout?

+4
2 answers

I do not know that there is a "why" for things like this. It just doesn't exist.

However, I think it is rather the opposite of what you assume: it is too "easy." Unless you are solving a system of ten million equations, this is probably not at the scale that requires Hadoop. There are many existing packages that do this well on a single machine. If you want something in Java from Apache, just look at Commons Math, for example.

This is not to say that the project couldn't have a nice non-distributed version, but since the emphasis is mainly on large scale and Hadoop, that is perhaps the "why."

+5

I think this is simply because inverting an N x N matrix costs O(N^3) and is prone to numerical instability, which is quite common with sparse high-dimensional matrices.
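
One way such instability can show up even before any inversion is in forming X^T X itself. The sketch below is a small textbook-style example in plain Python. The matrix and the value of eps are chosen for illustration and do not come from any Mahout workload. The two columns of x are linearly independent, yet eps squared is too small to survive being added to 1.0 in double precision, so the computed X^T X is exactly singular.

```python
def gram(x):
    """Return X^T X for a matrix given as a list of rows."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]


eps = 1e-9  # small, but far from underflow; eps * eps is about 1e-18

# The columns (1, eps, 0) and (1, 0, eps) are linearly independent.
x = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

xtx = gram(x)

# Exact arithmetic would give 1 + eps^2 on the diagonal; doubles round that to 1.0.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
```

Here det comes out as exactly 0.0, so a solver based on the normal equations would report a singular matrix even though the least-squares problem has a unique solution.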

Does anyone have another explanation or can someone confirm my thoughts?

0

Source: https://habr.com/ru/post/1403429/