Warning, major edit:
MLlib is a loose collection of high-level algorithms that runs on Spark. That is what Mahout used to be, except that the Mahout of old ran on Hadoop MapReduce. In 2014, Mahout announced that it would no longer accept Hadoop MapReduce code and switched all new development to Spark (with other engines, such as H2O, possibly in the offing).
The most significant thing to come out of this is a generalized, distributed, optimized linear algebra engine written in Scala, together with an environment that includes an interactive Scala shell. Perhaps the most important word is "generalized." Since it runs on Spark, anything available in MLlib can be used with the Mahout-Spark linear algebra engine.
If you need a general engine that does much of what tools like R do, but on really big data, look at Mahout. If you need a specific algorithm, look at each project to see what it has. For example, k-means runs in MLlib, but if you need to cluster A'A (the cooccurrence matrix used in recommenders), you will need both, because MLlib does not have a matrix transpose or A'A (in fact, Mahout does a thin-optimized A'A, so the transpose is optimized out).
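To make the A'A point concrete, here is a minimal sketch in plain Python of how A'A can be computed without ever building the transpose: each row of A contributes its own outer product to the result. This only illustrates the idea. It is not Mahout's implementation, and the function name `gram_matrix` and the small matrix `A` are made up for the example.

```python
def gram_matrix(rows):
    """Return A'A for a matrix given as a list of equal-length rows."""
    if not rows:
        return []
    n = len(rows[0])
    ata = [[0] * n for _ in range(n)]
    for row in rows:
        # Each row contributes its outer product (row' * row) to A'A,
        # so the transpose of A is never materialized.
        for i, x in enumerate(row):
            if x == 0:
                continue  # zeros add nothing; interaction data is mostly zeros
            for j, y in enumerate(row):
                ata[i][j] += x * y
    return ata


# Hypothetical user-by-item interaction matrix: 3 users (rows), 3 items (columns).
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]

cooccurrence = gram_matrix(A)
print(cooccurrence)
```

Because every row is processed independently and the partial results are simply summed, this formulation fits row-partitioned data well, which is presumably why an engine can skip a separate transpose step.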
Mahout also includes several innovative recommender building blocks that offer things found in no other OSS.
Mahout still has its older Hadoop algorithms, but as fast compute engines like Spark become the norm, most people will invest there.
pferrel May 8 '14 at 6:08