Apache Spark or Cascading?

I am confused about when to use the Cascading framework and when to use Apache Spark. What are the appropriate use cases for each?

Any help is appreciated.

1 answer

At its core, Cascading is a higher-level API on top of execution engines like MapReduce. In this sense, it is similar to Apache Crunch. Cascading has several related projects, such as a Scala API (Scalding) and PMML model evaluation (Pattern).
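To make this concrete, here is a minimal word-count sketch in Scalding, the Scala API over Cascading mentioned above. The job class name and the `input`/`output` argument keys are illustrative choices, not anything from the original answer:

```scala
import com.twitter.scalding._

// Hypothetical word-count job: Scalding compiles this pipeline down to
// Cascading flows, which in turn run on an engine such as MapReduce.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))                                        // read lines from the input path
    .flatMap('line -> 'word) { line: String =>
      line.toLowerCase.split("\\s+").filter(_.nonEmpty)          // tokenize each line into words
    }
    .groupBy('word) { _.size }                                   // count occurrences per word
    .write(Tsv(args("output")))                                  // write (word, count) pairs as TSV
}
```

The point to notice is that Scalding/Cascading describes *what* the pipeline does; the actual execution is delegated to an underlying engine.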

Apache Spark is similar in that it also provides a high-level API for data pipelines, one that is available in Java and Scala.

However, Spark is more of an execution engine itself than a layer on top of one. It has a number of related projects, such as MLlib, Spark Streaming, and GraphX, for machine learning, stream processing, and graph computation.
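For comparison, the same word count expressed against Spark's RDD API. Here Spark is both the API and the execution engine; the `local[*]` master, object name, and argument positions are assumptions for the sake of a self-contained sketch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical word-count application on Spark's core RDD API.
// Unlike Cascading, no separate engine is needed: Spark executes
// the pipeline itself (here on local threads via "local[*]").
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val counts = sc.textFile(args(0))                    // read lines from the input path
      .flatMap(_.toLowerCase.split("\\s+"))              // tokenize each line into words
      .filter(_.nonEmpty)
      .map(word => (word, 1))                            // pair each word with a count of 1
      .reduceByKey(_ + _)                                // sum counts per word

    counts.saveAsTextFile(args(1))                       // write (word, count) pairs
    sc.stop()
  }
}
```

The two sketches look alike at the API level, which is exactly the source of the confusion in the question; the difference is in what runs the pipeline underneath.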

All in all, I find Spark much more interesting today, but they are not exactly the same kind of thing.

Source: https://habr.com/ru/post/973643/
