Spark is to data processing what Akka is to managing the flow of data and instructions in an application.
TL;DR
Spark and Akka are two different frameworks with different uses and use cases.
When building applications, distributed or otherwise, you may need a way to schedule and manage tasks in parallel, for example by using threads. Now imagine a huge application with lots of threads. How complicated would that be?
Akka, a toolkit from Typesafe (now called Lightbend), gives you actor systems (a concept originally from Erlang) that provide an abstraction layer over threads. These actors communicate with each other by passing anything and everything as messages, and they do their work in parallel without blocking other code.
As the cherry on top, Akka also gives you ways to run the actors in a distributed environment.
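To make the message-passing idea concrete, here is a minimal sketch using the classic Akka actor API. The `Greeter` actor and the message it handles are made-up illustrations, not anything from a particular application:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// A hypothetical actor: it reacts to String messages sent to it.
class Greeter extends Actor {
  def receive: Receive = {
    case name: String => println(s"Hello, $name!") // runs on the actor's own schedule
  }
}

object ActorDemo extends App {
  val system  = ActorSystem("demo")                        // hosts and schedules actors
  val greeter = system.actorOf(Props[Greeter], "greeter")  // spawn an actor instance

  greeter ! "Akka"   // fire-and-forget message: the sender is never blocked
  Thread.sleep(500)  // crude wait so the message is processed before shutdown
  system.terminate()
}
```

The point is that you never touch a thread directly: you send a message, and the actor system decides when and where the actor processes it.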
Apache Spark, on the other hand, is a data processing framework for massive datasets that cannot be processed on a single machine by hand. Spark uses what it calls an RDD (Resilient Distributed Dataset), a distributed-list-like abstraction over your traditional data structures, so that operations can be performed on different nodes in parallel.
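For contrast, here is a minimal sketch of the RDD style in Scala. The app name and sample data are illustrative, and `local[*]` simply runs Spark on local cores rather than a real cluster:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo extends App {
  val conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]")
  val sc   = new SparkContext(conf)

  // Turn a local collection into an RDD; its partitions can be
  // processed on different cores or nodes in parallel.
  val words   = sc.parallelize(Seq("spark", "akka", "actors", "rdd"))
  val lengths = words.map(_.length) // transformation, evaluated lazily per partition

  println(lengths.collect().mkString(", ")) // collect() gathers results to the driver
  sc.stop()
}
```

Here the focus is on the dataset: you describe transformations over the whole collection, and Spark handles distributing the work.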
Spark uses the Akka toolkit to schedule tasks between different nodes.
Chetan Bhasin, Mar 17 '15 at 13:27