Apache Spark vs Akka

Could you tell me the difference between Apache Spark and Akka? I know that both frameworks are designed for distributed and parallel computing, but I don't see the link or the difference between them.

In addition, I would like to see use cases suited to each of them.

+48
parallel-processing akka distributed-computing bigdata apache-spark
Mar 16 '15 at 23:29
3 answers

Apache Spark is actually built on Akka.

Akka is a general-purpose framework for creating reactive, distributed, parallel, and resilient applications in Scala or Java. Akka uses the actor model to hide all of the thread-related code and gives you simple and useful interfaces for quickly implementing a scalable and fault-tolerant system. A good example of an Akka use case is a real-time application that consumes and processes data coming from mobile phones and sends it to some kind of store.
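The core idea — a mailbox drained by one worker, so the handler never needs locks — can be sketched in plain Scala. This is a conceptual toy, not Akka's API: the `ToyActor` class below is hypothetical. In real Akka you would extend `Actor` and send messages through an `ActorRef`, though the `!` ("tell") operator is borrowed from Akka's convention.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicInteger

// A toy "actor": a mailbox plus a single worker thread. Because only
// this one thread ever runs the handler, messages are processed one
// at a time and the handler body needs no synchronization.
class ToyActor[A](handler: A => Unit) {
  private val mailbox = new LinkedBlockingQueue[A]()
  private val worker = new Thread(() => {
    while (true) handler(mailbox.take()) // block until a message arrives
  })
  worker.setDaemon(true)
  worker.start()

  // "tell": fire-and-forget, never blocks the sender
  def !(msg: A): Unit = mailbox.put(msg)
}

// Usage: a counter actor tallying messages from a sender
val total = new AtomicInteger(0)
val counter = new ToyActor[Int](n => total.addAndGet(n))
(1 to 100).foreach(counter ! _)
Thread.sleep(1000) // crude wait for the mailbox to drain
println(total.get) // 5050
```

Many senders can call `!` concurrently without coordinating, because all the contention is pushed into the thread-safe mailbox rather than the business logic.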

Apache Spark (not Spark Streaming) is a framework for processing batch data using a generalized version of the map-reduce algorithm. A good example of an Apache Spark use case is computing some metrics over stored data to better understand your data. The data is loaded and processed on demand.
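The map-reduce style of batch computation described above can be illustrated with plain Scala collections standing in for an RDD. This is a sketch only: in real Spark you would start from something like `sc.textFile(...)` and use `flatMap`/`reduceByKey`, which have the same shape as the calls below.

```scala
// A batch job in the style Spark generalizes: load a fixed dataset,
// transform it, and aggregate a metric over it.
val lines = Seq(
  "spark processes batch data",
  "akka processes messages",
  "spark is fast"
)

val wordCounts = lines
  .flatMap(_.split("\\s+"))              // map phase: line -> words
  .groupBy(identity)                     // shuffle phase: group by key
  .map { case (w, ws) => (w, ws.size) }  // reduce phase: count per key

println(wordCounts("spark")) // 2
```

In Spark each of these stages would run partitioned across the cluster; on a single machine the collection API expresses the same computation.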

Apache Spark Streaming can perform similar actions and functions on small batches of data in near real time, just as if the data were already stored.

UPDATE APRIL 2016

As of Apache Spark 1.6.0, Apache Spark no longer relies on Akka for communication between nodes. Thanks to @EugeneMi for the comment.

+85
Mar 17 '15 at 0:42

Spark is to data processing what Akka is to managing the flow of data and instructions in an application.

TL;DR

Spark and Akka are two different frameworks with different purposes and use cases.

When creating applications, distributed or otherwise, you may need to schedule and manage tasks through a parallel approach such as threads. Imagine a huge application with lots of threads. How complicated would that be?

Typesafe (now called Lightbend) built the Akka toolkit, which gives you actor systems (a concept originally derived from Erlang) as an abstraction layer over threads. These actors communicate with each other by passing anything and everything as messages, and do things in parallel without blocking other code.

Akka puts the cherry on top by giving you ways to run the actors in a distributed environment.

Apache Spark, on the other hand, is a data processing framework for massive datasets that cannot be handled manually. Spark uses what we call an RDD (Resilient Distributed Dataset), which is a distributed, list-like abstraction layer over your traditional data structures, so that operations can be performed on different nodes in parallel.
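That partition-parallel evaluation can be roughly sketched with Scala `Future`s running on threads where Spark would use executors on different nodes. The four-way split and all names here are illustrative, not Spark's API.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

implicit val ec: ExecutionContext = ExecutionContext.global

// The "dataset", split into 4 partitions of 250 elements each.
// In Spark an RDD is partitioned like this across the cluster.
val data = (1 to 1000).toVector
val partitions = data.grouped(250).toVector

// Reduce each partition independently and in parallel, then
// combine the partial results -- the essence of a distributed reduce.
val partials = partitions.map(p => Future(p.sum))
val total = Await.result(Future.sequence(partials), 10.seconds).sum

println(total) // 500500
```

The point of the abstraction is that the per-partition work never needs to know about the other partitions, which is what lets Spark move it to whichever node holds the data.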

Spark uses the Akka toolkit to schedule tasks between different nodes.

+26
Mar 17 '15 at 13:27

Apache Spark:

Apache Spark™ is a fast and general engine for large-scale data processing.

Spark programs run up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.

Spark provides us with a comprehensive, unified framework to manage big data processing requirements with datasets that are diverse in nature (text data, graph data, etc.) as well as in source (batch vs. streaming data).

  • Integrates with the Hadoop ecosystem and data sources (HDFS, Amazon S3, Hive, HBase, Cassandra, etc.)

  • Can run on clusters managed by Hadoop YARN or Apache Mesos, and can also run in standalone mode

  • Provides APIs in Scala, Java, and Python, with support for other languages (such as R) on the way

  • In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning, and graph processing

We should consider Spark as an alternative to Hadoop MapReduce, not a replacement for Hadoop.

See infoQ and toptal for a better understanding.

Key uses for Spark:

  • Machine Learning Algorithms
  • Interactive analytics
  • Streaming data

Akka: from Letitcrash

Akka is an event-driven middleware framework for building high-performance and reliable distributed applications in Java and Scala. Akka decouples business logic from low-level mechanisms such as threads, locks, and non-blocking I/O. With Akka, you can easily configure how actors are created, destroyed, scheduled, and restarted upon failure.

Have a look at this Typesafe article to better understand the actor model.

Akka provides fault tolerance based on supervisor hierarchies. Each actor can create other actors, which it then supervises, deciding whether they should be resumed or restarted, whether they should be retired, or whether the problem should be escalated.
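The supervision decision can be sketched as a plain-Scala restart loop. This is purely illustrative: the `supervise` helper below is hypothetical and only mimics a restart strategy, whereas real Akka supervision is configured via a `supervisorStrategy` on a parent actor.

```scala
import scala.util.{Failure, Success, Try}

// A "parent" runs a flaky child task and applies a restart strategy
// on failure, the way a supervisor decides to resume/restart/escalate.
def supervise[A](maxRestarts: Int)(child: () => A): Try[A] = {
  var attempt = 0
  var result: Try[A] = Failure(new IllegalStateException("never ran"))
  while (attempt <= maxRestarts && result.isFailure) {
    result = Try(child()) // run (or restart) the child
    attempt += 1
  }
  result // still a Failure here => escalate to our own supervisor
}

// A child that crashes twice, then succeeds on the third run
var calls = 0
val outcome = supervise(maxRestarts = 3) { () =>
  calls += 1
  if (calls < 3) throw new RuntimeException("child crashed")
  "recovered"
}

println(outcome) // Success(recovered)
```

The key property this models is that the failure is contained: the crash never propagates to the caller directly, only the supervisor's decision does.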

See Akka and SO questions

Key use cases:

  • Transaction processing
  • Concurrency / parallelism
  • Simulation
  • Batch processing
  • Gaming and betting
  • Complex event stream processing

+13
Dec 08 '15 at 9:01


