How to achieve dynamic task load balancing in Apache Spark

I know that in Spark I can split my computation across several partitions. If I divide my input RDD into 1000 partitions and I have 100 machines, Spark will split the computation into 1000 tasks and dynamically distribute them across my 100 machines in some smart way.

Now suppose I can initially split my data into only 2 partitions, but I still have 100 machines. Naturally, 98 of my machines will be idle. But as I process each task, I could break it down into subtasks that could potentially be executed on different machines. This is easy to achieve in plain Java with a queue, but I'm not sure what the best way to approach it in Apache Spark is.

Consider the following Java pseudo-code:

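BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
queue.put(myInitialTask);
...
// On each thread:
// (pseudo-code: a real implementation needs a proper termination check,
// since the queue can be briefly empty while other threads are still splitting tasks)
while (!queue.isEmpty()) {
    Task nextTask = queue.take();
    List<Task> newTasks = process_task_and_split_to_sub_tasks(nextTask);
    queue.addAll(newTasks);
}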

The above Java code will keep all 100 of my threads busy, provided the method "process_task_and_split_to_sub_tasks()" can split any large task into several smaller ones.
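For reference, here is a minimal runnable sketch of the queue idea above in plain Java (no Spark). The Task type, the LEAF_SIZE threshold, and the "sum a range of integers" workload are hypothetical stand-ins for the real job. A counter of pending tasks is one possible way to decide when the workers may stop.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class WorkSplittingDemo {
    // Hypothetical task: sum the integers in the half-open range [lo, hi).
    static final class Task {
        final int lo;
        final int hi;
        Task(int lo, int hi) { this.lo = lo; this.hi = hi; }
    }

    // Assumed threshold below which a task is processed instead of split.
    static final int LEAF_SIZE = 1_000;

    static long sumRange(int lo, int hi, int threads) {
        BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
        AtomicInteger pending = new AtomicInteger(1); // tasks queued or in progress
        AtomicLong total = new AtomicLong();
        queue.add(new Task(lo, hi));

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                try {
                    // Stop only when no task is queued or being processed anywhere.
                    while (pending.get() > 0) {
                        Task task = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (task == null) continue; // briefly empty: re-check pending
                        if (task.hi - task.lo <= LEAF_SIZE) {
                            long s = 0;
                            for (int i = task.lo; i < task.hi; i++) s += i;
                            total.addAndGet(s);
                        } else {
                            int mid = task.lo + (task.hi - task.lo) / 2;
                            pending.addAndGet(2); // count subtasks before publishing them
                            queue.add(new Task(task.lo, mid));
                            queue.add(new Task(mid, task.hi));
                        }
                        pending.decrementAndGet(); // the current task is finished
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumRange(0, 1_000_000, 8));
    }
}
```

Every worker both consumes and produces tasks, so idle threads pick up subtasks as soon as a large task is split.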

Is there a way to achieve this in Spark, maybe in combination with other tools?


Update: It was correctly pointed out that one way to attack this is simply to:

  • Create finer-grained keys, and
  • Then use a smart Partitioner that assigns these keys to partitions.
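The two steps above can be sketched in plain Java without Spark. This is only an illustration: FineKey, splitKeys, and the index-based getPartition are hypothetical names, and the fan-out of 50 is an assumed value. In a real job, the getPartition logic would presumably live in a subclass of Spark's org.apache.spark.Partitioner and be applied to a pair RDD with partitionBy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FineGrainedKeysDemo {
    // Hypothetical finer-grained key: the original coarse key plus a sub-index.
    static final class FineKey {
        final int coarseIndex;
        final int subIndex;
        FineKey(int coarseIndex, int subIndex) {
            this.coarseIndex = coarseIndex;
            this.subIndex = subIndex;
        }
    }

    // Step 1: expand each coarse key into fanOut finer-grained keys.
    static List<FineKey> splitKeys(int coarseKeys, int fanOut) {
        List<FineKey> keys = new ArrayList<>();
        for (int c = 0; c < coarseKeys; c++) {
            for (int s = 0; s < fanOut; s++) {
                keys.add(new FineKey(c, s));
            }
        }
        return keys;
    }

    // Step 2: map a key to a partition in [0, numPartitions), spreading
    // consecutive sub-keys over consecutive partitions instead of hashing.
    static int getPartition(FineKey key, int fanOut, int numPartitions) {
        return Math.floorMod(key.coarseIndex * fanOut + key.subIndex, numPartitions);
    }

    // Counts how many distinct partitions receive at least one key.
    static int partitionsUsed(int coarseKeys, int fanOut, int numPartitions) {
        Set<Integer> used = new HashSet<>();
        for (FineKey key : splitKeys(coarseKeys, fanOut)) {
            used.add(getPartition(key, fanOut, numPartitions));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // 2 coarse keys, each split 50 ways, spread over 100 partitions.
        System.out.println(partitionsUsed(2, 50, 100));
    }
}
```

With 2 coarse keys split 50 ways each, all 100 partitions receive exactly one key, whereas the 2 original keys could occupy at most 2 partitions. Note that this requires the split to be known before the job starts, unlike the queue, which splits tasks while they run.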

, "" , , , . , ? , , .

: .
, a j (10 ), . 'abcf', , , 50% . . 'ab. * f', {a, b, f, ab, af, bf, abf.
- , "a" (), , "b" .. , Spark. , 100 ( 10 ). 90 .
10 000 , - . : "abcd", , (, , ), , - .

: , , "a", - , "ab ',' ac ',' ad ',... 10 , .
, Apache Spark , , .

(.. , ) + Spark Streaming , , , ? ?


Spark , , , , , , Spark . - , .

.


, 100 ( 10 ). , , "a", - , "ab", "ac", "ad" .., 10 .

, Spark. "Mapper" () . SparkContext, RDDs, Iterator , . .

. , , . , , . - Partitioner, "".


- , = 100? 2?


Source: https://habr.com/ru/post/1691852/

