Spark Stage Output Log Interpretation

When running a Spark job on an AWS cluster, I believe I have correctly modified my code to distribute both the data and the algorithm I use. However, the console output looks like this:

[Stage 3:>                                                       (0 + 2) / 1000]
[Stage 3:>                                                       (1 + 2) / 1000]
[Stage 3:>                                                       (2 + 2) / 1000]
[Stage 3:>                                                       (3 + 2) / 1000]
[Stage 3:>                                                       (4 + 2) / 1000]
[Stage 3:>                                                       (5 + 2) / 1000]
[Stage 3:>                                                       (6 + 2) / 1000]
[Stage 3:>                                                       (7 + 2) / 1000]
[Stage 3:>                                                       (8 + 2) / 1000]
[Stage 3:>                                                       (9 + 2) / 1000]
[Stage 3:>                                                      (10 + 2) / 1000]
[Stage 3:>                                                      (11 + 2) / 1000]
[Stage 3:>                                                      (12 + 2) / 1000]
[Stage 3:>                                                      (13 + 2) / 1000]
[Stage 3:>                                                      (14 + 2) / 1000]
[Stage 3:>                                                      (15 + 2) / 1000]
[Stage 3:>                                                      (16 + 2) / 1000]

Am I correct in reading (0 + 2) / 1000 as a single dual-core processor working through the 1000 tasks two at a time? With 5 nodes (10 processors), why don't I see something like (0 + 10) / 1000?

2 answers

This is more like the result I wanted:

[Stage 2:=======>                                             (143 + 20) / 1000]
[Stage 2:=========>                                           (188 + 20) / 1000]
[Stage 2:===========>                                         (225 + 20) / 1000]
[Stage 2:==============>                                      (277 + 20) / 1000]
[Stage 2:=================>                                   (326 + 20) / 1000]
[Stage 2:==================>                                  (354 + 20) / 1000]
[Stage 2:=====================>                               (405 + 20) / 1000]
[Stage 2:========================>                            (464 + 21) / 1000]
[Stage 2:===========================>                         (526 + 20) / 1000]
[Stage 2:===============================>                     (588 + 20) / 1000]
[Stage 2:=================================>                   (633 + 20) / 1000]
[Stage 2:====================================>                (687 + 20) / 1000]
[Stage 2:=======================================>             (752 + 20) / 1000]
[Stage 2:===========================================>         (824 + 20) / 1000]

On AWS EMR, check that the --executor-cores parameter is set according to the number of nodes (and cores per node) you are actually using: (screenshot of the EMR configuration omitted)
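As a minimal sketch (the values below are illustrative assumptions, not taken from the answer), the same resources can be requested from the SparkSession builder in Scala; the spark-submit flags --num-executors and --executor-cores set the same values from the command line:

import org.apache.spark.sql.SparkSession

// Minimal sketch: ask for 5 executors with 4 cores each (illustrative values),
// so up to 20 tasks can run concurrently instead of 2.
val spark = SparkSession.builder()
  .appName("StageParallelismExample")
  .config("spark.executor.instances", "5")  // assumed: one executor per node
  .config("spark.executor.cores", "4")      // assumed: cores per executor
  .getOrCreate()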


The 1000 is the total number of tasks in the stage; the first number is how many tasks have completed, and the second (2) is how many are running right now. Only two tasks running at a time means Spark has only been allocated two cores. Since this is an AWS cluster, check spark.cores.max and the rest of your resource configuration; with the defaults, Spark will use only a couple of cores and leave the rest of the cluster idle.
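As a minimal sketch (assuming a standalone-style deployment where spark.cores.max is honoured; on a YARN-backed EMR cluster the executor settings shown in the first answer are what control parallelism), the cap can be raised in code like this:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Minimal sketch: request up to 20 cores in total across the cluster.
// 20 is an illustrative value, not one given in the answer.
val conf = new SparkConf()
  .setAppName("CoresMaxExample")
  .set("spark.cores.max", "20")

val spark = SparkSession.builder().config(conf).getOrCreate()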


Source: https://habr.com/ru/post/1624477/

