Understanding Spark terminal output in stages

I am new to Spark, and I am trying to understand the stage progress output that appears in my terminal. I work with a very large dataset on my local machine, and while an action is running I see something like:

[Stage 4:==>    (10 + 4) / 200]

I understand that stages are groups of operations applied to an RDD, but what do the numbers at the end mean? Do they represent tasks?

(10 + 4) / 200
  • 10: the number of completed tasks?
  • 4: the number of tasks currently running in parallel (i.e. the number of cores on my machine)?
  • 200: the total number of tasks for this stage?
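For context, here is a minimal sketch of the kind of job I run when I see this output (the object and column names are just illustrative). It assumes a local[4] master, which gives 4 parallel task slots, and the default spark.sql.shuffle.partitions value of 200, which would line up with the "/ 200" part:

import org.apache.spark.sql.SparkSession

object ProgressBarDemo {
  def main(args: Array[String]): Unit = {
    // local[4] runs Spark with 4 worker threads, so at most 4 tasks execute at once
    val spark = SparkSession.builder()
      .appName("progress-bar-demo")
      .master("local[4]")
      // spark.ui.showConsoleProgress controls whether the [Stage ...] bar is drawn at all
      .config("spark.ui.showConsoleProgress", "true")
      .getOrCreate()

    import spark.implicits._

    // a wide transformation (groupBy) forces a shuffle; the post-shuffle stage has
    // spark.sql.shuffle.partitions tasks, which defaults to 200
    val df = (1 to 1000000).toDF("n")
    df.groupBy($"n" % 10).count().collect()   // the action that makes the bar appear

    spark.stop()
  }
}

If those assumptions are right, the bar should tick forward while the 200 shuffle tasks run, with at most 4 of them in flight at any moment.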

Source: https://habr.com/ru/post/1011603/

