I am new to Spark, and I am trying to understand the stage progress output in my terminal. I work with a very large dataset on my local machine, and during actions I see something like:
[Stage: 4 ==> (10 + 4) / 200]
I understand that stages are groups of operations performed on RDDs, but what do the numbers at the end mean? Do they represent tasks?
(10 + 4) / 200

10 — the number of completed tasks?
4 — the number of currently running parallel tasks (i.e. the number of cores on my machine)?
200 — the total number of tasks for this stage?
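For reference, here is a minimal sketch of the kind of action that produces a progress bar like this (the session setup and dataset are illustrative, not my actual job). With the default spark.sql.shuffle.partitions = 200, a shuffle stage has 200 tasks, and running with local[4] would mean at most 4 of them execute at once:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; 4 cores assumed, console progress bar enabled.
val spark = SparkSession.builder()
  .appName("progress-bar-demo")
  .master("local[4]")
  .config("spark.ui.showConsoleProgress", "true")
  .getOrCreate()

import spark.implicits._

// groupBy forces a shuffle; with the default spark.sql.shuffle.partitions = 200,
// the resulting shuffle stage has 200 tasks, which is the "/ 200" in the bar.
val counts = (1 to 1000000).toDF("n")
  .groupBy($"n" % 10)
  .count()

counts.collect()  // the action that triggers the stages shown in the console
```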