What are the concepts of application, job, stage, and task in Spark?

As far as I understand:

  • Application: one spark-submit.

  • Job: created after lazy evaluation, once an action is called.

  • Stage: related to shuffles and to the type of transformation. It's hard for me to see where a stage's boundary lies.

  • Task: a single unit of operation. One transformation per task; one task per transformation.

I would appreciate help improving this understanding.

1 answer

The main function, i.e. the driver program you launch with spark-submit, is the application.
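For example, here is a minimal sketch (the class name, app name and launch command below are placeholders, not anything from the question): one spark-submit launches one application, and everything else lives inside it.

```scala
// Hypothetical application skeleton; names and paths are made up.
// Launched with something like:  spark-submit --class WordCountApp wordcount.jar
import org.apache.spark.sql.SparkSession

object WordCountApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCountApp").getOrCreate()
    val sc = spark.sparkContext
    // every job, stage and task triggered below belongs to this one application
    // ... RDD / DataFrame work goes here ...
    spark.stop()
  }
}
```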

When you invoke an action on an RDD, a "job" is created. A job is a piece of work submitted to Spark.
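A small sketch (assuming a SparkContext `sc` as in the skeleton above): the transformations by themselves do nothing, and only the action at the end submits a job.

```scala
val nums    = sc.parallelize(1 to 1000)   // no job yet: parallelize just builds an RDD
val squares = nums.map(n => n * n)        // still no job: map is a lazy transformation
val total   = squares.reduce(_ + _)       // reduce is an action -> a job is submitted here
println(total)                            // prints the sum of squares
```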

Jobs are divided into "stages" based on shuffle boundaries. That is what determines where one stage ends and the next begins.
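For instance, a sketch (the input path is made up) where reduceByKey forces a shuffle, so the single job submitted by the action is split into two stages at that boundary:

```scala
val words = sc.textFile("hdfs:///data/words.txt")   // hypothetical input path
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))              // narrow transformations: pipelined into the same stage
val counts = words.reduceByKey(_ + _)  // shuffle boundary: the next stage starts here
println(counts.toDebugString)          // the lineage shows the ShuffledRDD marking that boundary
counts.collect()                       // the action: submits one job with two stages
```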

Each stage is further divided into tasks based on the number of partitions in the RDD. So tasks are the smallest units of work in Spark.
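One last sketch: the number of partitions you create the RDD with is the number of tasks in the stage that processes it.

```scala
val rdd = sc.parallelize(1 to 1000, 8)   // explicitly ask for 8 partitions
println(rdd.getNumPartitions)            // 8
rdd.count()                              // this job's stage runs 8 tasks, one per partition
```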

