What are the "run at ThreadPoolExecutor.java:1142" jobs in the web UI?

I run several Spark jobs using Spark SQL 1.6.1. Looking at the Spark web UI, I see that some jobs have the description "run at ThreadPoolExecutor.java:1142".

An example of one of these jobs:

I was wondering why some jobs get this description?

1 answer

After some investigation, I found that the Spark jobs described as "run at ThreadPoolExecutor.java:1142" are related to queries with join operators.

    scala> spark.version
    res16: String = 2.1.0-SNAPSHOT

    scala> val left = spark.range(1)
    left: org.apache.spark.sql.Dataset[Long] = [id: bigint]

    scala> val right = spark.range(1)
    right: org.apache.spark.sql.Dataset[Long] = [id: bigint]

    scala> left.join(right, Seq("id")).show
    +---+
    | id|
    +---+
    |  0|
    +---+
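The description text itself is nothing mysterious: it is the call site from which the Spark job was submitted. When a job is kicked off from a worker thread of a `java.util.concurrent.ThreadPoolExecutor` (rather than directly from your action), the nearest user-visible frame is `ThreadPoolExecutor.runWorker`, whose body sat at line 1142 of `ThreadPoolExecutor.java` in that JDK build. A minimal pure-JDK sketch (the class name `PoolCallSite` is my own) showing that frame in the stack of a pooled task:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolCallSite {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        // Submit a task that inspects its own stack trace and reports
        // the ThreadPoolExecutor frame that invoked it.
        Future<String> where = pool.submit(() -> {
            for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
                if (e.getMethodName().equals("runWorker")) {
                    return e.getFileName() + ":" + e.getMethodName();
                }
            }
            return "not found";
        });
        // Prints something like "ThreadPoolExecutor.java:runWorker";
        // the exact line number (1142 in the question) depends on the JDK build.
        System.out.println(where.get());
        pool.shutdown();
    }
}
```

Spark records such a call site as the job description, which is why the job is labeled with the thread pool's source line instead of your own code.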

When you switch to the SQL tab, you will see the completed queries and their jobs (on the right).

SQL tab in the web interface with completed queries

In my case, the Spark jobs running at "run at ThreadPoolExecutor.java:1142" were the ones with ids 12 and 16.


Both of them correspond to join queries.

If you are wondering "it makes sense that one of my joins causes these jobs, but as far as I know a join is a shuffle, not an action, so why is the job described with ThreadPoolExecutor and not with my action (as is the case with my other jobs)?", then my answer usually goes along these lines:

Spark SQL is an extension of Spark with its own abstractions (Dataset, to name just the one that comes to mind first) that have their own operators to execute. One "simple" SQL operation can run one or more Spark jobs. It is at the discretion of Spark SQL how many Spark jobs are run or submitted (they use RDDs under the covers), and you do not need to know low-level details like that, given that you work at the much higher level of Spark SQL's SQL or Query DSL.


Source: https://habr.com/ru/post/1260468/

