How to measure query execution time on Spark

I need to measure the execution time of an Apache Spark query (on Bluemix). What I tried:

import time

startTimeQuery = time.clock()
df = sqlContext.sql(query)
df.show()
endTimeQuery = time.clock()
runTimeQuery = endTimeQuery - startTimeQuery

Is this a good way? The time I get looks too small compared to how long I actually wait before the table appears.
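(One possible reason for the small numbers: on some platforms time.clock() reports CPU time of the driver process rather than wall-clock time, so time spent waiting on the cluster is not counted. A minimal wall-clock variant of the same snippet, assuming Python 3 where time.perf_counter() is available:

import time

startTimeQuery = time.perf_counter()   # wall-clock timer, not CPU time
df = sqlContext.sql(query)             # lazy: only builds the query plan
df.show()                              # action: triggers the actual computation
endTimeQuery = time.perf_counter()
runTimeQuery = endTimeQuery - startTimeQuery
print("query took %.3f s" % runTimeQuery)
)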

+4
3 answers

On Bluemix, in your notebook, open the "Palette" on the right-hand side. Select the "Environment" panel and you will find a link to the Spark History Server, where you can explore Spark's completed jobs, including their computation time.

+3

I use System.nanoTime wrapped in a helper function, for example:

// evaluate f once and print the elapsed wall-clock time in milliseconds
def time[A](f: => A) = {
  val s = System.nanoTime
  val ret = f
  println("time: "+(System.nanoTime-s)/1e6+"ms")
  ret
}

time {
  val df = sqlContext.sql(query)
  df.show()
}
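
Since the question uses PySpark, a rough Python equivalent of the same pattern might look like the sketch below (the name timed is just illustrative; sqlContext and query are taken from the question):

from contextlib import contextmanager
import time

@contextmanager
def timed(label):
    # print wall-clock time for the wrapped block, in milliseconds
    start = time.perf_counter()
    try:
        yield
    finally:
        print("%s: %.1f ms" % (label, (time.perf_counter() - start) * 1000.0))

with timed("query"):
    df = sqlContext.sql(query)
    df.show()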
+6

Spark itself provides a lot of detailed information about each stage of your Spark job.

You can view the currently running job at http://<master-node-IP>:4040, or you can enable the history server to analyze jobs later.

Learn more about the history server in the Spark monitoring documentation.
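
For the history server to show anything, event logging has to be enabled. A minimal sketch of the relevant settings when you build the context yourself (the app name and log directory are placeholders; on a managed service such as Bluemix this is typically preconfigured for you):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("timed-queries")
        .set("spark.eventLog.enabled", "true")              # write event logs for finished jobs
        .set("spark.eventLog.dir", "hdfs:///tmp/spark-events"))  # placeholder path
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# the history server reads the same directory via spark.history.fs.logDirectory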

+2
