How to measure query execution time on Spark

I need to measure the execution time of an Apache Spark query (on Bluemix). What I tried:

import time

startTimeQuery = time.clock()
df = sqlContext.sql(query)
df.show()
endTimeQuery = time.clock()
runTimeQuery = endTimeQuery - startTimeQuery

Is this a good way? The time I get looks too small compared to how long I actually wait before the table appears.
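(One possible reason for the small numbers: on some platforms time.clock() reports CPU time of the driver process rather than wall-clock time, so time spent waiting on the cluster is not counted. A minimal wall-clock variant of the same snippet, assuming Python 3 where time.perf_counter() is available:

import time

startTimeQuery = time.perf_counter()   # wall-clock timer, not CPU time
df = sqlContext.sql(query)             # lazy: only builds the query plan
df.show()                              # action: triggers the actual computation
endTimeQuery = time.perf_counter()
runTimeQuery = endTimeQuery - startTimeQuery
print("query took %.3f s" % runTimeQuery)
)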

+4
3 answers

On Bluemix, in your notebook, open the "Palette" on the right-hand side. Select the "Environment" panel and you will find a link to the Spark History Server, where you can explore Spark's completed jobs, including their computation time.

+3

I use System.nanoTime wrapped in a helper function, for example:

// evaluate f once and print the elapsed wall-clock time in milliseconds
def time[A](f: => A) = {
  val s = System.nanoTime
  val ret = f
  println("time: "+(System.nanoTime-s)/1e6+"ms")
  ret
}

time {
  val df = sqlContext.sql(query)
  df.show()
}
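
Since the question uses PySpark, a rough Python equivalent of the same pattern might look like the sketch below (the name timed is just illustrative; sqlContext and query are taken from the question):

from contextlib import contextmanager
import time

@contextmanager
def timed(label):
    # print wall-clock time for the wrapped block, in milliseconds
    start = time.perf_counter()
    try:
        yield
    finally:
        print("%s: %.1f ms" % (label, (time.perf_counter() - start) * 1000.0))

with timed("query"):
    df = sqlContext.sql(query)
    df.show()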
+6

Spark itself provides a lot of detailed information about each stage of your Spark job.

You can view the currently running job at http://<master-node-IP>:4040, or you can enable the history server to analyze jobs later.

Learn more about the history server in the Spark monitoring documentation.
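
For the history server to show anything, event logging has to be enabled. A minimal sketch of the relevant settings when you build the context yourself (the app name and log directory are placeholders; on a managed service such as Bluemix this is typically preconfigured for you):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("timed-queries")
        .set("spark.eventLog.enabled", "true")              # write event logs for finished jobs
        .set("spark.eventLog.dir", "hdfs:///tmp/spark-events"))  # placeholder path
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# the history server reads the same directory via spark.history.fs.logDirectory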

+2
