Spark sql temp table caching

Is the table registered in registerTempTable( createOrReplaceTempViewwith spark 2. + ) cached?

Using Zeppelin, I register DataFramein my scala code, after heavy calculations, and then in %pysparkI want to access it and additionally filter it.

Will he use a cached version of the table? Or will it be rebuilt every time?

+13
source share
2 answers

Registered tables are not cached in memory.

The method will simply create or replace the representation of the given plan with the request.registerTempTable createOrReplaceTempViewDataFrame

SQL metastore, .

DataFrame. :

df.createOrReplaceTempView("my_table") # df.registerTempTable("my_table") for spark <2.+
spark.cacheTable("my_table") 

EDIT:

:

cacheTable :

scala> val df = Seq(("1",2),("b",3)).toDF
// df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> sc.getPersistentRDDs
// res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> df.createOrReplaceTempView("my_table")

scala> sc.getPersistentRDDs
// res2: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> spark.catalog.cacheTable("my_table") // spark.cacheTable("...") before spark 2.0

scala> sc.getPersistentRDDs
// res4: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(2 -> In-memory table my_table MapPartitionsRDD[2] at cacheTable at <console>:26)

cache.registerTempTable cache.createOrReplaceTempView :

scala> sc.getPersistentRDDs
// res2: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> val df = Seq(("1",2),("b",3)).toDF
// df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> df.createOrReplaceTempView("my_table")

scala> sc.getPersistentRDDs
// res4: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> df.cache.createOrReplaceTempView("my_table")

scala> sc.getPersistentRDDs
// res6: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = 
// Map(2 -> ConvertToUnsafe
// +- LocalTableScan [_1#0,_2#1], [[1,2],[b,3]]
//  MapPartitionsRDD[2] at cache at <console>:28)
+20

. :

sqlContext.cacheTable("someTable")
+4

Source: https://habr.com/ru/post/1653138/


All Articles