Spark: unpersisting RDDs for which I have lost the reference

How can I unpersist RDDs that were generated inside an MLlib model, and for which I no longer have a reference?

I know that in pyspark you can unpersist all DataFrames with sqlContext.clearCache(). Is there something similar for RDDs in the Scala API? Also, is there a way to unpersist only some RDDs, without having to unpersist all of them?

1 answer

You may call

val rdds = sparkContext.getPersistentRDDs // result is Map[Int, RDD[_]]; the method is declared without parentheses

and then filter the values to get the ones you want, where filterLogic is your own predicate on an RDD (1):

rdds.filter(x => filterLogic(x._2)).foreach(x => x._2.unpersist())

(1) Written by hand, without a compiler. Sorry if there are any errors, but there shouldn't be ;)
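
For anyone who wants something closer to copy-paste, here is a minimal sketch of the same idea wrapped in a helper. The names UnpersistHelpers, unpersistWhere, unpersistAll and shouldDrop are made up for illustration and are not part of Spark. Only SparkContext.getPersistentRDDs and RDD.unpersist are real Spark calls, and the sketch assumes you still have the SparkContext at hand.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object UnpersistHelpers {

  // Unpersists every persisted RDD for which `shouldDrop` returns true
  // and returns how many RDDs were unpersisted.
  def unpersistWhere(sc: SparkContext)(shouldDrop: RDD[_] => Boolean): Int = {
    // getPersistentRDDs returns Map[Int, RDD[_]] keyed by RDD id.
    // Copy the matches into a List first, then unpersist them.
    val targets = sc.getPersistentRDDs.values.filter(shouldDrop).toList
    targets.foreach(_.unpersist())
    targets.size
  }

  // Rough RDD-side counterpart of sqlContext.clearCache(): drop everything.
  def unpersistAll(sc: SparkContext): Int = unpersistWhere(sc)(_ => true)
}
```

For example, UnpersistHelpers.unpersistWhere(sc)(rdd => Option(rdd.name).exists(_.startsWith("tmp"))) would drop only RDDs whose name starts with "tmp" (a made-up naming convention). The Option wrapper is there because an RDD that was never given a name via setName has a null name. RDDs created internally by MLlib may well be unnamed, so in practice you might filter on rdd.id or rdd.getStorageLevel instead.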


Source: https://habr.com/ru/post/1668949/

