How to list RDDs defined in the Spark shell?

In both the “spark” and “pyspark” shells, I have created many RDDs, but I could not find a way to list all the RDDs available in the current Spark shell session.

1 answer

In Python, you can simply filter globals by type:

    def list_rdds():
        from pyspark import RDD
        # Collect the names of all top-level variables bound to RDD instances
        return [k for (k, v) in globals().items() if isinstance(v, RDD)]

    list_rdds()
    # []

    rdd = sc.parallelize([])
    list_rdds()
    # ['rdd']
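Note that this only finds RDDs bound to top-level names in the shell session; RDDs held in local scopes or nested inside other data structures will not be listed.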

In the Scala REPL, you should be able to use $intp.definedTerms / $intp.typeOfTerm in a similar way.
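For example, a minimal sketch (assuming a Scala 2.11-style REPL where $intp is bound to the interpreter's IMain instance, as it is in spark-shell; the listRDDs helper and the substring type check are illustrative, not a standard API):

    // Sketch for the Scala Spark shell. $intp is the REPL's
    // scala.tools.nsc.interpreter.IMain instance; definedTerms lists the
    // names defined in the session, and typeOfTerm returns each name's
    // type. Matching on the type's string form is a rough heuristic,
    // not an exact type test.
    def listRDDs(): List[String] =
      $intp.definedTerms
        .map(_.toString)
        .filter(name => $intp.typeOfTerm(name).toString.contains("RDD"))

    val rdd = sc.parallelize(Seq(1, 2, 3))
    listRDDs()  // should include "rdd"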


Source: https://habr.com/ru/post/1236485/

