Show partitions of a PySpark RDD

The PySpark RDD documentation

http://spark.apache.org/docs/1.2.1/api/python/pyspark.html#pyspark.RDD

does not list any method for getting partition information for an RDD.

Is there a way to get this information without performing an additional step, for example:

 myrdd.mapPartitions(lambda x: [1]).sum() 

The above works, but it seems like extra effort.

1 answer

I missed it; it's very simple:

 rdd.getNumPartitions() 

No more Java-ish getFooMethod() calls for me ;)


Source: https://habr.com/ru/post/983807/
