The PySpark RDD documentation
http://spark.apache.org/docs/1.2.1/api/python/pyspark.html#pyspark.RDD
does not show any method for getting partition information for an RDD.
Is there a way to get this information without performing an additional step, for example:
myrdd.mapPartitions(lambda x: iter([1])).sum()
The above works, but seems like an extra effort.
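The idea behind that workaround can be illustrated without a Spark cluster: treat an RDD as a list of partitions, replace each partition's iterator with a one-element iterator, and sum the results. The `partitions` list below is illustrative sample data, not PySpark API.

```python
# Simulate an RDD as a list of partitions (each partition is a plain list).
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Equivalent of mapPartitions(lambda x: iter([1])): each partition's
# iterator is replaced by an iterator yielding a single 1.
mapped = [iter([1]) for part in partitions]

# Equivalent of .sum(): flatten and add, which counts the partitions.
num_partitions = sum(x for it in mapped for x in it)
print(num_partitions)  # 3
```

This shows why the trick works: one element is emitted per partition, so the sum equals the partition count, but it forces a full job just to learn a number.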
I missed this; it is very simple:

rdd.getNumPartitions()

No more Java-ish getFooMethod() calls ;)