The relationship between iterable and arrays in Spark

Question


I notice that if I apply mapPartitions on an RDD, each partition is passed to my function as an iterable object. Inside the function given to mapPartitions, I call the toArray member function to convert this iterable to an Array. Does calling toArray copy the elements, or does the resulting array just reference the same piece of memory? If it does copy, what are the ways to prevent copying?


arrays scala apache-spark

pythonic Dec 21 '16 at 14:01


1 answer

Tim · Accepted Answer · 2016-12-21T14:21:36+0000

One important correction to your question: the partition data structure exposed in mapPartitions is an Iterator, not an Iterable. Here are the differences between the two interfaces:

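An Iterator has the methods next() and hasNext(). You can only traverse it once, because each call to next() consumes an element (an Iterator is mutable).
An Iterable has a method iterator, which returns a new Iterator each time it is called. As a result, you can traverse an Iterable as many times as you like.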
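The single-pass versus multi-pass difference can be seen in a small plain-Scala sketch (no Spark needed). The object and method names are hypothetical, chosen only for illustration: summing an Iterator twice finds it empty on the second pass, while an Iterable hands out a fresh Iterator each time.

```scala
object IteratorVsIterable {
  // An Iterator is consumed by traversal: the second sum sees no elements.
  def sumIteratorTwice(it: Iterator[Int]): (Int, Int) = {
    val first = it.sum   // traverses and exhausts the iterator
    val second = it.sum  // the iterator is already empty here
    (first, second)
  }

  // An Iterable creates a fresh Iterator on every call to .iterator,
  // so it can be traversed as many times as needed.
  def sumIterableTwice(xs: Iterable[Int]): (Int, Int) = {
    val first = xs.iterator.sum
    val second = xs.iterator.sum
    (first, second)
  }
}
```

For example, `sumIteratorTwice(Iterator(1, 2, 3))` yields `(6, 0)`, whereas `sumIterableTwice(List(1, 2, 3))` yields `(6, 6)`.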

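This matters because an Iterator does not need to hold all of its data in memory at once. Elements can be produced lazily, one per call to next(). This is, for example, how Spark reads text files (sc.textFile): lines are read as they are requested, so a partition can be processed without loading all of it into memory.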
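To illustrate lazy production, here is a minimal sketch. The `source` function is a hypothetical stand-in for a lazily read input such as the lines behind sc.textFile: it logs each element at the moment it is actually produced. Mapping over the Iterator reads nothing, and taking two elements forces exactly two reads, even though the source nominally has a million elements.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyIteratorDemo {
  // Hypothetical lazy data source: records each element when next() produces it.
  def source(n: Int, log: ArrayBuffer[Int]): Iterator[Int] =
    Iterator.range(0, n).map { i => log += i; i }

  // map/take on an Iterator build new Iterators without touching the source;
  // only toList pulls elements, and only as many as take(2) allows.
  def firstTwoDoubled(n: Int): (List[Int], List[Int]) = {
    val log = ArrayBuffer.empty[Int]
    val doubled = source(n, log).map(_ * 2) // nothing has been read yet
    val taken = doubled.take(2).toList      // forces exactly two elements
    (taken, log.toList)
  }
}
```

Calling `LazyIteratorDemo.firstTwoDoubled(1000000)` returns `(List(0, 2), List(0, 1))`: only elements 0 and 1 were ever produced.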

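When you call iterator.toArray, the iterator is traversed to the end and every element is stored in a newly allocated array, so yes, this is a copy. The elements may come from anywhere (Spark might be reading them from disk or from an in-memory cache). The new array holds either the values themselves (for primitive types, say Int) or references to the same objects (for AnyRef types, such as Array[_]). In the second case only the references are copied, not the objects they point to.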
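Both cases of `toArray` can be checked with a plain-Scala sketch. The names are hypothetical, and the input is an in-memory array rather than a Spark partition. With `Int` elements the values are copied, so later changes to the source are not visible. With `Array[Int]` elements a new outer array is allocated, but its slots point at the same inner arrays (a shallow copy).

```scala
object ToArrayCopyDemo {
  // Primitive elements: toArray stores the values themselves in a new Array[Int].
  def primitiveCopy(): (Array[Int], Array[Int]) = {
    val backing = Array(1, 2, 3)
    val copy = backing.iterator.toArray
    backing(0) = 99 // changing the source afterwards does not affect the copy
    (backing, copy)
  }

  // Reference elements: toArray allocates a new outer array, but each slot
  // refers to the very same inner array object (a shallow copy).
  def referenceCopy(): (Boolean, Boolean, Int) = {
    val rows = Array(Array(1, 2), Array(3, 4))
    val copy = rows.iterator.toArray
    val sameOuter = copy eq rows       // false: a new outer array was allocated
    val sameInner = copy(0) eq rows(0) // true: the elements are shared references
    rows(0)(0) = 42                    // mutate an inner array via the source
    (sameOuter, sameInner, copy(0)(0)) // the copy observes the mutation
  }
}
```

Here `referenceCopy()` returns `(false, true, 42)`. The copy costs one slot per element, while the objects themselves are not duplicated.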

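So there is no way to turn an Iterator into an Array without copying. If you want to avoid the copy, do not call toArray at all. Instead, work on the Iterator directly with its lazy operations (map, filter, foldLeft, and so on), so that elements are processed one at a time. Materializing the whole partition also increases memory usage and puts more pressure on the GC, which can be expensive!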
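As an example of processing a partition without materializing it, here is a minimal sketch, assuming an input of `Int` or `String` elements. The object and method names are hypothetical. Both functions have the shape `Iterator[T] => Iterator[U]`, which is what `RDD.mapPartitions` takes. With Spark on the classpath, one could be passed as `rdd.mapPartitions(PartitionStats.countAndSum)` for an `RDD[Int]`, but that call is not exercised here. `countAndSum` folds over the elements one at a time, and `parseNonEmpty` stays lazy end to end.

```scala
object PartitionStats {
  // Aggregates a partition in a single pass with constant extra memory;
  // the elements are never collected into an Array.
  def countAndSum(partition: Iterator[Int]): Iterator[(Long, Long)] = {
    val (count, sum) = partition.foldLeft((0L, 0L)) {
      case ((c, s), x) => (c + 1, s + x)
    }
    Iterator.single((count, sum))
  }

  // Element-wise transformation: the returned Iterator pulls from the input
  // only as the consumer asks for elements, so nothing is buffered.
  def parseNonEmpty(lines: Iterator[String]): Iterator[Int] =
    lines.map(_.trim).filter(_.nonEmpty).map(_.toInt)
}
```

If a partition genuinely has to be traversed more than once, converting it to a collection is unavoidable. In that case the copy is the price of the second pass.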

