First of all, you do not need transient lazy here. The object wrapper alone is enough to make this work, and you can write it as:
object OnePerExecutor { val obj: NotSerializable = new NotSerializable(10) }
There is a fundamental difference between using the object wrapper and initializing NotSerializable inside mapPartitions. This:
rdd.mapPartitions(iter => { val ns = new NotSerializable(1); ??? })
creates one instance of NotSerializable for each partition.
The object wrapper, on the other hand, creates a single NotSerializable instance for each executor JVM. As a result, this instance:
- Can be used to process multiple partitions.
- Can be accessed simultaneously by multiple executor threads.
- Has a lifetime that exceeds the function call in which it is used.
This means that it must be thread safe, and any method calls should be free of side effects.
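The difference in instance counts can be illustrated without Spark. The following is only a rough sketch: it simulates executor threads in a single JVM with Scala futures, and the NotSerializable class body, its addTo method, the created counter, and the Demo.run helper are all hypothetical names introduced for illustration, not part of the original question or of Spark.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for the class from the question.
// Every construction increments a shared counter so instances can be counted.
class NotSerializable(val n: Int) {
  NotSerializable.created.incrementAndGet()

  // Pure method: reads only immutable state, so concurrent calls are safe.
  def addTo(x: Int): Int = x + n
}

object NotSerializable {
  val created = new AtomicInteger(0)
}

// Initialized once per JVM, on first access to the object.
object OnePerExecutor {
  val obj: NotSerializable = new NotSerializable(10)
}

object Demo {
  // Assumption: each inner Seq plays the role of a partition, and each Future
  // plays the role of a task running on an executor thread in the same JVM.
  def run(partitions: Seq[Seq[Int]])(f: Seq[Int] => Seq[Int]): Seq[Seq[Int]] =
    Await.result(Future.sequence(partitions.map(p => Future(f(p)))), 30.seconds)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6))

    // Analogue of constructing the instance inside mapPartitions:
    // a fresh instance for every partition.
    val before = NotSerializable.created.get()
    run(partitions) { part =>
      val ns = new NotSerializable(1)
      part.map(ns.addTo)
    }
    println(s"per-partition instances: ${NotSerializable.created.get() - before}")

    // Analogue of the object wrapper: all tasks share one instance,
    // possibly from several threads at the same time.
    val mid = NotSerializable.created.get()
    run(partitions)(part => part.map(OnePerExecutor.obj.addTo))
    println(s"object-wrapper instances: ${NotSerializable.created.get() - mid}")
  }
}
```

Because the shared instance may be called from several threads at once, addTo is kept free of mutable state here. A method that mutated fields of the shared instance would need synchronization or a thread-safe design.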