How does a serialized RDD take up less memory?

In the Spark Programming Guide, RDD serialization is mentioned as one of the techniques for reducing memory usage. As I understand it, serialization converts an object into bytes so that it can be stored easily. So how does the serialized form take up less space?

1 answer

As the Spark 2.x tuning guide (linked below) explains, Java objects carry overhead on top of the raw data: every object has a header that includes a pointer to its class, collections of primitive types store their elements as boxed wrapper objects, and nested or linked structures add further objects and pointers. None of this overhead is stored when the objects are serialized.
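
To make the size difference concrete, here is a minimal sketch in plain Java. It does not use Spark or Spark's serializers (Java serialization or Kryo); it simply hand-packs a `List<Integer>` into a byte array as a stand-in for a serialized form. The class name `SerializedSizeSketch` and the per-object sizes (about 16 bytes per boxed `Integer`, 4 bytes per reference) are assumptions for a typical 64-bit JVM with compressed pointers, not measured values.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Spark.
final class SerializedSizeSketch {

    // ASSUMPTION: rough sizes on a 64-bit JVM with compressed pointers.
    static final int ASSUMED_BOXED_INTEGER_BYTES = 16; // object header + 4-byte int value
    static final int ASSUMED_REFERENCE_BYTES = 4;      // list slot pointing at each Integer

    // Pack the values back to back, 4 bytes each, with no header or pointer per element.
    static byte[] pack(List<Integer> values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.size() * Integer.BYTES);
        for (int value : values) {
            buffer.putInt(value);
        }
        return buffer.array();
    }

    // Rough estimate of what the same values cost as boxed objects held in a list.
    static long estimateBoxedBytes(int count) {
        return (long) count * (ASSUMED_BOXED_INTEGER_BYTES + ASSUMED_REFERENCE_BYTES);
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            values.add(i);
        }
        System.out.println("packed bytes:          " + pack(values).length);
        System.out.println("estimated boxed bytes: " + estimateBoxedBytes(values.size()));
    }
}
```

Under these assumptions the packed form needs 4 bytes per element, while the boxed form needs roughly five times as much. The exact ratio depends on the JVM and on the data types involved.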

The trade-off is that the data is then stored as one serialized byte array per partition, so it has to be deserialized before it can be used, and that costs CPU time on every access.
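
The access cost can be sketched the same way in plain Java. This is an illustration under assumptions, not Spark's actual storage code: the hypothetical class `DeserializeOnReadSketch` packs integers into a byte array and then has to decode the bytes back into objects before it can compute anything with them.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Spark.
final class DeserializeOnReadSketch {

    // Serialize: write each int as 4 consecutive bytes.
    static byte[] pack(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int value : values) {
            buffer.putInt(value);
        }
        return buffer.array();
    }

    // Deserialize: every read walks the bytes and allocates the objects again.
    static List<Integer> unpack(byte[] packed) {
        ByteBuffer buffer = ByteBuffer.wrap(packed);
        List<Integer> values = new ArrayList<>(packed.length / Integer.BYTES);
        while (buffer.remaining() >= Integer.BYTES) {
            values.add(buffer.getInt());
        }
        return values;
    }

    // Any computation on the packed form pays the decoding cost first.
    static long sum(byte[] packed) {
        long total = 0;
        for (int value : unpack(packed)) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] packed = pack(new int[] {1, 2, 3, 4});
        System.out.println(sum(packed)); // prints 10
    }
}
```

The compact byte array saves memory while the data sits in the cache, but `unpack` runs again each time the data is read, which is the extra time the answer refers to.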

https://spark.apache.org/docs/latest/tuning.html


Source: https://habr.com/ru/post/1274451/

