In Spark 2.x, as the tuning guide (linked below) explains, Java objects carry significant overhead on top of the raw data they hold: every object has a header with a pointer to its class, collections wrap each entry in a separate object, and collections of primitive types store their elements as boxed objects (e.g. java.lang.Integer) rather than as the primitives themselves. None of this overhead is carried over when the objects are serialized.
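To make that overhead concrete, here is a minimal Scala sketch using SizeEstimator, the helper the tuning guide itself recommends for measuring an object's in-memory footprint. The collection choice and element count are illustrative, not from the original answer:

```scala
import org.apache.spark.util.SizeEstimator

// A boxed linked list: every element costs a wrapper object,
// node pointers, and a 12-16 byte object header.
val boxed = new java.util.LinkedList[Integer]()
(1 to 1000).foreach(i => boxed.add(i))

// The same data as a primitive array: ~4 bytes per element
// plus a small array header.
val raw: Array[Int] = (1 to 1000).toArray

println(SizeEstimator.estimate(boxed)) // typically tens of KB
println(SizeEstimator.estimate(raw))   // roughly 4 KB
```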
The trade-off is that each partition is then stored as one large serialized byte array, so the data must be deserialized every time it is accessed, and that can cost significant CPU time.
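For completeness, here is a hedged sketch of how that choice is expressed in code: persisting with StorageLevel.MEMORY_ONLY_SER selects the compact serialized form, and switching the serializer to Kryo (as the tuning guide recommends) shrinks the byte arrays further. The app name, master setting, and dataset are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("serialized-caching-sketch") // placeholder name
  .master("local[*]")                   // placeholder; use your cluster master
  // Kryo is usually faster and more compact than Java serialization.
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

val rdd = spark.sparkContext
  .parallelize(1 to 1000000)
  .map(i => (i, i.toString))

// MEMORY_ONLY would keep deserialized objects: fastest to access,
// but with all the per-object overhead described above.
// MEMORY_ONLY_SER keeps one byte array per partition: much smaller,
// but every action pays the deserialization cost on access.
rdd.persist(StorageLevel.MEMORY_ONLY_SER)

println(rdd.count()) // first action materializes the cache
spark.stop()
```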
https://spark.apache.org/docs/latest/tuning.html