In Spark 2.x, as the tuning guide (linked below) explains, Java objects carry significant overhead on top of the raw data they hold: every object has a header with a pointer to its class, collections wrap each entry in a separate object, and collections of primitive types store their elements as boxed objects (e.g. java.lang.Integer) rather than as the primitives themselves. None of this overhead is carried over when the objects are serialized.
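To make that overhead concrete, here is a minimal Scala sketch using SizeEstimator, the helper the tuning guide itself recommends for measuring an object's in-memory footprint. The collection choice and element count are illustrative, not from the original answer:

```scala
import org.apache.spark.util.SizeEstimator

// A boxed linked list: every element costs a wrapper object,
// node pointers, and a 12-16 byte object header.
val boxed = new java.util.LinkedList[Integer]()
(1 to 1000).foreach(i => boxed.add(i))

// The same data as a primitive array: ~4 bytes per element
// plus a small array header.
val raw: Array[Int] = (1 to 1000).toArray

println(SizeEstimator.estimate(boxed)) // typically tens of KB
println(SizeEstimator.estimate(raw))   // roughly 4 KB
```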
The trade-off is that each partition is then stored as one large serialized byte array, so the data must be deserialized every time it is accessed, and that can cost significant CPU time.
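For completeness, here is a hedged sketch of how that choice is expressed in code: persisting with StorageLevel.MEMORY_ONLY_SER selects the compact serialized form, and switching the serializer to Kryo (as the tuning guide recommends) shrinks the byte arrays further. The app name, master setting, and dataset are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("serialized-caching-sketch") // placeholder name
  .master("local[*]")                   // placeholder; use your cluster master
  // Kryo is usually faster and more compact than Java serialization.
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

val rdd = spark.sparkContext
  .parallelize(1 to 1000000)
  .map(i => (i, i.toString))

// MEMORY_ONLY would keep deserialized objects: fastest to access,
// but with all the per-object overhead described above.
// MEMORY_ONLY_SER keeps one byte array per partition: much smaller,
// but every action pays the deserialization cost on access.
rdd.persist(StorageLevel.MEMORY_ONLY_SER)

println(rdd.count()) // first action materializes the cache
spark.stop()
```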
https://spark.apache.org/docs/latest/tuning.html