Why does pickle.dump(obj) produce a different size than sys.getsizeof(obj)? How do I save a variable to a file?

I am using the random forest classifier from the scikit-learn library in Python for my exercises. The result changes on every run, so I run it 1000 times and take the average.

I save each rf object to a file with pickle.dump() so that I can use it for prediction later, and each file is about 4 MB. However, sys.getsizeof(rf) gives me only 36 bytes.

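import pickle
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=50)
rf.fit(matX, vecY)
with open('var.sav', 'wb') as f:  # pickle.dump() needs a binary file object, not a file name
    pickle.dump(rf, f)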

My questions:

  • sys.getsizeof() seems to report the wrong size for the RandomForestClassifier object, right? Why?
  • How can I save an object in a ZIP file so that it takes up less space?
1 answer

getsizeof() gives you the memory footprint of just the object itself, not of any other values referenced by that object. You would need to recurse over the object to find the total size of all its attributes, and of anything those attributes hold in turn, and so on.
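The recursion described above could look roughly like the following. This is a minimal sketch, not a complete memory profiler: the helper name total_size and the Model class are hypothetical, only built-in containers and instance attributes in __dict__ are followed, and objects implemented in C may account for their memory differently.

```python
import sys


def total_size(obj, seen=None):
    """Approximate the size of obj plus everything reachable from it."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared or cyclic references only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)  # shallow size of this object only
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):  # instance attributes
        size += total_size(vars(obj), seen)
    return size


class Model:  # hypothetical stand-in for a fitted estimator
    def __init__(self):
        self.weights = list(range(10_000))


m = Model()
shallow = sys.getsizeof(m)  # small: just the instance header
deep = total_size(m)        # includes the attribute dict, the list and its items
print(shallow, deep)
```

The shallow figure stays small no matter how much data the attributes hold, which is why getsizeof() on a fitted classifier says so little about its pickle size.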

Pickling is a serialization format. Serialization has to store metadata as well as the contents of the object. Memory size and pickle size are only roughly correlated.

Pickles are byte streams; if you need a more compact byte stream, use compression.
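One way to apply compression is to write the pickle through gzip from the standard library. The sketch below assumes hypothetical helper names (save_compressed, load_compressed) and uses a repetitive dictionary as stand-in data instead of a fitted classifier; how much a real model shrinks depends on its contents.

```python
import gzip
import os
import pickle
import tempfile


def save_compressed(obj, path):
    # gzip.open returns a binary file object, which pickle.dump can write to
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


data = {"values": list(range(1000)) * 50}  # hypothetical, highly repetitive data
path = os.path.join(tempfile.mkdtemp(), "var.sav.gz")

save_compressed(data, path)
raw_size = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
packed_size = os.path.getsize(path)
restored = load_compressed(path)
print(raw_size, packed_size)
```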

If you store your pickles in a ZIP archive, your data will already be compressed; compressing the pickle before storing it in the ZIP will not help in that case. Already-compressed data runs the risk of becoming larger after additional ZIP compression, because of the metadata overhead and because typical compressed data contains little duplicate data left to exploit.
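Storing a pickle directly in a ZIP archive could be sketched as follows. The helper names and the member name model.pkl are hypothetical, and an in-memory buffer stands in for a file path. Note that Python's zipfile module defaults to ZIP_STORED (no compression), so ZIP_DEFLATED is requested explicitly here; the pickle itself goes in uncompressed and the archive does the compressing.

```python
import io
import pickle
import zipfile


def save_to_zip(obj, zip_target, member="model.pkl"):
    # Pickle once, uncompressed; the archive applies DEFLATE to the member.
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def load_from_zip(zip_target, member="model.pkl"):
    with zipfile.ZipFile(zip_target) as zf:
        return pickle.loads(zf.read(member))


data = {"values": list(range(1000)) * 50}  # hypothetical stand-in data
buffer = io.BytesIO()                      # a file path works here as well

save_to_zip(data, buffer)
restored = load_from_zip(buffer)
archive_size = len(buffer.getvalue())
raw_size = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
print(raw_size, archive_size)
```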


Source: https://habr.com/ru/post/1246804/

