Thought I would bring some more data to the discussion.
I ran a series of tests on this issue.
Using the Python `resource` package, I got the memory usage of my process.
And by writing the CSV to a `StringIO` buffer, I could easily measure its size in bytes.
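Something like the following sketch can take both measurements (helper names are mine; note that `resource` is Unix-only, and `ru_maxrss` reports the *peak* resident set size, in kilobytes on Linux but in bytes on macOS):

```python
import resource
from io import StringIO

def process_memory_mb():
    # Peak resident set size of this process. ru_maxrss is in
    # kilobytes on Linux but in bytes on macOS, so adjust there.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

def csv_size_mb(df):
    # Write the frame to an in-memory buffer and count the characters
    # (equal to bytes for plain ASCII CSV output).
    buf = StringIO()
    df.to_csv(buf)
    return len(buf.getvalue()) / 1024.0 ** 2
```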
I ran two experiments, each creating 20 DataFrames of increasing size between 10,000 and 1,000,000 rows, both with 10 columns.
In the first experiment I used only floats in my dataset.
This is how the memory usage grew in comparison to the CSV file as a function of the number of rows (sizes in megabytes):
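A sketch of the float experiment, reusing the helpers above. Because `ru_maxrss` is a high-water mark, building the frames in increasing size order makes the per-iteration reading roughly track the current frame; this is an approximation, not exact accounting:

```python
import numpy as np
import pandas as pd

baseline = process_memory_mb()
for n in np.linspace(10_000, 1_000_000, 20, dtype=int):
    df = pd.DataFrame(np.random.rand(n, 10))  # 10 float64 columns
    mem = process_memory_mb() - baseline      # rough footprint of this frame
    print(f"{n:>9} rows: {mem:8.1f} MB in memory, {csv_size_mb(df):8.1f} MB as csv")
    del df  # free the old frame so its memory can be reused by the next one
```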

The second experiment had the same approach, but the data in the dataset consisted only of short strings.
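The same loop works for the string case; only the frame construction changes (the particular strings here are just illustrative):

```python
# Drop-in replacement for the frame construction in the loop above:
# 10 columns of short strings, which pandas stores as object dtype.
words = np.array(["foo", "bar", "baz", "qux"])
df = pd.DataFrame(np.random.choice(words, size=(n, 10)))
```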

It seems that the ratio of CSV size to DataFrame size can vary quite a lot, but the size in memory will always be a factor of 2-3 larger (for the frame sizes in this experiment).
I would love to complete this answer with more experiments; please comment if you want me to try something special.
firelynx, Jul 21 '15 at 15:29