List with 70 MB on disk but 500 MB in memory

I have a list of plain Python tuples of the form lst = [('xxx', 'yyy'), ...etc]. The list contains about 8154741 tuples. The profiler says the list takes about 500 MB in memory. But when I wrote all the tuples in the list to a text file, the file took only about 72 MB on disk.

I have three questions:

  • Why is the memory consumption so different from the disk usage?

  • Is it reasonable for such a list to use 500 MB of memory?

  • Is there a way to reduce the memory footprint of the list?
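
For reference, here is a minimal sketch of how the in-memory numbers can be probed with the standard sys.getsizeof. This is an assumption about how the measurement could be reproduced, not the asker's actual profiler; exact figures vary by Python version and build, and getsizeof does not follow pointers, so the strings inside the tuples are not counted:

import sys

# Same shape as the question's data (strings are shared here, so
# real data with distinct strings will use even more memory).
lst = [('xxx', 'yyy') for _ in range(8154741)]

print(sys.getsizeof(lst))      # the list object: header + one pointer per item
print(sys.getsizeof(lst[0]))   # one tuple object, excluding its two strings
print(sys.getsizeof('xxx'))    # one small string object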

+4
3 answers

You have 8154741 tuples. The list itself stores one pointer per tuple; at 8 bytes per pointer, that is already 62 MB. Each tuple then holds two more pointers to its strings, another 16 bytes per tuple, or 124 MB. The tuple objects themselves each carry a 24-byte header, another 186 MB. That is 372 MB of pure structure for 46 MB of actual character data (two 3-character ascii strings per tuple, under python2). The rest of the observed 500 MB goes to the string objects themselves and to allocator overhead. In python3 strings are unicode, but ascii-only text is still stored at 1 byte per character.
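
The arithmetic spelled out (a sketch of the accounting above: 48 bytes of per-tuple structure on a 64-bit build, before counting the string objects):

N = 8154741
MB = 1024.0 ** 2
print(N * 8 / MB)    # pointers held by the list            -> ~62 MB
print(N * 16 / MB)   # two pointers inside each tuple       -> ~124 MB
print(N * 24 / MB)   # per-tuple object header              -> ~186 MB
print(N * 48 / MB)   # total structure                      -> ~372 MB
print(N * 6 / MB)    # raw characters, 2 strings x 3 bytes  -> ~46 MB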

So the memory goes into object overhead, not into your data; as long as every string and every tuple is a full Python object with its own header and pointers, you cannot avoid it.

But if your strings have a fixed (or bounded) length, you can keep them in a numpy array of fixed-size strings. The whole dataset then sits in one contiguous block, with no per-element objects. For example, this is how numpy stores the data:

>>> import numpy
>>> d = [("xxx", "yyy") for i in range(8154741)]
>>> a = numpy.array(d)
>>> print a.nbytes/1024**2
46
>>> print a[2,1]
yyy
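
A side note not in the answer above: on python3 the same numpy.array(d) call would default to 4-byte-per-character unicode (dtype '<U3') and take four times the space; forcing fixed-width byte strings keeps the array at 46 MB. A minimal sketch, assuming the data is ASCII-only:

>>> import numpy
>>> d = [("xxx", "yyy") for i in range(8154741)]
>>> a = numpy.array(d, dtype="S3")   # 3-byte fixed-width byte strings
>>> print(a.nbytes // 1024**2)
46
>>> print(a[2, 1])
b'yyy'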
+3

That is simply how Python works: every value is a heap object with its own header and reference count, so small values carry a large relative overhead. Memory is part of the price you pay for Python's flexibility.

If you need to cut memory, change the representation. Store the data in packed arrays instead of Python objects (numpy, as in the other answer?). With Cython you can declare typed variables (for example, fixed-size character buffers) and lay the data out essentially the way C would; a pure-python version of the same packing idea is sketched below.
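
The following is not Cython, just a minimal pure-Python sketch of the "lay it out like C" idea, assuming every string is exactly 3 ASCII bytes; the record width and the get_pair helper are made up for the example:

N = 8154741
record = b"xxxyyy"                  # one 6-byte record per ('xxx', 'yyy') pair
buf = bytearray(record * N)         # one contiguous ~47 MB buffer, one object

def get_pair(i):
    # Slice the i-th record back out; assumes two fixed 3-byte fields.
    off = i * 6
    return bytes(buf[off:off+3]), bytes(buf[off+3:off+6])

print(get_pair(2))                  # (b'xxx', b'yyy')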

There is also IOPro, a commercial product from Continuum Analytics (I have not tried it myself, so I cannot say whether it would help here?).

+2

Do you really need to store the contents of the strings in memory at all? Or can you convert each string to a feature vector and keep the string-to-vector mapping on disk? That is essentially the kind of mapping word2vec builds.
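
A minimal sketch of that idea, assuming the real data contains many repeated strings; vocab, sid, and the vocab.json filename are illustrative names, not anything from the question. Only small integer ids stay in memory, and the id-to-string table goes to disk:

import json

pairs = [("xxx", "yyy")] * 1000            # stand-in for the real list

vocab = {}                                 # string -> small integer id
def sid(s):
    return vocab.setdefault(s, len(vocab))

encoded = [(sid(a), sid(b)) for a, b in pairs]   # only ints stay in memory

with open("vocab.json", "w") as f:         # id -> string table lives on disk
    json.dump({i: s for s, i in vocab.items()}, f)

print(encoded[0])                          # (0, 1)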

0

Source: https://habr.com/ru/post/1533051/

