How to create an optimized iterator for a long list of integers?

Say I have a very large list of integers that takes up a lot of memory. If the integers in the list were uniform, I could easily express the list as an iterator that takes up almost no memory. But with more complex patterns, it becomes harder to express the list as an iterator.
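For the uniform case, Python's built-in `range` already works this way: it stores only start, stop and step, so its size does not grow with the number of elements. A small comparison, assuming an arithmetic progression as the example of a "uniform" list:

```python
import sys

n = 1_000_000

# Materialized list: one pointer per element (plus the int objects themselves)
as_list = list(range(0, 3 * n, 3))

# Lazy equivalent: stores only start, stop and step
as_range = range(0, 3 * n, 3)

print(sys.getsizeof(as_list))   # grows with n (pointer array only)
print(sys.getsizeof(as_range))  # small constant size
print(as_range[10], len(as_range))  # 30 1000000, indexing and len() still work
```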

Is there a Python package that can parse a list of integers and return an "optimized" iterator? Or are there methodologies I can learn to accomplish this?
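One methodology that fits lists with long runs of repeated values (like the test data in the lzma answer) is run-length encoding: store (value, count) pairs and expand them lazily. This is a swapped-in illustration using only the standard library; `rle_encode` and `rle_iter` are hypothetical helper names, not a package API.

```python
from itertools import chain, groupby, repeat


def rle_encode(values):
    """Collapse consecutive duplicates into (value, count) pairs.

    Accepts any iterable, so the input never has to be a full list.
    """
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_iter(pairs):
    """Lazily expand (value, count) pairs back into the original sequence."""
    return chain.from_iterable(repeat(value, count) for value, count in pairs)


data = [7] * 5 + [-2] * 3 + [7] * 2
pairs = rle_encode(data)
print(pairs)                          # [(7, 5), (-2, 3), (7, 2)]
print(list(rle_iter(pairs)) == data)  # True
```

This only pays off when runs are long; for data with no repeats, the pairs take more memory than the plain list.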


My proof of concept uses the lzma module (standard in Python 3; a backport exists for Python 2) and compresses the data in memory. Instead of a memory buffer, you can also use a file on disk:

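import io
import random
import struct

try:
    import lzma  # standard library in Python 3.3+
except ImportError:
    from backports import lzma  # backport for Python 2

# Create array of integers with some duplicates.
# Values stay within the 32-bit range so that they fit the 'i' format below.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
data = []
for i in range(0, 2000):
    data += [random.randint(INT_MIN, INT_MAX)] * random.randint(0, 500)

print('Uncompressed: {}'.format(len(data)))  # number of integers, not bytes
buff = io.BytesIO()

fmt = 'i'  # C int, 4 bytes on common platforms; see https://docs.python.org/3/library/struct.html#format-characters
lzma_writer = lzma.LZMAFile(buff, 'wb')
for i in data:
    lzma_writer.write(struct.pack(fmt, i))
lzma_writer.close()  # does not close buff, since a file object was passed in
print('Compressed: {}'.format(len(buff.getvalue())))  # size in bytes

buff.seek(0)
lzma_reader = lzma.LZMAFile(buff, 'rb')

size_of = struct.calcsize(fmt)


def generate(reader):
    """Yield the integers one at a time from the decompressing reader."""
    r = reader.read(size_of)
    while len(r) == size_of:
        yield struct.unpack(fmt, r)[0]
        r = reader.read(size_of)


# Test if it is the same array
res = list(generate(lzma_reader))
print(res == data)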

Result:

Uncompressed: 496225
Compressed: 11568
True
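The answer mentions that a file on disk can replace the memory buffer. A minimal Python 3 sketch of that variant, assuming the standard-library `lzma` module and a temporary file. It also reads in larger chunks and decodes them with `struct.iter_unpack`, which avoids one `read()` call per integer. The chunk size and the helper names `write_ints` / `iter_ints` are arbitrary choices for illustration.

```python
import lzma
import os
import struct
import tempfile

FMT = '<i'  # little-endian 4-byte signed int, same size on every platform
ITEM = struct.calcsize(FMT)
CHUNK_ITEMS = 4096  # integers decoded per read() call (arbitrary)


def write_ints(path, values):
    """Compress an iterable of 32-bit integers into an .xz file."""
    with lzma.open(path, 'wb') as f:
        for v in values:
            f.write(struct.pack(FMT, v))


def iter_ints(path):
    """Yield the integers back lazily, holding roughly one chunk in memory."""
    with lzma.open(path, 'rb') as f:
        leftover = b''
        while True:
            chunk = f.read(ITEM * CHUNK_ITEMS)
            if not chunk:
                break
            buf = leftover + chunk
            usable = len(buf) - len(buf) % ITEM  # whole integers only
            for (v,) in struct.iter_unpack(FMT, buf[:usable]):
                yield v
            leftover = buf[usable:]
        if leftover:
            raise ValueError('file ends with a partial integer')


# Usage: round-trip some data through a temporary file
fd, path = tempfile.mkstemp(suffix='.xz')
os.close(fd)
try:
    data = [5] * 10000 + [-3] * 5000 + list(range(100))
    write_ints(path, data)
    print(os.path.getsize(path), 'bytes on disk')
    print(list(iter_ints(path)) == data)  # True
finally:
    os.remove(path)
```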

I agree with Efron Licht: how compactly a particular list can be represented (not to say "compressed") depends entirely on its complexity. If your lists are simple enough to express as generators, your only option is Bartek Jablonski's answer.
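A quick way to see how much the result depends on the list's structure: compress the same number of integers once with a simple repeating pattern and once as random values. Exact sizes depend on the run and the liblzma version, so only the raw size is stated; the patterned data should compress far better, while random 32-bit values barely compress at all.

```python
import lzma
import random
import struct

N = 100_000
FMT = '<%di' % N  # N little-endian 4-byte signed ints

patterned = [i % 10 for i in range(N)]                          # highly regular
noisy = [random.randint(-2**31, 2**31 - 1) for _ in range(N)]   # no structure

raw_size = struct.calcsize(FMT)  # 400000 bytes
patterned_size = len(lzma.compress(struct.pack(FMT, *patterned)))
noisy_size = len(lzma.compress(struct.pack(FMT, *noisy)))

print('raw bytes:     ', raw_size)
print('patterned (xz):', patterned_size)
print('random (xz):   ', noisy_size)
```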


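A generic generator skeleton: plug in your own stop condition and your own rule for stepping from one index to the next.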

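def moreDataExists(index, data):
    # Your stop condition. Placeholder: continue while there is
    # at least one element after the current index.
    return index + 1 < len(data)


def getNextIndex(index):
    # Your complicated pattern of going from one index to the next.
    # Placeholder: simply step forward by one.
    return index + 1


def generator(yourData):
    index = -1  # start just before the first element
    while moreDataExists(index, yourData):
        index = getNextIndex(index)
        yield yourData[index]


def doSomethingWith(d):
    # Placeholder for your own processing
    print(d)


data = [3, 1, 4, 1, 5]  # example input
for d in generator(data):
    doSomethingWith(d)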
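That skeleton still indexes into a list that is already in memory. When the values themselves follow a pattern, the pattern can often be written as a composition of lazy iterators instead, so nothing is stored. The segments below are a made-up example of such a "template"; `chain`, `repeat` and `islice` are standard `itertools` functions, and `patterned_values` is a hypothetical name.

```python
from itertools import chain, islice, repeat


def patterned_values():
    """Hypothetical template: a rising ramp, a plateau, then a falling ramp."""
    return chain(
        range(0, 1_000_000, 2),  # 0, 2, 4, ..., 999998
        repeat(-1, 250_000),     # 250000 copies of -1
        range(500, 0, -5),       # 500, 495, ..., 5
    )


print(list(islice(patterned_values(), 3)))  # [0, 2, 4]
print(sum(1 for _ in patterned_values()))   # 750100, counted without a list
```

Calling the function again gives a fresh iterator, which stands in for re-reading the list from the start.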

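For example, a generator expression filters values lazily, so the candidates are never collected into a list: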

palindromes = (n for n in range(10**7) if str(n) == str(n)[::-1])
for i in palindromes:
    square = str(i ** 2)
    if square == square[::-1]:
        print(i)
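The generator expression is what keeps memory flat here: candidates are produced one at a time instead of being collected first. A rough comparison with `sys.getsizeof`, which measures only the container object, not the integers it refers to (the smaller limit is just to keep the example fast):

```python
import sys

limit = 10**6

as_list = [n for n in range(limit) if str(n) == str(n)[::-1]]
as_gen = (n for n in range(limit) if str(n) == str(n)[::-1])

print(len(as_list))            # 1999 palindromes below 10**6 (including 0)
print(sys.getsizeof(as_list))  # grows with the number of matches
print(sys.getsizeof(as_gen))   # constant, independent of limit
```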

Source: https://habr.com/ru/post/1016871/

