How to write a large amount of data to a tarfile in Python without using a temporary file

I wrote a small cryptographic module in Python whose task is to encrypt a file and put the result in a tarfile. The source file may be very large, but that is not a problem, because my program only needs to work with a small block of data at a time, which can be encrypted on the fly and saved.

I am looking for a way to avoid doing this in two passes: first writing all the data to a temporary file, then adding the result to the tarfile.

I basically do the following (where generator_encryptor is a simple generator that yields chunks of data read from the source file):

t = tarfile.open("target.tar", "w")
tmp = open('content', 'wb')
for chunk in generator_encryptor("sourcefile"):
    tmp.write(chunk)
tmp.close()
t.add('content')
t.close()

I'm a little annoyed that I have to use a temporary file, when it should be easy to write blocks directly into the tar file. Collecting all the chunks into one string and using something like t.addfile('content', StringIO(bigcipheredstring)) seems ruled out, because I cannot guarantee that I have enough memory to hold bigcipheredstring.

Any hint on how to do this?

+3
4 answers

You can create your own file-like object and pass it to TarFile.addfile. Your file-like object will generate the encrypted content on the fly in its read() method.
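A minimal sketch of that idea (the names `chunks` and `ChunkReader` are illustrative, not from the question): a generator stands in for the encryptor, and a small `io.RawIOBase` subclass exposes its output through `readinto()`, which gives us `read()` for free. The size still has to be known in advance, because the tar header precedes the data.

```python
import io
import tarfile

def chunks():
    # Stand-in for the encrypting generator: ten 1 KiB blocks.
    for i in range(10):
        yield bytes([i]) * 1024

class ChunkReader(io.RawIOBase):
    """File-like object whose read() produces data on the fly."""
    def __init__(self, gen):
        self.gen = gen
        self.buf = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the internal buffer until it can satisfy the request.
        while len(self.buf) < len(b):
            piece = next(self.gen, None)
            if piece is None:
                break
            self.buf += piece
        n = min(len(b), len(self.buf))
        b[:n] = self.buf[:n]
        self.buf = self.buf[n:]
        return n

with tarfile.open("demo.tar", "w") as t:
    info = tarfile.TarInfo(name="content")
    info.size = 10 * 1024  # tarfile writes the header first, so the size must be known upfront
    t.addfile(info, fileobj=ChunkReader(chunks()))
```

Subclassing `io.RawIOBase` means only `readinto()` has to be implemented; the base class supplies a correct `read()` on top of it.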

+4

What about running tar as a subprocess? You could pipe the encrypted data to its standard input and let the external tar program write the archive, instead of driving the tarfile module yourself.

+2

TarFile.add is not usable here, but TarFile.addfile is.

  • One detail to be aware of: the tar format stores each member's header before its data, so tarfile needs to know the total size of the content in advance, before anything is written to the tarfile.

Below is the solution I ended up with: a file-like wrapper around the data generator. GeneratorEncryptor here is a dummy stand-in for the real on-the-fly encryptor; the only requirements are that it behaves as a generator and that it provides a len (needed because, as noted above, the header is written before the data).

import tarfile

class GeneratorEncryptor(object):
    """Dummy class for testing purposes.

       The real one performs on-the-fly encryption of the source file.
    """
    def __init__(self, source):
        self.source = source
        self.BLOCKSIZE = 1024
        self.NBBLOCKS = 1000

    def __call__(self):
        for c in range(self.NBBLOCKS):
            yield (str(c % 10) * self.BLOCKSIZE).encode()

    def __len__(self):
        return self.BLOCKSIZE * self.NBBLOCKS

class GeneratorToFile(object):
    """Transform a data generator into a conventional file handle."""
    def __init__(self, generator):
        self.buf = b''
        self.generator = generator()

    def read(self, size):
        chunk = self.buf
        while len(chunk) < size:
            try:
                chunk = chunk + next(self.generator)
            except StopIteration:
                self.buf = b''
                return chunk
        self.buf = chunk[size:]
        return chunk[:size]

t = tarfile.open("target.tar", "w")
generator = GeneratorEncryptor("source")
ti = tarfile.TarInfo(name="content")  # build the header by hand; no file on disk needed
ti.size = len(generator)              # tarfile writes the header before the data
t.addfile(ti, fileobj=GeneratorToFile(generator))
t.close()
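The same technique can be verified end to end with a self-contained round trip (a small sketch with illustrative names; it writes to an in-memory archive instead of the file names used above, and `GenFile` is a compact variant of GeneratorToFile):

```python
import io
import tarfile

def gen():
    # Five 16-byte blocks: b"0"*16, b"1"*16, ...
    for c in range(5):
        yield (str(c) * 16).encode()

class GenFile:
    """Minimal read()-only wrapper around a generator."""
    def __init__(self, g):
        self.g = g
        self.buf = b""

    def read(self, size):
        while len(self.buf) < size:
            piece = next(self.g, None)
            if piece is None:
                break
            self.buf += piece
        out, self.buf = self.buf[:size], self.buf[size:]
        return out

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as t:
    ti = tarfile.TarInfo(name="content")
    ti.size = 5 * 16  # total size must be declared before the data is streamed
    t.addfile(ti, fileobj=GenFile(gen()))

# Read the archive back and extract the member.
buf.seek(0)
with tarfile.open(fileobj=buf) as t:
    data = t.extractfile("content").read()
```

If the declared size does not match what the generator actually produces, tarfile raises an error or pads/truncates the member, so getting `ti.size` right is the one hard requirement of this approach.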
+2

Source: https://habr.com/ru/post/1717067/

