readinto() replacement?

Copying a file with a direct approach in Python usually looks like this:

    def copyfileobj(fsrc, fdst, length=16*1024):
        """copy data from file-like object fsrc to file-like object fdst"""
        while 1:
            buf = fsrc.read(length)
            if not buf:
                break
            fdst.write(buf)

(This piece of code, by the way, is from shutil.py).

Unfortunately, this has drawbacks in my particular use case (involving streaming and very large buffers) [italicized part added later]. First, it means that each call to read() allocates a new chunk of memory, and when buf is overwritten in the next iteration, that memory is freed, only for new memory to be allocated again for the same purpose. This can slow down the whole process and put unnecessary load on the host.
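
As a tiny illustration of that allocation churn ("src.bin" is a hypothetical file larger than 32 KiB): each read() call returns a freshly allocated object.

    with open("src.bin", "rb") as f:
        a = f.read(16 * 1024)
        b = f.read(16 * 1024)
        # Two distinct allocations; in the loop above, rebinding buf
        # turns the previous chunk into garbage on every iteration.
        assert a is not b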

To avoid this, I use the file.readinto() method, which, unfortunately, is documented as deprecated and flagged "don't use":

    import array

    def copyfileobj(fsrc, fdst, length=16*1024):
        """copy data from file-like object fsrc to file-like object fdst"""
        buffer = array.array('c')
        buffer.fromstring('-' * length)
        while True:
            count = fsrc.readinto(buffer)
            if count == 0:
                break
            if count != len(buffer):
                # short read: write only the bytes actually read
                fdst.write(buffer.tostring()[:count])
            else:
                buffer.tofile(fdst)

My solution works, but has two drawbacks: first, readinto() is not to be used; it may go away (so says the documentation). Second, with readinto() I cannot decide how many bytes I want to read into the buffer, and with buffer.tofile() I cannot decide how many I want to write, hence the cumbersome special case for the last block (which is also needlessly expensive).
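
For what it is worth, here is a minimal sketch of the same idea under Python 3, where readinto() on binary file objects is fully supported (the Python 2 deprecation note does not apply) and a memoryview slice removes the special case for the last block; the name copyfileobj_readinto is mine, not from the question:

    def copyfileobj_readinto(fsrc, fdst, length=16 * 1024):
        """Copy from fsrc to fdst, reusing one preallocated buffer."""
        buf = bytearray(length)          # allocated once, reused each pass
        view = memoryview(buf)           # zero-copy slicing for short reads
        while True:
            count = fsrc.readinto(buf)   # fills buf in place, returns bytes read
            if not count:
                break
            fdst.write(view[:count])     # write exactly count bytes, no copy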

I also looked at array.array.fromfile(), but it cannot be used to read "as much as there is" (it reads, then raises EOFError, and does not report the number of items processed). Moreover, it is not a solution for the last-block special case either.

Is there a way to do what I want to do? Perhaps I am just overlooking a simple buffer class or the like that does what I want.

+2

2 answers

Ordinary Python code does not need tuning like this. However, if you really need all that performance tweaking to read files from Python code (say, you are rewriting, for performance or memory usage, some server code that you wrote and that already works), I would rather call the OS directly using ctypes, so the copy is performed at as low a level as I want.
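
To make that concrete, a rough sketch assuming Linux with glibc; the read(2)/write(2) calls, the name copy_fd, and the usage paths are my illustration, not part of the answer, and error handling and short writes are ignored:

    import ctypes
    import os

    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    def copy_fd(src_fd, dst_fd, length=16 * 1024):
        buf = ctypes.create_string_buffer(length)    # allocated once, reused
        while True:
            count = libc.read(src_fd, buf, length)   # raw read(2)
            if count <= 0:
                break                                # EOF (0) or error (-1)
            libc.write(dst_fd, buf, count)           # write exactly count bytes

    # usage (hypothetical paths):
    # src = os.open("src.bin", os.O_RDONLY)
    # dst = os.open("dst.bin", os.O_WRONLY | os.O_CREAT, 0o644)
    # copy_fd(src, dst); os.close(src); os.close(dst)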

It may even be that simply invoking the "cp" executable as an external process is less complicated in your case (and it would get you the full benefit of all the OS-level and file-system optimizations for free).
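
For instance (the file names are hypothetical; subprocess is just one way to spawn the external process):

    import subprocess

    # Let the OS tool do the copy; raises CalledProcessError on failure.
    subprocess.check_call(["cp", "src.bin", "dst.bin"])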

+2

This piece of code, by the way, is from shutil.py

That is a standard library module. Why not just use it?
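
Concretely (file names hypothetical), the whole task can be a single call to the standard library:

    import shutil

    # Highest-level option: copy by path.
    shutil.copyfile("src.bin", "dst.bin")

    # Or, for already-open file objects, the exact loop quoted above:
    with open("src.bin", "rb") as fsrc, open("dst.bin", "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst)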

First, it means that each call to read() allocates a new chunk of memory, and when buf is overwritten in the next iteration, that memory is freed, only for new memory to be allocated again for the same purpose. This can slow down the whole process and put unnecessary load on the host.

That is negligible compared to the effort required to actually fetch a page of data from disk.

+3

Source: https://habr.com/ru/post/1240539/

