How can I ask a pipe how many bytes are available to read?

I have implemented a non-blocking reader in Python and I need to make it more efficient.

Background: I have a massive amount of output that I need to read from one subprocess (started with Popen()) and pass to another thread. Reading the output from this subprocess must not block for more than a few ms (preferably for only as long as it takes to read the available bytes).

I currently have a utility class that accepts a file descriptor (stdout) and a timeout. I select() and readline(1) until one of three events occurs:

  • I read a new line
  • my timeout (several ms) expires
  • select tells me there is nothing to read on this file descriptor.

Then I return the buffered text to the calling method, which does something with it.

Now, for the real question: because I read so much output, I need to make this more efficient. I would like to do that by asking the file descriptor how many bytes it has pending, and then calling readline([that many bytes]). The data is only being passed along, so it doesn't matter to me where the newlines are, or even whether there are any. Can I ask the file descriptor how many bytes it has available for reading, and if so, how?

I have done several searches, but it is very hard to know what to search for, let alone whether this is possible.

Even a pointer in the right direction would be helpful.

Note: I am developing on Linux, but that should not matter for a Pythonic solution.

2 answers

On Linux, os.pipe() is just a wrapper around pipe(2). Both return a pair of file descriptors. Normally one would use lseek(2) (os.lseek() in Python) to reposition the file offset of a descriptor as a way to find out how much data is available. However, not all file descriptors are capable of seeking.

On Linux, trying lseek(2) on a pipe will return an error; see the man page. This is because a pipe is more or less a buffer between the producer and the consumer of the data. The size of this buffer is system dependent.
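The failure described above can be observed directly. The following is a minimal sketch, assuming Python 3 on Linux; the helper name is_seekable is hypothetical and only used for illustration.

```python
import errno
import os


def is_seekable(fd):
    """Return True if lseek works on fd, False if it fails with ESPIPE."""
    try:
        # Asking for the current offset does not move anything;
        # on a pipe it still fails because pipes have no offset.
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise


read_fd, write_fd = os.pipe()
pipe_seekable = is_seekable(read_fd)
os.close(read_fd)
os.close(write_fd)
print("pipe is seekable:", pipe_seekable)
```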

On Linux, a pipe has a buffer of 64 kB by default, so that is the most data that can be waiting for you at any one time.
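This answer does not mention it, but one Linux-specific way to ask how many bytes are currently sitting in that buffer is the FIONREAD ioctl. The sketch below assumes Python 3 on Linux; the helper name bytes_available is hypothetical, and the approach is not portable to every platform.

```python
import array
import fcntl
import os
import termios


def bytes_available(fd):
    """Return the number of bytes that can be read from fd without blocking."""
    # FIONREAD writes the pending byte count into a C int that we supply.
    buf = array.array('i', [0])
    fcntl.ioctl(fd, termios.FIONREAD, buf, True)
    return buf[0]


read_fd, write_fd = os.pipe()
os.write(write_fd, b'hello')

pending = bytes_available(read_fd)   # bytes waiting in the pipe buffer
data = os.read(read_fd, pending)     # read exactly that many bytes

os.close(read_fd)
os.close(write_fd)
```

With a subprocess, the same call would presumably be made on proc.stdout.fileno(), followed by os.read() on that descriptor, so that Python's own file buffering does not get in the way.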

Edit: If you can change the way your subprocess works, you could use a memory-mapped file or a nice big chunk of shared memory.

Edit 2: Using poll objects is probably faster than select.
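As an illustration of the poll suggestion, here is a minimal sketch, assuming Python 3 on Linux. The function name read_available and the 64 kB default are assumptions made for this example. It waits at most timeout_ms milliseconds and then reads whatever is available in a single os.read() call, without caring about newlines.

```python
import os
import select


def read_available(poller, fd, timeout_ms, max_bytes=65536):
    """Return pending bytes from fd, b'' at EOF, or None on timeout."""
    for _fd, event in poller.poll(timeout_ms):
        # POLLHUP is reported when the writer has closed its end;
        # os.read then returns any leftover data, or b'' at end of file.
        if event & (select.POLLIN | select.POLLHUP):
            return os.read(fd, max_bytes)
    return None


read_fd, write_fd = os.pipe()
poller = select.poll()
poller.register(read_fd, select.POLLIN)

first = read_available(poller, read_fd, 10)    # nothing written yet
os.write(write_fd, b'abc')
second = read_available(poller, read_fd, 10)   # data is waiting
os.close(write_fd)
third = read_available(poller, read_fd, 10)    # writer closed, pipe empty
os.close(read_fd)
```

The poll object is created and registered once and then reused for every read, which avoids rebuilding the descriptor set on each call as select() requires.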


This question seems to offer a possible solution, although it may require some retooling:

Non-blocking read on a subprocess.PIPE in Python

Otherwise, I suppose you know about reading data N bytes at a time:

    # pipe is assumed to be a binary-mode pipe such as proc.stdout
    all_data = b''
    while True:
        data = pipe.read(1024)  # Reads 1024 bytes or to end of pipe
        if not data:
            break
        all_data += data
        # Add your timeout break here

Source: https://habr.com/ru/post/958416/

