Reading one file descriptor from two threads simultaneously

  • My question: in Linux (and in FreeBSD, and in UNIX in general), is it possible, and legal, to read one file descriptor from two threads simultaneously?

  • I searched but found nothing on this. Many people ask about reading from and writing to a socket fd at the same time (one thread reads while another writes), but not about two threads both reading. I also read some man pages and did not find a clear answer.

  • Why I'm asking: I wrote a simple program that counts the lines on stdin, like wc -l. I was really testing my homemade C++ I/O engine for overhead, and found that wc was 1.7 times faster. I trimmed some of the C++ and came close to wc's speed, but didn't quite match it. Then I experimented with the size of the input buffer and tuned it, but wc was still slightly faster. Finally, I created 2 threads that read the same STDIN_FILENO in parallel, and at last it was faster than wc! But the line count came out wrong, so I suppose some kind of garbage comes back from the reads, which is unexpected. Why should it matter which thread does the reading?
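For reference, here is a minimal single-threaded sketch of such a line counter in C, assuming a plain read() loop over the descriptor. The function name count_lines_fd and the 64 KiB buffer are illustrative choices, not taken from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>    /* errno, EINTR */
#include <string.h>   /* memchr */
#include <unistd.h>   /* read, ssize_t */

/* Count '\n' bytes on fd until EOF, which is what `wc -l` reports.
 * Returns the count, or -1 if read() fails (errno is left set). */
long long count_lines_fd(int fd)
{
    char buf[64 * 1024];   /* buffer size is an arbitrary choice */
    long long lines = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return lines;              /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        /* Hop from newline to newline with memchr rather than a byte loop. */
        for (const char *p = buf, *end = buf + n;
             (p = memchr(p, '\n', (size_t)(end - p))) != NULL; ++p)
            ++lines;
    }
}
```

Called as count_lines_fd(STDIN_FILENO), it counts newline bytes exactly as wc -l does, so a final line without a trailing newline is not counted.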

Edit: I did some more research and found that invoking read directly through syscall() changes nothing. The kernel code (read_write.c) seems to do some synchronization, but I couldn't make sense of it.

+4
3 answers

This behavior is unspecified. POSIX says:

The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
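Since POSIX leaves concurrent read() calls on a pipe, FIFO or terminal unspecified, one way to keep several threads without relying on that behavior is to make sure only one read() is ever in flight on the descriptor: serialize the read() itself with a mutex and run only the newline counting in parallel. This is a minimal sketch assuming POSIX threads; count_lines_serialized, count_worker, the 16-thread cap and the 64 KiB chunk size are hypothetical names and values chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* errno, EINTR */
#include <pthread.h>   /* pthread_create, pthread_join, pthread_mutex_* */
#include <string.h>    /* memchr */
#include <unistd.h>    /* read, ssize_t */

#define CHUNK_SIZE (64 * 1024)   /* hypothetical chunk size */
#define MAX_WORKERS 16           /* hypothetical thread cap */

struct shared_input {
    int fd;
    pthread_mutex_t lock;   /* serializes every read() on fd */
    int failed;             /* set if a read() fails; guarded by lock */
};

struct worker {
    struct shared_input *in;
    long long lines;        /* private to one thread until it is joined */
};

static void *count_worker(void *arg)
{
    struct worker *w = arg;
    char buf[CHUNK_SIZE];

    for (;;) {
        ssize_t n;

        /* Only one read() on the shared descriptor is in flight at a time. */
        pthread_mutex_lock(&w->in->lock);
        do
            n = read(w->in->fd, buf, sizeof buf);
        while (n < 0 && errno == EINTR);
        if (n < 0)
            w->in->failed = 1;
        pthread_mutex_unlock(&w->in->lock);

        if (n <= 0)
            return NULL;    /* EOF or error */

        /* Counting runs outside the lock, in parallel with other workers. */
        for (const char *p = buf, *end = buf + n;
             (p = memchr(p, '\n', (size_t)(end - p))) != NULL; ++p)
            ++w->lines;
    }
}

/* Count newline bytes on fd using nthreads workers (1..MAX_WORKERS).
 * Returns -1 on bad arguments, thread-creation failure or read error. */
long long count_lines_serialized(int fd, int nthreads)
{
    struct shared_input in = { .fd = fd, .failed = 0 };
    struct worker workers[MAX_WORKERS];
    pthread_t tids[MAX_WORKERS];
    long long total = 0;
    int started = 0, create_failed = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    if (pthread_mutex_init(&in.lock, NULL) != 0)
        return -1;

    for (; started < nthreads; ++started) {
        workers[started] = (struct worker){ .in = &in, .lines = 0 };
        if (pthread_create(&tids[started], NULL, count_worker,
                           &workers[started]) != 0) {
            create_failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; ++i) {
        pthread_join(tids[i], NULL);
        total += workers[i].lines;
    }
    pthread_mutex_destroy(&in.lock);
    return (create_failed || in.failed) ? -1 : total;
}
```

Each byte of input is handed to exactly one thread, and each thread keeps a private counter that is summed only after pthread_join(), so there is no data race on the total. Whether this beats a single-threaded loop depends on how the cost of counting compares with the cost of read(); that has not been measured here.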

+3

When used with a file descriptor (fd), read() and write() rely on state kept with the fd, the "current offset", to know where the read or write takes place. Every thread using that fd shares this one offset. As a result, they are not thread safe.
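A small single-threaded illustration of that shared offset, assuming a scratch file created with mkstemp() under /tmp (the path template and the name shared_offset_demo are hypothetical): the second read() continues exactly where the first one stopped, because the offset belongs to the descriptor, not to the caller. Two threads calling read() on the same fd would be drawing on this same offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>   /* mkstemp */
#include <string.h>   /* memcmp */
#include <unistd.h>   /* read, write, lseek, close, unlink, off_t */

/* Returns the descriptor's offset after two 3-byte reads of "abcdef"
 * (6 on success), or -1 if any step fails. */
off_t shared_offset_demo(void)
{
    char path[] = "/tmp/offset-demo-XXXXXX";   /* hypothetical temp path */
    char first[3], second[3];
    off_t pos = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the file stays alive until fd is closed */

    if (write(fd, "abcdef", 6) == 6 &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, first, 3) == 3 &&    /* gets "abc", offset moves to 3 */
        read(fd, second, 3) == 3 &&   /* gets "def", offset moves to 6 */
        memcmp(first, "abc", 3) == 0 &&
        memcmp(second, "def", 3) == 0)
        pos = lseek(fd, 0, SEEK_CUR); /* query the offset without moving it */

    close(fd);
    return pos;
}
```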

To let several threads use one descriptor at the same time, pread() and pwrite() are provided. These interfaces take the descriptor and an explicit offset, so the descriptor's "current offset" is neither used nor changed.
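For a regular file, the pread() approach can be sketched as a parallel line counter: each thread is given its own byte range and reads it with an explicit offset, so no thread touches the descriptor's current offset. This assumes POSIX threads and a seekable input; count_lines_pread, count_range, count_lines_path and the constants are hypothetical. On a pipe, such as stdin in `cat file | prog`, pread() fails with ESPIPE, so the sketch refuses non-regular files up front.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>      /* errno, EINTR */
#include <fcntl.h>      /* open, O_RDONLY */
#include <pthread.h>    /* pthread_create, pthread_join */
#include <string.h>     /* memchr */
#include <sys/stat.h>   /* fstat, S_ISREG */
#include <unistd.h>     /* pread, close, off_t */

#define RANGE_CHUNK (64 * 1024)   /* hypothetical per-pread() size */
#define MAX_RANGES 16             /* hypothetical thread cap */

struct range_job {
    int fd;
    off_t start, end;   /* this thread owns bytes [start, end) */
    long long lines;
    int failed;
};

static void *count_range(void *arg)
{
    struct range_job *job = arg;
    char buf[RANGE_CHUNK];
    off_t pos = job->start;

    while (pos < job->end) {
        size_t left = (size_t)(job->end - pos);
        size_t want = left < sizeof buf ? left : sizeof buf;

        /* pread() takes the offset as an argument and does not move the
         * descriptor's current offset, so workers never compete for it. */
        ssize_t n = pread(job->fd, buf, want, pos);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            job->failed = 1;
            break;
        }
        if (n == 0)
            break;      /* file shrank while we were reading */

        for (const char *p = buf, *end = buf + n;
             (p = memchr(p, '\n', (size_t)(end - p))) != NULL; ++p)
            ++job->lines;
        pos += n;
    }
    return NULL;
}

/* Count newline bytes in a regular file with nthreads workers. Returns -1
 * for non-regular files (pipes, terminals), bad arguments or errors. */
long long count_lines_pread(int fd, int nthreads)
{
    struct stat st;
    struct range_job jobs[MAX_RANGES];
    pthread_t tids[MAX_RANGES];
    long long total = 0;
    int started = 0, failed = 0;

    if (nthreads < 1 || nthreads > MAX_RANGES)
        return -1;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode))
        return -1;      /* pread() on a pipe would fail with ESPIPE */

    off_t slice = st.st_size / nthreads;
    for (; started < nthreads; ++started) {
        int i = started;
        jobs[i] = (struct range_job){
            .fd = fd,
            .start = slice * i,
            .end = (i == nthreads - 1) ? st.st_size : slice * (i + 1),
        };
        if (pthread_create(&tids[i], NULL, count_range, &jobs[i]) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; ++i) {
        pthread_join(tids[i], NULL);
        failed |= jobs[i].failed;
        total += jobs[i].lines;
    }
    return failed ? -1 : total;
}

/* Convenience wrapper: open a path read-only and count its lines. */
long long count_lines_path(const char *path, int nthreads)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    long long lines = count_lines_pread(fd, nthreads);
    close(fd);
    return lines;
}
```

Splitting by byte ranges is safe for counting because every '\n' byte falls into exactly one range; a range boundary may cut a line in half, but that does not change the number of newline bytes.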

+1

As for simultaneous access to one file descriptor (from several threads or even processes), I'll quote POSIX.1-2008 (IEEE Std 1003.1-2008), section 2.9.7, Thread Interactions with Regular File Operations:

2.9.7 Thread Interactions with Regular File Operations

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links:

[...] read() [...]

If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. [...]

At first glance, this looks good. However, I hope you haven't missed the restriction "when they operate on regular files or symbolic links": if stdin is a pipe or a terminal, this guarantee does not apply.
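Before relying on the 2.9.7 guarantee, a program can check what kind of file the descriptor actually refers to. A minimal sketch using fstat() and the standard S_ISREG, S_ISFIFO and S_ISCHR macros; the enum and the function names classify_fd and stdin_is_regular_file are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>   /* fstat, S_ISREG, S_ISFIFO, S_ISCHR */
#include <unistd.h>     /* STDIN_FILENO */

enum fd_kind { FD_REGULAR, FD_PIPE_OR_FIFO, FD_CHAR_DEVICE, FD_OTHER, FD_ERROR };

/* Classify what an open descriptor refers to. Only FD_REGULAR falls under
 * the 2.9.7 atomicity rule; concurrent reads on pipes, FIFOs and terminals
 * are unspecified. */
enum fd_kind classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return FD_ERROR;
    if (S_ISREG(st.st_mode))
        return FD_REGULAR;
    if (S_ISFIFO(st.st_mode))
        return FD_PIPE_OR_FIFO;   /* e.g. `cat file | ./prog` */
    if (S_ISCHR(st.st_mode))
        return FD_CHAR_DEVICE;    /* a terminal is a character device */
    return FD_OTHER;              /* sockets, block devices, ... */
}

/* Convenience for the case in the question. */
int stdin_is_regular_file(void)
{
    return classify_fd(STDIN_FILENO) == FD_REGULAR;
}
```

When the input is piped in or typed at a terminal, the result is not FD_REGULAR, and the atomicity quoted above cannot be assumed.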

0

Source: https://habr.com/ru/post/1340526/

