The sys.stdin object is a bit more complicated in Python3 than in Python2. For example, reading from sys.stdin in Python3 decodes the input to Unicode text by default, so it fails on bytes that are not valid in the expected encoding (UTF-8 here):
$ echo -e "\xf8" | python3 -c "import sys; print(sum(1 for _ in sys.stdin))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <genexpr>
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
Note that Python2 has no problem with this input. So you can see that Python3's sys.stdin does more work under the hood. I'm not sure whether this is exactly what causes the performance loss, but you can explore it further by trying sys.stdin.buffer under Python3:
import sys
print(sum(1 for _ in sys.stdin.buffer))
Note that .buffer does not exist in Python2. I ran some tests and saw no real performance difference between Python2 sys.stdin and Python3 sys.stdin.buffer, but YMMV.
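To make the difference between the two layers concrete, here is a minimal sketch that mimics them in memory. It assumes that Python3's sys.stdin behaves like an io.TextIOWrapper with UTF-8 decoding on top of a binary stream (the encoding actually depends on your locale). The helper names count_lines_text and count_lines_binary are made up for this illustration.

```python
import io


def count_lines_text(raw: bytes, errors: str = "strict") -> int:
    """Count lines through a UTF-8 text layer (stand-in for Python3 sys.stdin)."""
    text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors)
    return sum(1 for _ in text_stream)


def count_lines_binary(raw: bytes) -> int:
    """Count lines on the raw bytes, with no decoding (stand-in for sys.stdin.buffer)."""
    return sum(1 for _ in io.BytesIO(raw))


data = b"\xf8\n"  # the same bytes that echo -e "\xf8" writes

# The binary layer never decodes, so the invalid byte is harmless.
print("binary:", count_lines_binary(data))

# The text layer has to decode every byte, so strict decoding raises.
try:
    count_lines_text(data)
except UnicodeDecodeError as exc:
    print("text (strict) failed:", exc.reason)

# A lenient error handler avoids the exception, but the decoding work is still done.
print("text (replace):", count_lines_text(data, errors="replace"))
```

If you need text but cannot trust the input encoding, an error handler such as errors="replace" is one option. If you only need to count or split lines, staying on the binary layer skips the decoding step entirely.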
EDIT: Here are some results from my machine (Ubuntu 16.04, i7 CPU, 8 GiB RAM). First, some C code as a baseline for comparison:
#include <unistd.h>

int main(void) {
    char buffer[4096];
    size_t total = 0;  /* bytes read so far */
    for (;;) {
        ssize_t result = read(STDIN_FILENO, buffer, sizeof(buffer));
        if (result <= 0) {
            break;  /* 0 means end of input, negative means a read error */
        }
        total += (size_t)result;
    }
    return 0;
}
Now the file size:
$ ls -s --block-size=M | grep huge2.txt
10898M huge2.txt
And the tests:
# a.out is the C code above, compiled (a simple equivalent, except for the final print)
$ time cat huge2.txt | ./a.out
real 0m20.607s
user 0m0.236s
sys 0m10.600s

$ time cat huge2.txt | python -c "import sys; print(sum(1 for _ in sys.stdin))"
898773889
real 1m24.268s
user 1m20.216s
sys 0m8.724s

$ time cat huge2.txt | python3 -c "import sys; print(sum(1 for _ in sys.stdin.buffer))"
898773889
real 1m19.734s
user 1m14.432s
sys 0m11.940s

$ time cat huge2.txt | python3 -c "import sys; print(sum(1 for _ in sys.stdin))"
898773889
real 2m0.326s
user 1m56.148s
sys 0m9.876s
So, the file I used was a little smaller, and the times were longer (it seems you have a better machine, and I did not have the patience for larger files :D). In any case, Python2 sys.stdin and Python3 sys.stdin.buffer are very similar in my tests (about 1m24s and 1m20s real time), while Python3 sys.stdin is clearly slower (about 2m0s). And all of them are way behind the C code, which has almost zero user time.
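The C program reads fixed-size blocks instead of iterating line by line, so a closer Python counterpart would do the same on the binary stream. Below is a minimal sketch of that idea. The function name count_newlines_chunked and the 64 KiB chunk size are arbitrary choices for illustration, and no timings are claimed for it here.

```python
import io


def count_newlines_chunked(stream, chunk_size: int = 1 << 16) -> int:
    """Count b"\\n" bytes by reading fixed-size chunks from a binary stream."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means end of input
            break
        total += chunk.count(b"\n")
    return total


# Small in-memory demonstration; the tiny chunk size forces several reads.
sample = io.BytesIO(b"one\ntwo\nthree\n")
print(count_newlines_chunked(sample, chunk_size=4))
```

To try it on real input under Python3, pass sys.stdin.buffer as the stream, for example count_newlines_chunked(sys.stdin.buffer). Note that this counts newline bytes, so a last line without a trailing newline is not counted, unlike the line-iteration one-liners above.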