Reading binary input with an ASCII text header from stdin

Question

I want to read a binary PNM image file from stdin. The file contains a header encoded as ASCII text, followed by a binary payload. As a simplified example of reading the header, I created the following snippet:

    #! /usr/bin/env python3
    import sys
    header = sys.stdin.readline()
    print("header=["+header.strip()+"]")

I run it as "test.py" (from the Bash shell), in which case it works just fine:

    $ printf "P5 1 1 255\n\x41" |./test.py
    header=[P5 1 1 255]

However, a small change in the binary payload breaks it:

    $ printf "P5 1 1 255\n\x81" |./test.py
    Traceback (most recent call last):
      File "./test.py", line 3, in <module>
        header = sys.stdin.readline()
      File "/usr/lib/python3.4/codecs.py", line 313, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 11: invalid start byte

Is there an easy way to make this work in Python 3?


python-3.x binary unicode ascii

nobar Jul 18 '15 at 6:18


2 answers

To read binary data, use a binary stream, for example one obtained with the TextIOBase.detach() method:

    #!/usr/bin/env python3
    import sys

    sys.stdin = sys.stdin.detach()  # convert to binary stream
    header = sys.stdin.readline().decode('ascii')  # b'\n'-terminated
    print(header, end='')
    print(repr(sys.stdin.read()))
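To try the detach() approach without piping anything into a script, here is a minimal sketch that uses an in-memory stream as a stand-in for sys.stdin. The helper name split_header_and_payload and the fake_stdin object are made up for illustration; they are not part of the answer above.

```python
import io


def split_header_and_payload(text_stream):
    """Detach the binary buffer from a text stream and read a one-line header.

    After detach(), text_stream itself must no longer be used.
    """
    binary_stream = text_stream.detach()
    header = binary_stream.readline().decode("ascii")  # header ends at b"\n"
    payload = binary_stream.read()  # remaining bytes, left undecoded
    return header, payload


# Stand-in for sys.stdin: a UTF-8 text wrapper around in-memory bytes.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"P5 1 1 255\n\x81"), encoding="utf-8")
header, payload = split_header_and_payload(fake_stdin)
print(repr(header))   # 'P5 1 1 255\n'
print(repr(payload))  # b'\x81'
```

The 0x81 byte never reaches a decoder here, so the UnicodeDecodeError from the question cannot occur.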


jfs Jul 19 '15 at 13:15


nobar · Accepted Answer · 2015-07-18 23:51

According to the documentation, you can read binary data (as type bytes) from stdin using sys.stdin.buffer.read():

To write or read binary data from/to the standard streams, use the underlying binary buffer object. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').

So this is one direction you can take: read the data in binary mode. readline() and the other read functions still work. Once you have read the ASCII header as bytes, you can convert it to str using decode('ASCII') for further text processing.
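The binary-mode approach could be extended to split the header into its fields. This is only a sketch: the function name read_pnm is made up, and it assumes the whole header sits on one line as in the question's example (real PNM files may spread the header over several lines and include comments, which this does not handle).

```python
import io


def read_pnm(stream):
    """Read a PNM image whose whole header sits on the first line.

    Assumption: the header looks like b"P5 <width> <height> <maxval>\n", as in
    the question's example; comments and multi-line headers are not handled.
    """
    fields = stream.readline().decode("ascii").split()
    magic = fields[0]
    width, height, maxval = (int(f) for f in fields[1:4])
    payload = stream.read()  # bytes; indexing yields ints in range 0..255
    return magic, width, height, maxval, payload


# In a script, pass sys.stdin.buffer instead of this in-memory stand-in.
image = read_pnm(io.BytesIO(b"P5 1 1 255\n\x81"))
print(image)  # ('P5', 1, 1, 255, b'\x81')
```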

Alternatively, you can wrap the binary stream in io.TextIOWrapper() with the latin-1 encoding. The implicit decoding is then effectively a pass-through: the data is of type str (which represents text), but each character maps 1-to-1 to an input byte (although a character may occupy more than one byte of storage). Pass newline='\n' as well, because the default newline handling translates '\r' in the payload to '\n'.
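The 1-to-1 mapping can be checked directly. The sketch below decodes all 256 byte values as latin-1, and also shows one caveat of the text-wrapper route: with its default newline handling, io.TextIOWrapper translates a '\r' byte to '\n' on read, which passing newline='\n' avoids. The variable names are illustrative only.

```python
import io

all_bytes = bytes(range(256))

# Decoding as latin-1 maps byte value n to the code point n, and back again.
text = all_bytes.decode("latin-1")
round_trip_ok = [ord(ch) for ch in text] == list(all_bytes)
encode_ok = text.encode("latin-1") == all_bytes

# With the default newline handling, "\r" is translated to "\n" on read.
default_wrapper = io.TextIOWrapper(io.BytesIO(b"\r"), encoding="latin-1")
translated = default_wrapper.read()

# newline="\n" turns the translation off, so the payload survives intact.
exact_wrapper = io.TextIOWrapper(io.BytesIO(b"\r"), encoding="latin-1", newline="\n")
preserved = exact_wrapper.read()

print(round_trip_ok, encode_ok, repr(translated), repr(preserved))
# True True '\n' '\r'
```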

Here is code that works in either mode:

    #! /usr/bin/python3
    import sys, io

    BINARY = True  ## either way works

    if BINARY:
        istream = sys.stdin.buffer
    else:
        # newline='\n' stops '\r' payload bytes from being translated to '\n'
        istream = io.TextIOWrapper(sys.stdin.buffer, encoding='latin-1', newline='\n')

    header = istream.readline()
    if BINARY:
        header = header.decode('ASCII')
    print("header=["+header.strip()+"]")

    payload = istream.read()
    print("len="+str(len(payload)))
    for i in payload:
        print(i if BINARY else ord(i))

Check every possible 1-pixel payload with the following Bash command:

 for i in $(seq 0 255) ; do printf "P5 1 1 255\n\x$(printf %02x $i)" |./test.py ; done
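The same exhaustive check can also be run in-process, without the shell. This is a sketch under the assumption that the script's logic is factored into a function (read_image is a made-up name) that takes a binary stream; it feeds every possible 1-pixel payload through both reading modes and collects any mismatch.

```python
import io


def read_image(raw, binary):
    """Return (header, payload values) using either reading mode."""
    if binary:
        stream = raw
    else:
        # newline="\n" keeps "\r" payload bytes from being translated
        stream = io.TextIOWrapper(raw, encoding="latin-1", newline="\n")
    header = stream.readline()
    if binary:
        header = header.decode("ascii")
    payload = stream.read()
    values = list(payload) if binary else [ord(ch) for ch in payload]
    return header.strip(), values


failures = []
for value in range(256):
    data = b"P5 1 1 255\n" + bytes([value])
    for binary in (True, False):
        if read_image(io.BytesIO(data), binary) != ("P5 1 1 255", [value]):
            failures.append((value, binary))

print("failures:", failures)  # failures: []
```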
