How to read inputs from stdin and apply encoding?

The goal is to read continuously from stdin and enforce utf8 compliance in both Python2 and Python3.

I tried the solutions from these questions:

  • Writing bytes to standard output in a format compatible with both python2 and python3
  • Python 3: How to specify stdin encoding

I tried:

    #!/usr/bin/env python
    from __future__ import print_function, unicode_literals
    import io
    import sys

    # Supports Python2 read from stdin and Python3 read from stdin.buffer
    # https://stackoverflow.com/a/23932488/610569
    user_input = getattr(sys.stdin, 'buffer', sys.stdin)

    # Enforcing utf-8 in Python3
    # https://stackoverflow.com/a/16549381/610569
    with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
        for line in fin:
            # Reads the input line by line
            # and do something, for eg just print line.
            print(line)

The code works in Python3, but in Python2 io.TextIOWrapper cannot wrap sys.stdin, because the Python2 file object lacks the readable method that the wrapper calls, and it throws:

    Traceback (most recent call last):
      File "testfin.py", line 12, in <module>
        with io.TextIOWrapper(user_input, encoding='utf-8') as fin:
    AttributeError: 'file' object has no attribute 'readable'



This is because in Python3 user_input, i.e. sys.stdin.buffer, is an _io.BufferedReader object, which has a readable attribute:

    <class '_io.BufferedReader'>
    ['__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_dealloc_warn', '_finalizing', 'close', 'closed', 'detach', 'fileno', 'flush', 'isatty', 'mode', 'name', 'peek', 'raw', 'read', 'read1', 'readable', 'readinto', 'readinto1', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']

While in Python2 user_input is a file object, which has no readable attribute:

    <type 'file'>
    ['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
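
The same difference can be checked programmatically instead of by scanning the dir() output. This is only a minimal sketch: has_readable and LegacyFile are hypothetical names introduced for illustration, and LegacyFile merely imitates a Python2 file object in that it offers read() but no readable().

```python
import io


def has_readable(stream):
    """Return True if the stream exposes a callable readable() method."""
    return callable(getattr(stream, "readable", None))


class LegacyFile(object):
    """Hypothetical stand-in for a Python2 file object: read() but no readable()."""

    def read(self, size=-1):
        return b""


# io streams implement readable(); the stand-in does not.
print(has_readable(io.BytesIO(b"data")))  # True
print(has_readable(LegacyFile()))         # False
```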
python file io stdin utf-8
Nov 22 '17 at 2:01
2 answers

If you do not need a full io.TextIOWrapper, but just a decoded stream for reading, you can use codecs.getreader() to create a decoding wrapper:

    import codecs

    reader = codecs.getreader('utf8')(user_input)
    for line in reader:
        # do whatever you need...
        print(line)

codecs.getreader('utf8') returns a factory for codecs.StreamReader, which is then instantiated with the source stream. I'm not sure whether StreamReader supports the with statement, but that may not be necessary (I don't think STDIN needs to be closed after reading).
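
To try the wrapper without touching the real STDIN, here is a minimal sketch that feeds it an in-memory byte stream. The decode_lines helper and the sample bytes are assumptions made for illustration; only codecs.getreader() comes from the answer itself.

```python
from __future__ import print_function, unicode_literals
import codecs
import io


def decode_lines(byte_stream, encoding="utf-8"):
    """Wrap a binary stream in a StreamReader and collect its decoded lines."""
    # getreader() returns a StreamReader class; calling it wraps the stream.
    reader = codecs.getreader(encoding)(byte_stream)
    return [line for line in reader]


# Hypothetical stand-in for the binary stdin stream.
raw = io.BytesIO("h\u00e9llo\nw\u00f6rld\n".encode("utf-8"))
lines = decode_lines(raw)
print(lines)
```

Each item is a decoded text string that keeps its trailing newline, so print(line) would emit an extra blank line unless the newline is stripped first.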

I have successfully used this solution in situations where the underlying stream only offers a very limited interface.

Update (second version)

From the comments, it became clear that you really need io.TextIOWrapper for proper line buffering etc. in interactive mode; codecs.StreamReader only works for piped input and the like.

Using this answer, I was able to get interactive input working correctly:

    #!/usr/bin/env python
    # coding: utf8

    from __future__ import print_function, unicode_literals
    import io
    import sys

    user_input = getattr(sys.stdin, 'buffer', sys.stdin)

    with io.open(user_input.fileno(), encoding='utf8') as f:
        for line in f:
            # do whatever you need...
            print(line)

This creates an io.TextIOWrapper with the enforced encoding on top of the STDIN file descriptor.
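
The file-descriptor approach can be exercised without a terminal by using a pipe in place of STDIN. This is a sketch under stated assumptions: read_lines_from_fd is a hypothetical helper, the pipe stands in for file descriptor 0, and closefd=False is an addition that is not part of the answer above (it keeps the descriptor open when the with block ends, whereas the default closes it).

```python
from __future__ import print_function, unicode_literals
import io
import os


def read_lines_from_fd(fd, encoding="utf-8"):
    """Open a text wrapper on an existing descriptor and collect its lines."""
    # closefd=False (an assumption, see above) leaves fd open on exit.
    with io.open(fd, encoding=encoding, closefd=False) as f:
        return [line for line in f]


# A pipe stands in for STDIN in this sketch.
read_fd, write_fd = os.pipe()
os.write(write_fd, "caf\u00e9\nna\u00efve\n".encode("utf-8"))
os.close(write_fd)  # signal end of input to the reader

lines = read_lines_from_fd(read_fd)
os.close(read_fd)   # still open because of closefd=False
print(lines)
```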

Nov 22 '17 at 11:05

Have you tried forcing utf-8 encoding in Python as follows?

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
Nov 27 '17 at 8:12


