Redirecting Python stdout to a file fails with UnicodeEncodeError

I have a Python script that connects to the Twitter Firehose and sends the data on for later processing. It used to work fine, but now I'm trying to extract only the body text. (This is not a question about how I should retrieve data from Twitter or how to encode/decode ASCII characters.) When I run my script directly as follows:

python -u fetch_script.py 

it works fine, and I see the messages arriving on the screen. For instance:

 root@domU-xx-xx-xx-xx :/usr/local/streaming# python -u fetch_script.py
 Cuz I'm checking you out >on Facebook<
 RT @SearchlightNV: #BarryLies👳🎌 has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
 "Why do men chase after women? Because they fear death."~Moonstruck
 RT @SearchlightNV: #BarryLies👳🎌 has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
 Never let anyone tell you not to chase your dreams. My sister came home crying today, because someone told her she not good enough.
 "I can't even ask anyone out on a date because if it doesn't end up in a high speed chase, I get bored."
 RT @ColIegeStudent: Double-checking the attendance policy while still in bed
 Well I just handed my life savings to ya.. #trustingyou #abouttomakebankkkkk
 Zillow $Z and Redfin useless to Wells Fargo Home Mortgage, $WFC, and FannieMae $FNM. Sale history LTV now 48%, $360 appraisal fee 4 no PMI.
 The latest Dump and Chase Podcast http://t.co/viaRSA9W3i check it out and subscribe on iTunes, or your favorite android app #Isles

But if I try to redirect the output to a file as follows:

 python -u fetch_script.py >fetch_output.txt 

it immediately throws an error:

 root@domU-xx-xx-xx-xx :/usr/local/streaming# python -u fetch_script.py >fetch_output.txt
 ERROR:tornado.application:Uncaught exception, closing connection.
 Traceback (most recent call last):
   File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
     callback(*args)
   File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
     raise_exc_info(exc)
   File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
     ret = fn(*args, **kwargs)
   File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
     self.parse_response(response)
   File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
     self._callback(response)
   File "fetch_script.py", line 57, in callback
     print msg['text']
 UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
 ERROR:tornado.application:Exception in callback <functools.partial object at 0x187c2b8>
 Traceback (most recent call last):
   File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 458, in _run_callback
     callback()
   File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
     raise_exc_info(exc)
   File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
     ret = fn(*args, **kwargs)
   File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
     callback(*args)
   File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
     raise_exc_info(exc)
   File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
     ret = fn(*args, **kwargs)
   File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
     self.parse_response(response)
   File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
     self._callback(response)
   File "fetch_script.py", line 57, in callback
     print msg['text']
 UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)

PS

A bit more context:

The error occurs in the callback function:

 def callback(self, message):
     if message:
         msg = message
         msg_props = pika.BasicProperties()
         msg_props.content_type = 'application/text'
         msg_props.delivery_mode = 2
         #print self.count
         print msg['text']
         #self.count += 1
         ...

However, if I drop ['text'] and just print msg, both cases work like a charm.

1 answer

Since no one has jumped in yet, here is my shot. Python 2 sets the stdout encoding when writing to a console, but not when stdout is redirected to a file; sys.stdout.encoding is then None and print falls back to the ASCII codec. This script shows the problem.

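 import sys

 # Note: u'\2026' is the octal escape u'\x82' followed by the digit '6',
 # not U+2026 (that would be u'\u2026'). It is still non-ASCII, so the
 # failure is the same; this is why the error below reports u'\x82'.
 msg = {'text':u'\2026'}
 sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
 print msg['text']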

Running it with stdout redirected shows the error:

 $ python bad.py>/tmp/xxx
 default encoding: None
 Traceback (most recent call last):
   File "fix.py", line 5, in <module>
     print msg['text']
 UnicodeEncodeError: 'ascii' codec can't encode character u'\x82' in position 0: ordinal not in range(128)
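The failure does not actually need a redirect to reproduce. The sketch below imitates what Python 2 effectively does when sys.stdout.encoding is None, namely encoding the text with the ASCII codec, and contrasts it with an explicit UTF-8 encode. The variable names are illustrative only, and the snippet is written so that it should also run on Python 3.

```python
from __future__ import print_function

# U+2026 HORIZONTAL ELLIPSIS, the character named in the question's traceback
text = u'\u2026'

# With no stdout encoding, Python 2 falls back to the ASCII codec,
# which cannot represent U+2026.
try:
    text.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# An explicit encoding works no matter what stdout is attached to.
utf8_bytes = text.encode('utf-8')

# Escaping instead of encoding keeps the result pure ASCII.
escaped = text.encode('ascii', 'backslashreplace')

print(failed)
```

The escaped form is also a plausible explanation for why printing the whole msg works in the question: in Python 2, printing a dict prints its repr, which escapes non-ASCII characters, so only ASCII ever reaches stdout.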

Encode explicitly, falling back to UTF-8 when stdout has no encoding:

 import sys

 msg = {'text':u'\2026'}
 sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
 # Fall back to UTF-8 when stdout has no encoding (e.g. when redirected)
 encoding = sys.stdout.encoding or 'utf-8'
 print msg['text'].encode(encoding)

and the problem is resolved.

 $ python good.py >/tmp/xxx
 default encoding: None
 $ cat /tmp/xxx
 6
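If many print statements are involved, encoding at every call site gets tedious. One common alternative is to wrap the output stream once with a UTF-8 writer from the codecs module. The sketch below is only an illustration: pick_encoding and utf8_writer are hypothetical helper names, and an in-memory buffer stands in for a redirected stdout so the snippet runs on both Python 2 and Python 3.

```python
import codecs
import io
import sys


def pick_encoding(stream, fallback='utf-8'):
    """Return the stream's encoding, or the fallback when it is unset (hypothetical helper)."""
    return getattr(stream, 'encoding', None) or fallback


def utf8_writer(byte_stream):
    """Wrap a byte stream so unicode text written to it is UTF-8 encoded (hypothetical helper)."""
    return codecs.getwriter('utf-8')(byte_stream)


# Same fallback logic as in the answer, applied to the real stdout.
encoding = pick_encoding(sys.stdout)

# In-memory byte buffer standing in for a redirected stdout.
buf = io.BytesIO()
out = utf8_writer(buf)
out.write(u'\u2026\n')
```

On Python 2 the same wrapper is often installed at startup as sys.stdout = codecs.getwriter('utf-8')(sys.stdout). Setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option that needs no code changes.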

Source: https://habr.com/ru/post/955130/

