Correctly log unicode and UTF-8 exceptions in Python 2

I am trying to log various library exceptions in Python 2.7. I believe that exceptions sometimes contain a unicode string and sometimes a UTF-8 bytestring. I thought logging.exception(e) was the right way to log them, but the following does not seem to work:

    # encoding: utf-8
    import logging
    try:
        raise Exception('jörn')
    except Exception as e:
        logging.exception(e)


    try:
        raise Exception(u'jörn')
    except Exception as e:
        logging.exception(e)

Saving this to a file and running it produces the following:

    $ python test.py
    ERROR:root:jörn
    Traceback (most recent call last):
      File "test.py", line 4, in <module>
        raise Exception('jörn')
    Exception: jörn
    Traceback (most recent call last):
      File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 859, in emit
        msg = self.format(record)
      File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 732, in format
        return fmt.format(record)
      File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 474, in format
        s = self._fmt % record.__dict__
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)
    Logged from file test.py, line 12

So, as you can see, the UTF-8 exception worked fine, but the unicode exception broke logging, swallowing the real exception and hiding it behind the UnicodeEncodeError.

Is there some kind of standard exception logging tool that won't break my code? What am I missing?

+6
2 answers

Actually, I think I finally found the mistake and the proper way myself: I seem to have been using logging.exception() wrongly the whole time. You are not meant to pass in the exception, but a message:

    # encoding: utf-8
    import logging
    try:
        raise Exception('jörn')
    except Exception as e:
        logging.exception('exception occurred')

    try:
        raise Exception(u'jörn')
    except Exception as e:
        logging.exception('exception occurred')

Running the above logs the exception correctly:

    $ python test.py
    ERROR:root:exception occurred
    Traceback (most recent call last):
      File "test.py", line 4, in <module>
        raise Exception('jörn')
    Exception: jörn
    ERROR:root:exception occurred
    Traceback (most recent call last):
      File "test.py", line 10, in <module>
        raise Exception(u'jörn')
    Exception: j\xf6rn

The reason logging.exception(e) fails is that it passes the exception e all the way up into logging.Formatter.format(), where it arrives as record.message, which is still an Exception object.

Then, on line 474 of logging/__init__.py, the following happens:

    s = self._fmt % record.__dict__

which is roughly equivalent to the following:

    s = '%(levelname)s:%(name)s:%(message)s' % {
        'levelname': 'ERROR',
        'name': 'root',
        'message': Exception(u'jörn')
    }

So if message is one of ['jörn', u'jörn', Exception('jörn')], this works, but it fails if message is Exception(u'jörn'):

    >>> 'foo %s' % 'jörn'
    'foo j\xc3\xb6rn'
    >>> 'foo %s' % u'jörn'
    u'foo j\xf6rn'
    >>> 'foo %s' % Exception('jörn')
    'foo j\xc3\xb6rn'
    >>> 'foo %s' % Exception(u'jörn')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)

As you can see, the result is automatically upcast to unicode when a unicode string is interpolated, which is why the following works:

    >>> logging.error('jörn')
    ERROR:root:jörn
    >>> logging.error(u'jörn')
    ERROR:root:jörn

This conversion to unicode fails when you attempt it with an Exception object that does not properly handle the encoding of its message (which, sadly, seems to be the case in a lot of libraries).

It appears that logging.exception(msg) formats the exception for the log in an encoding-safe way (note the escaped j\xf6rn in the output above) and prefixes it with msg. So as long as you do not make the mistake of passing the exception itself to logging.exception, it is logged correctly.
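As a minimal sketch of that pattern: the helper below logs a fixed message and lets logging attach the traceback on its own. The names call_and_log, raise_bytes, raise_unicode and the logger name unicode_demo are made up for illustration; the only library calls used are logging.basicConfig, logging.getLogger and Logger.exception.

```python
# -*- coding: utf-8 -*-
import logging

logging.basicConfig()
log = logging.getLogger("unicode_demo")  # hypothetical logger name


def call_and_log(func):
    """Run func; on failure log a fixed message and return False."""
    try:
        func()
        return True
    except Exception:
        # Pass a plain message, not the exception object: logging picks up
        # the active exception itself and appends the formatted traceback.
        log.exception("exception occurred")
        return False


def raise_bytes():
    # On Python 2 this literal is a UTF-8 bytestring.
    raise Exception("jörn")


def raise_unicode():
    # A unicode message, the case that broke logging.exception(e).
    raise Exception(u"j\xf6rn")


call_and_log(raise_bytes)
call_and_log(raise_unicode)
```

Both calls should end up in the log with their tracebacks, since the exception object is never interpolated into the message.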

In short:

Do not use logging.exception(e); use logging.exception('exception occurred') instead. It will automatically and correctly append the formatted exception to your log. If you really want to use the exception's message without assuming some encoding, the safest option is logging.exception(repr(e)).
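A related sketch, under the assumption that an ASCII-safe representation is good enough for the log line: passing the exception as a %r argument makes logging call repr() on it when the message is formatted, so the text is never implicitly encoded. The function name log_with_repr and the logger name are hypothetical.

```python
# -*- coding: utf-8 -*-
import logging

logging.basicConfig()
log = logging.getLogger("unicode_demo")  # hypothetical logger name


def log_with_repr(exc):
    """Log exc through %r and return the representation that was used."""
    # logging formats lazily: "..." % (exc,) with %r calls repr(exc),
    # which does not go through the exception's __str__.
    log.error("operation failed: %r", exc)
    return repr(exc)


summary = log_with_repr(Exception(u"j\xf6rn"))
```

The exact repr() text differs between Python versions (Python 2 escapes the non-ASCII character), so treat it as a debugging aid rather than a stable format.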

+4

It is not logging that fails to handle unicode; it is the Exception.__str__ method, which does not support unicode strings as exception arguments. When you call logging.exception(e), it does something like logging.exception(str(e)), which in turn does something like str(self.args) on the exception instance. That is where the error occurs: your self.args holds a unicode string that cannot be encoded in ASCII. You have two options: either call logging.exception(unicode(e)), or implement your own exception class with a __str__ method that can handle unicode objects in self.args.
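One way the second option could look, as a rough sketch rather than a definitive implementation. The class name UnicodeSafeError and the attribute raw_message are invented for this example, and it assumes bytestring messages are UTF-8. The PY2 switch is only there so the same file also runs on Python 3.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2
text_type = unicode if PY2 else str  # noqa: F821 (unicode exists only on Python 2)


class UnicodeSafeError(Exception):
    """Hypothetical exception whose __str__ tolerates non-ASCII messages."""

    def __init__(self, message):
        super(UnicodeSafeError, self).__init__(message)
        self.raw_message = message

    def __unicode__(self):
        # Assumption: bytestring messages are UTF-8; bad bytes are replaced.
        if isinstance(self.raw_message, bytes):
            return self.raw_message.decode("utf-8", "replace")
        return text_type(self.raw_message)

    def __str__(self):
        text = self.__unicode__()
        # Python 2 expects bytes from __str__; Python 3 expects text.
        return text.encode("utf-8") if PY2 else text


formatted = "%s" % UnicodeSafeError(u"j\xf6rn")
```

With such a class, '%s' interpolation no longer raises UnicodeEncodeError, because __str__ encodes explicitly instead of falling back to ASCII. This only helps for exceptions you define yourself; library exceptions still need the approaches described above.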

Your first test passes because the editor saves the string literal as UTF-8, so Python sees a plain str instance that already contains encoded bytes.

+3

Source: https://habr.com/ru/post/990006/

