Python JSON and Unicode

Update:

I found the answer here: Python UnicodeDecodeError - Am I misunderstanding the encoding?

I needed to explicitly decode my input file to Unicode when I read it, because it contained characters that were not ASCII. The encoding step failed as soon as it hit those characters.
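For reference, a minimal sketch of that fix, assuming the input file is UTF-8 encoded (the file name and codec here are illustrative, not from the original code):

    import codecs
    import json

    # Open the file with an explicit codec, so every line comes back as a
    # unicode object instead of a raw byte str.
    with codecs.open('input.txt', 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

    # json.dumps now only ever sees unicode strings, so no implicit
    # ASCII decoding is attempted.
    print json.dumps(lines, ensure_ascii=False).encode('utf-8')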

Original question

So, I know something that I just don't get here.

I have an array of unicode strings, some of which contain non-ASCII characters.

I want to encode this as json with

json.dumps(myList) 

It gives an error

 UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 13: ordinal not in range(128) 

How am I supposed to do this? I tried setting the ensure_ascii parameter to both True and False, but that does not fix the problem.
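For the record, a minimal way to reproduce this error in Python 2.7 is to mix unicode strings with a byte str that holds non-ASCII bytes; the list below is an illustrative guess at the data, not taken from the original code:

    import json

    data = [u'hello', 'caf\xb4']          # a unicode string plus a byte str containing 0xb4
    json.dumps(data, ensure_ascii=False)  # UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4

The failure happens at ''.join(chunks), where Python implicitly tries to decode the byte str chunk as ASCII.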

I know that I am passing unicode strings to json.dumps, and I understand that a JSON string is meant to be Unicode. Why doesn't it just sort this out for me?

What am I doing wrong?

Update: Don Question reasonably suggests providing a stack trace. Here it is:

    Traceback (most recent call last):
      File "importFiles.py", line 69, in <module>
        x = u"%s" % conv
      File "importFiles.py", line 62, in __str__
        return self.page.__str__()
      File "importFiles.py", line 37, in __str__
        return json.dumps(self.page(), ensure_ascii=False)
      File "/usr/lib/python2.7/json/__init__.py", line 238, in dumps
        **kw).encode(obj)
      File "/usr/lib/python2.7/json/encoder.py", line 204, in encode
        return ''.join(chunks)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 17: ordinal not in range(128)

Note that this is Python 2.7, and the error still occurs with ensure_ascii=False.

Update 2: Andrew Walker's useful link (in the comments) makes me think I can coerce my data into a convenient byte format before asking json to encode it, by doing something like:

 data.encode("ascii","ignore") 

Unfortunately, this causes the same error.
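The likely reason the same error appears: in Python 2, calling .encode() on a byte str first decodes it implicitly as ASCII, which raises the very same exception. A sketch of the direction that works instead, assuming the raw bytes are Latin-1 (where 0xb4 is an acute accent); the codec is a guess:

    data = 'caf\xb4'                 # byte str as read from the file

    # data.encode('ascii', 'ignore') fails because Python 2 silently runs
    # data.decode('ascii') first, raising the same UnicodeDecodeError.

    text = data.decode('latin-1')    # decode explicitly with the file's real codec
    import json
    print json.dumps([text], ensure_ascii=False).encode('utf-8')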

2 answers

Try adding the argument ensure_ascii=False. Also, especially when asking Unicode-related questions, it is very helpful to include a longer (full) traceback and to state which version of Python you are using.
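For illustration (a sketch under Python 2.7, with made-up data): with ensure_ascii=False and unicode input, json.dumps returns a unicode object rather than a plain str, which is what the documentation quote below is about:

    import json

    s = json.dumps([u'caf\xe9'], ensure_ascii=False)
    print type(s)   # <type 'unicode'>
    print s         # ["café"] -- provided your terminal encoding can display it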

Quoting the Python documentation (version 2.6.7):

"If ensure_ascii is False (default: True), then some chunks written to fp may be unicode instances, subject to normal Python str to unicode coercion rules. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error."

So this suggestion may raise new problems, but it fixed a similar problem for me: I passed the resulting unicode string into a StringIO object and wrote that to a file.
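A sketch of that kind of workaround, wrapping the output file in codecs.getwriter() as the quoted documentation suggests (file name, codec, and data are assumptions):

    import codecs
    import json

    data = [u'caf\xe9', u'na\xefve']

    with open('out.json', 'wb') as raw:
        writer = codecs.getwriter('utf-8')(raw)   # writer.write() understands unicode
        json.dump(data, writer, ensure_ascii=False)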

Because Python 2.7's sys.getdefaultencoding() is ascii, the implicit conversion in the ''.join(chunks) statement of the json standard library will blow up if any chunk is not ASCII-encoded! You must ensure that every contained string is converted to an ASCII-compatible representation beforehand. You can try UTF-8 encoded byte strings, but unicode strings will not work, if I am not mistaken.
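A sketch of that advice with made-up data: encode every contained string to a UTF-8 byte str up front, so that ''.join(chunks) never has to mix unicode and str objects:

    import json

    my_list = [u'hello', u'caf\xe9']

    # Convert all contained strings to one consistent byte representation
    # before encoding; now every chunk the encoder joins is a plain str.
    as_bytes = [s.encode('utf-8') for s in my_list]
    print json.dumps(as_bytes, ensure_ascii=False)   # a UTF-8 encoded byte str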


Quick tip: how to search for a whole WORD (instead of a substring search using index or find) and avoid Unicode encoding/decoding errors:

For example: you need to find the word "cat" in the following phrase: "Learning string concatenation in Python". How do you do this?

    import re

    term = 'cat'
    phrase = 'Learning string concatenation in Python'

    found = re.search(r'\b' + term + r'\b', phrase)
    if found:
        print 'Found!'
    else:
        print 'Not found.'

Explanation: import re loads the regular-expression package. re.search takes a pattern and a string as arguments. The pattern uses the expression \b, which matches a word boundary. In addition, the r prefix (before the word-boundary expression) makes the pattern a raw string literal, so the backslash is not interpreted as an escape sequence.
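A quick check of that r prefix point (standard Python behaviour, independent of this example):

    print len('\b')    # 1 -- without r, \b is a single backspace control character
    print len(r'\b')   # 2 -- with r, it is a literal backslash plus 'b', the regex word boundary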

Note: I am not familiar with most Python concepts, and there may be errors in my explanation. However, it works for its intended purpose, and the only intention is to provide a practical example of a simple task in Python. I will investigate the internals of this solution, but in the meantime I hope this can help.


Source: https://habr.com/ru/post/910643/

