ValueError in json decoding

    import json
    import urllib
    import re
    import binascii

    def asciirepl(match):
        s = match.group()
        return binascii.unhexlify(s[2:])

    query = 'google'
    p = urllib.urlopen('http://www.google.com/dictionary/json?callback=a&q='+query+'&sl=en&tl=en&restrict=pr,de&client=te')
    page = p.read()[2:-10] #As its returned as a function call

    #To replace hex characters with ascii characters
    p = re.compile(r'\\x(\w{2})')
    ascii_string = p.sub(asciirepl, page)

    #Now decoding cleaned json response
    data = json.loads(ascii_string)

When I run it, I get this error:

    shadyabhi@archlinux /tmp $ python2 define.py
    Traceback (most recent call last):
      File "define.py", line 19, in <module>
        data = json.loads(ascii_string)
      File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting , delimiter: line 1 column 403 (char 403)

As far as I can tell, the JSON itself has no errors, since I got it from Google's server. All I did was replace the hexadecimal escapes. Any help would be greatly appreciated.

2 answers

Decoding the \x escapes can produce quote characters (") that need to be re-escaped, since they appear inside the double-quoted strings of the JSON data.

    def asciirepl(match):
        chr = binascii.unhexlify(match.group()[2:])
        return '\\' + chr if chr in ('\\"') else chr

This will still not handle control characters, so you may instead want to convert the \x escapes to \u escapes, which are defined by the JSON standard and parsed by the json module. It has the added advantage of being simpler :)

    def asciirepl(match):
        return '\\u00' + match.group()[2:]
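For a self-contained illustration of the \u approach, here is a minimal sketch. It is written for Python 3 rather than the Python 2 used in the question. The sample string is a hypothetical, shortened stand-in for the server response, and the names `raw`, `HEX_ESCAPE`, and `hex_to_unicode_escapes` are made up for this example. The pattern also matches only hex digits, which is slightly stricter than the `\w{2}` used in the question.

```python
import json
import re

# Hypothetical sample in the style of the response quoted in this thread;
# it is not an actual server reply.
raw = r'{"type":"url","text":"\x3ca href\x3d\x22http://example.com/\x22\x3elink\x3c/a\x3e"}'

# Matches a literal backslash, an "x", and exactly two hex digits.
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def hex_to_unicode_escapes(text):
    """Rewrite each \\xNN escape as the JSON-legal \\u00NN escape."""
    return HEX_ESCAPE.sub(lambda m: '\\u00' + m.group(1), text)

# The json module now decodes the escapes itself, including the quotes.
data = json.loads(hex_to_unicode_escapes(raw))
print(data['text'])
```

Because the quotes stay escaped until the JSON parser handles them, no separate re-escaping step is needed.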

Character 403 is the first embedded quote in the text, which makes this invalid JSON:

    {
      "type":"url",
      "text":"<a href="http://www.people-communicating.com/jargon-words.html">http://www.people-communicating.com/jargon-words.html</a>",
      "language":"en"
    }
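The failure can be reproduced in isolation. The sketch below uses a hypothetical, shortened version of the fragment above (the `broken` string and the `error` variable are illustrative names) and is written for Python 3, where `json.JSONDecodeError` is a subclass of `ValueError`.

```python
import json

# Hypothetical, shortened version of the fragment above: the quotes around
# the href value are not escaped, so the JSON string ends too early.
broken = '{"type":"url","text":"<a href="http://example.com/">x</a>"}'

try:
    json.loads(broken)
    error = None
except ValueError as exc:  # also catches json.JSONDecodeError on Python 3
    error = exc

# The parser stops just after the first embedded quote.
print(error)
```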

This is what the server returned; note that it contains no embedded quotes:

    {
      "type":"url",
      "text":"\\x3ca href\\x3d\\x22http://www.people-communicating.com/jargon-words.html\\x22\\x3ehttp://www.people-communicating.com/jargon-words.html\\x3c/a\\x3e",
      "language":"en"
    }

The best approach is to decode the JSON first and then de-hexify each string as needed.
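A rough sketch of that order of operations follows. It assumes the raw body really contains doubled backslashes, exactly as displayed above, so that it is valid JSON before any cleanup; if the body contains single-backslash \x escapes, `json.loads` would reject it and this approach would not apply. The sample string and the names `raw`, `HEX_ESCAPE`, and `dehexify` are hypothetical, and the code is written for Python 3 (on Python 2 the string check would need `unicode`).

```python
import json
import re

# Assumed raw body: backslashes doubled as displayed above, so the text is
# already valid JSON. Hypothetical, shortened sample.
raw = r'{"type":"url","text":"\\x3ca href\\x3d\\x22http://example.com/\\x22\\x3elink\\x3c/a\\x3e"}'

# Matches a literal backslash, an "x", and exactly two hex digits.
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def dehexify(value):
    """Recursively replace \\xNN sequences in every decoded string."""
    if isinstance(value, dict):
        return {key: dehexify(item) for key, item in value.items()}
    if isinstance(value, list):
        return [dehexify(item) for item in value]
    if isinstance(value, str):
        return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    return value  # numbers, booleans, and None pass through unchanged

# Decode first, then clean up the strings of the resulting structure.
data = dehexify(json.loads(raw))
print(data['text'])
```

Decoding first means the quotes produced by `\x22` only ever appear in already-parsed Python strings, so they cannot break the JSON syntax.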

EDIT: If this really is invalid JSON, as Karl Knechtel says in the comments, Google should be told that their API is broken. If instead the Python implementation is rejecting valid JSON, its maintainers should be asked to fix it. Whichever workaround you use, make sure it is easy to remove once the underlying problem is fixed.


Source: https://habr.com/ru/post/1379665/

