Parsing a multiline JSON file using Python

I am trying to parse a multi-line JSON file using the json library in Python 2.7. The following is a simplified example file:

    {
      "observations": {
        "notice": [
          {
            "copyright": "Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml",
            "copyright_url": "http://www.bom.gov.au/other/copyright.shtml",
            "disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml",
            "feedback_url": "http://www.bom.gov.au/other/feedback"
          }
        ]
      }
    }

My code is as follows:

    import json

    with open('test.json', 'r') as jsonFile:
        for jf in jsonFile:
            jf = jf.replace('\n', '')
            jf = jf.strip()
            weatherData = json.loads(jf)
            print weatherData

However, I am getting an error as shown below:

    Traceback (most recent call last):
      File "test.py", line 8, in <module>
        weatherData = json.loads(jf)
      File "/home/usr/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
        return _default_decoder.decode(s)
      File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 380, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting object: line 1 column 1 (char 0)

Just to do some testing, I changed the code so that, after removing the newlines and stripping leading and trailing whitespace from each line, I write the contents to another file (with the json extension). Surprisingly, when I read that second file back, I get no errors and the parsing succeeds. The modified code is as follows:

    import json

    filewrite = open('out.json', 'w+')

    with open('test.json', 'r') as jsonFile:
        for jf in jsonFile:
            jf = jf.replace('\n', '')
            jf = jf.strip()
            filewrite.write(jf)

    filewrite.close()

    with open('out.json', 'r') as newJsonFile:
        for line in newJsonFile:
            weatherData = json.loads(line)
            print weatherData

The output is as follows:

 {u'observations': {u'notice': [{u'copyright_url': u'http://www.bom.gov.au/other/copyright.shtml', u'disclaimer_url': u'http://www.bom.gov.au/other/disclaimer.shtml', u'copyright': u'Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml', u'feedback_url': u'http://www.bom.gov.au/other/feedback'}]}} 

Any idea what might happen when newlines and spaces are removed before using the json library?

+5
3 answers

You will lose your mind if you try to parse the JSON file line by line: in a multi-line file, a single line is generally not a valid JSON document on its own. The json module has helper methods for reading directly from file objects or strings, namely load and loads. load accepts a file object (as shown below) for a file containing JSON data, and loads accepts a string containing JSON data.
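To see why line-by-line parsing fails, here is a minimal sketch. The sample text and the helper name try_parse are made up for illustration, and the code is written to run on both Python 2.7 and Python 3:

```python
import json

# Hypothetical pretty-printed document, similar in shape to the file in the question.
text = '{\n  "observations": {\n    "notice": []\n  }\n}\n'


def try_parse(s):
    """Return (True, value) if s is valid JSON, else (False, error message)."""
    try:
        return True, json.loads(s)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError on Python 3
        return False, str(exc)


# The first line of the file is just "{", which is not a complete JSON document.
first_line = text.splitlines()[0].strip()
ok_line, _ = try_parse(first_line)

# Parsing the whole text in one call succeeds.
ok_whole, data = try_parse(text)
```

This is also why the second version in the question works: the stripped lines are written to out.json without newlines, so the file ends up as a single line and the loop sees the whole document in one iteration.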

Option 1 (preferred):

    import json

    with open('test.json', 'r') as jf:
        weatherData = json.load(jf)
        print weatherData

Option 2:

    import json

    with open('test.json', 'r') as jf:
        weatherData = json.loads(jf.read())
        print weatherData

If you are looking for better JSON parsing performance, check out ujson.

+4

In the first snippet, you are trying to parse the file line by line. You have to parse the whole document at once. The easiest way is to use json.load(jsonFile). (The variable name jf is misleading, as it holds a string, not a file.) So, the correct way to parse it is:

    import json

    with open('test.json', 'r') as jsonFile:
        # json.load takes a file object; json.loads would need a string
        weatherData = json.load(jsonFile)

That said, storing JSON on a single line is a reasonable idea, as it is more compact.

In the second snippet, the output looks different from your input because you are printing the parsed Python object rather than JSON text: the u'string here' notation is Python-specific (it marks a unicode string), whereas JSON itself uses double quotes.
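As a small illustration of that difference, the sketch below compares the Python representation of a parsed value with the JSON text produced by json.dumps. The one-key sample document is an assumption chosen for brevity:

```python
import json

data = json.loads('{"feedback_url": "http://www.bom.gov.au/other/feedback"}')

# repr() shows Python's own notation (u'...' on Python 2, '...' on Python 3).
as_python = repr(data)

# json.dumps() serializes the object back to JSON text, which uses double quotes.
as_json = json.dumps(data)
```

If you want to print JSON rather than the Python representation, print the result of json.dumps instead of the object itself.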

+5

FYI, you can open both files in the same with statement:

    with open('file_A') as in_, open('file_B', 'w+') as out_:
        # logic here
        ...
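Applied to the question, one possible shape for that logic is to parse the multi-line file once and write it back as single-line JSON. This is only a sketch: the temporary directory and the sample content are assumptions so that the example is self-contained.

```python
import json
import os
import tempfile

# Hypothetical file locations inside a temporary directory, for illustration only.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'test.json')
dst_path = os.path.join(workdir, 'out.json')

# Create a small multi-line input file to work with.
with open(src_path, 'w') as f:
    f.write('{\n  "observations": {\n    "notice": []\n  }\n}\n')

# Open input and output in one with statement; both files are closed on exit.
with open(src_path) as in_, open(dst_path, 'w') as out_:
    data = json.load(in_)   # parse the whole multi-line document at once
    json.dump(data, out_)   # with default arguments, the output has no newlines

# Read the result back to inspect it.
with open(dst_path) as f:
    single_line = f.read()
```
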
+1

Source: https://habr.com/ru/post/1239195/

