How to convert this JSON-like format into one that pandas read_json() can read

This is my first time asking a question on Stack Overflow. My English is poor, so please forgive any mistakes.

I have a JSON file (access.json) in this format:

    [
        {u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1', ..... },
        {u'IP': u'aaaa2', u'Domain': u'bbbb2', u'Time': u'cccc2', ..... },
        {u'IP': u'aaaa3', u'Domain': u'bbbb3', u'Time': u'cccc3', ..... },
        {u'IP': u'aaaa4', u'Domain': u'bbbb4', u'Time': u'cccc4', ..... },
        { ....... },
        { ....... }
    ]

When I run:

    import pandas as pd
    data = pd.read_json('./access.json')

it raises:

    ValueError: Expected object or value

This is the result I want:

    [out]
          IP  Domain   Time  ...
    0  aaaa1   bbbb1  cccc1  ...
    1  aaaa2   bbbb2  cccc2  ...
    2  aaaa3   bbbb3  cccc3  ...
    3  aaaa4   bbbb4  cccc4  ...
    ...and so on

How can I achieve this? Thanks in advance!

+6
3 answers

This is not valid JSON, so read_json cannot parse it. A line like

 {u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1', ..... }, 

should be:

 {"IP": "aaaa1", "Domain": "bbbb1", "Time": "cccc1", ..... }, 

You could run a regular expression over the entire file to fix these quotes, for example (after import re):

    In [11]: line
    Out[11]: "{u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1'},"

    In [12]: re.sub(r"(?<=[\{ ,])u'|'(?=[:,\}])", '"', line)
    Out[12]: '{"IP": "aaaa1", "Domain": "bbbb1", "Time": "cccc1"},'

Note: this only works on simple lines and can break on others, so use it with caution.
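Applied to the whole file instead of a single line, the same substitution could be sketched as below. This is a hedged sketch, not a general parser: it assumes the file fits in memory, that every u' follows a {, a space or a comma (as in the question's sample), and that no value contains quotes next to those characters. repr_text_to_json is a hypothetical helper name.

```python
import json
import re

# Same pattern as in the answer: a u' preceded by "{", " " or "," becomes ",
# and a ' followed by ":", "," or "}" becomes ".
# Raw string so the backslashes reach the regex engine unchanged.
UNICODE_REPR_QUOTES = re.compile(r"(?<=[\{ ,])u'|'(?=[:,\}])")


def repr_text_to_json(text):
    """Hypothetical helper: rewrite Python-2-style repr text as JSON text.

    Only handles simple u'...' string keys and values; anything else
    (nested quotes, plain '...' strings, numbers) is not covered.
    """
    return UNICODE_REPR_QUOTES.sub('"', text)


if __name__ == "__main__":
    sample = (
        "[{u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1'}, "
        "{u'IP': u'aaaa2', u'Domain': u'bbbb2', u'Time': u'cccc2'}]"
    )
    converted = repr_text_to_json(sample)
    print(converted)
    # json.loads raises json.JSONDecodeError if the text is still not valid JSON.
    print(json.loads(converted))
```

The converted text could then be passed to pandas, e.g. pd.read_json(io.StringIO(converted)); recent pandas versions warn when a literal JSON string is passed directly, which is why a StringIO wrapper is used there.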

The best "solution" is to make sure you have valid JSON in the first place. This looks like it was produced by Python's str/unicode/repr, not by json.dumps.

Note: json.dumps produces valid JSON, which read_json can read:

    In [21]: repr({u'IP': u'aaa'})
    Out[21]: "{u'IP': u'aaa'}"

    In [22]: json.dumps({u'IP': u'aaa'})
    Out[22]: '{"IP": "aaa"}'
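If you control the code that writes the file, the fix at the source is to serialize the records with the json module instead of writing str() or repr() of the list. A minimal Python 3 sketch; the records list just reuses the question's placeholder values:

```python
import json

records = [
    {"IP": "aaaa1", "Domain": "bbbb1", "Time": "cccc1"},
    {"IP": "aaaa2", "Domain": "bbbb2", "Time": "cccc2"},
]

# str()/repr() produce Python literal syntax (single-quoted strings): not JSON.
python_text = str(records)

# json.dumps produces real JSON (double-quoted keys and string values).
json_text = json.dumps(records)

print(python_text)
print(json_text)

try:
    json.loads(python_text)
except json.JSONDecodeError as exc:
    print("str() output is not valid JSON:", exc)

# The JSON text parses back into the same list of dicts.
print(json.loads(json_text) == records)
```

To write a file instead of a string, json.dump(records, f) does the same thing on an open file object.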

If someone else produced this "JSON", complain! It is not JSON.

+5

This is not JSON; it is a Python list of dictionaries. You can use ast.literal_eval() to turn the file contents into an actual list and pass it to the DataFrame constructor:

    from ast import literal_eval

    import pandas as pd

    with open('./access.json') as f:
        data = literal_eval(f.read())  # safely evaluates the Python literal

    df = pd.DataFrame(data)
    print(df)

The output for the example data you provided is:

      Domain     IP   Time
    0  bbbb1  aaaa1  cccc1
    1  bbbb2  aaaa2  cccc2
    2  bbbb3  aaaa3  cccc3
    3  bbbb4  aaaa4  cccc4
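Building on the literal_eval approach, you could also convert the file once into real JSON, so that pd.read_json (or any other JSON tool) can read it directly afterwards. A sketch assuming the whole file is a single Python literal; convert_repr_file and the file names are hypothetical:

```python
import json
import os
import tempfile
from ast import literal_eval


def convert_repr_file(src_path, dst_path):
    """Hypothetical helper: rewrite a file holding a Python literal as JSON.

    Returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as f:
        # literal_eval only accepts literals (lists, dicts, strings, numbers...),
        # so unlike eval() it does not execute arbitrary code from the file.
        records = literal_eval(f.read())
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)
    return len(records)


if __name__ == "__main__":
    # Demo on a temporary file containing the question's sample format.
    sample = "[{u'IP': u'aaaa1', u'Domain': u'bbbb1', u'Time': u'cccc1'}]"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "access.json")
        dst = os.path.join(tmp, "access_fixed.json")
        with open(src, "w", encoding="utf-8") as f:
            f.write(sample)
        print(convert_repr_file(src, dst))
        with open(dst, encoding="utf-8") as f:
            print(f.read())
```

Python 3 still accepts the u'' prefix, so literal_eval reads the question's format unchanged. The converted file should then be readable with pd.read_json on the new path.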
+4

You can also use

    pd.read_json("{json_file_name}", orient='records')

assuming the file contains valid JSON (double-quoted keys and strings) laid out as a list of records, like the structure shown in the question.
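Because read_json only accepts valid JSON, a quick standard-library check can show whether a file really is a JSON list of records before you pass orient='records'. is_records_json is a hypothetical helper, not part of pandas; it takes the file's text, e.g. from open(path).read():

```python
import json


def is_records_json(text):
    """Hypothetical check: is `text` a valid JSON array of objects?"""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Python repr output (single quotes, u'' prefixes) ends up here.
        return False
    return isinstance(data, list) and all(isinstance(row, dict) for row in data)


if __name__ == "__main__":
    print(is_records_json('[{"IP": "aaaa1", "Domain": "bbbb1"}]'))   # True
    print(is_records_json("[{u'IP': u'aaaa1', u'Domain': u'bbbb1'}]"))  # False
```

If the check fails, one of the conversions from the other answers is needed before read_json can load the file.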

0

Source: https://habr.com/ru/post/970553/

