I am trying to parse the source code of JavaScript objects that contain huge arrays and convert it into a Python dictionary with lists.
I am currently using PyYAML, but it does not work directly because it cannot handle consecutive commas (for example, it fails on "[,, 0,]" with: expected the node content, but found ','). So I replace them with a regex first, but all of this is very slow. Does anyone know a better and faster way to do this? JSON decoding does not work because the JavaScript code is not valid JSON.
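To illustrate the JSON point, here is a minimal check using only the standard library. The helper name `try_json` is made up for this illustration; it returns the error message instead of raising, so the three cases can be compared side by side.

```python
import json

js_obj = "{index: '37',data: [, 1, 2, 3,,,]}"

def try_json(text):
    """Return the parsed value, or a short error string if json rejects the input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc.msg}"

# Unquoted keys and single-quoted strings are rejected before the array is reached.
print(try_json(js_obj))

# Even with JSON-style quoting, an elided array element is rejected.
print(try_json('{"data": [, 1]}'))

# Strictly valid JSON parses as expected.
print(try_json('{"data": [1, 2]}'))
```

So the input would need both its quoting and its empty array slots rewritten before `json.loads` could accept it.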
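This is the code I currently use for the approach described above, with an example js_obj:

import re
import yaml

js_obj = "{index: '37',data: [, 1, 2, 3,,,]}"

def repl(match):
    # Strip spaces so only brackets and commas remain.
    content = re.sub(" ", "", match.group(0))
    length = len(content) - 1
    result = ''
    if content[0] == '[':
        # Leading empty slot: open the list with an empty-string placeholder.
        result = '[""'
        length -= 1
    after = ','
    if content[-1] == ']':
        # Trailing empty slot: close the list with a placeholder.
        length -= 1
        after += '""]'
    # One placeholder for each remaining empty slot.
    return result + (',""' * length) + after

# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load.
py_dict = yaml.safe_load(re.sub(r'\[? *(, *)+\]?', repl, js_obj))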