How to fix regex to match my string

I have the following line:

key1=[subKey1=[val1,val2=[k1,k2]],val3,val4,subKey2=[aaa,bbb]],key2=val5,key3,key4=[1,2,3]

I need to parse this line and process the data found in it in a loop.

I wrote this regexp: (([^=]+)=(\[(\S+)\],?|[a-z0-9-_]+))|([a-z0-9-_]+) but it cannot capture the key1 part, because the key4 expression also ends with the character ]. How can I fix my regular expression to match the string?

import re

regex = re.compile(r'(([^=]+)=(\[(\S+)\],?|[a-z0-9-_]+))|([a-z0-9-_]+)')
string = "key1=[subKey1=[val1,val2=[k1,k2]],val3,val4,subKey2=[aaa,bbb]],key2=val5,key3,key4=[1,2,3]"

for i in regex.findall(string):
    pass  # Do stuff
+4
3 answers

Regular expressions are not suited to parsing anything that has a recursive structure. Use a real parser instead (nested brackets need at least a context-free grammar). Otherwise, you should change your data to a much more convenient format.

JSON, for example.
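
As a rough illustration of the "real parser" suggestion, here is a minimal recursive-descent sketch. The function names `parse_items` and `parse` and the output shape (tuples for `key=value`, plain strings for bare names) are assumptions made for this example, not something from the question.

```python
def parse_items(text, pos=0):
    """Parse comma-separated items from pos; stop at ']' or end of text."""
    items = []
    while pos < len(text) and text[pos] != "]":
        # Read a name: everything up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in "=,[]":
            pos += 1
        name = text[start:pos]
        if pos < len(text) and text[pos] == "=":
            pos += 1  # skip '='
            if pos < len(text) and text[pos] == "[":
                # Nested list: recurse, then require the closing ']'.
                value, pos = parse_items(text, pos + 1)
                if pos >= len(text) or text[pos] != "]":
                    raise ValueError("missing closing bracket")
                pos += 1
            else:
                start = pos
                while pos < len(text) and text[pos] not in "=,[]":
                    pos += 1
                value = text[start:pos]
            items.append((name, value))
        else:
            items.append(name)
        if pos < len(text) and text[pos] == ",":
            pos += 1  # skip the separator
        elif pos < len(text) and text[pos] != "]":
            raise ValueError(f"unexpected character at position {pos}")
    return items, pos


def parse(text):
    """Parse a whole line and reject trailing garbage such as a stray ']'."""
    items, pos = parse_items(text)
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return items


line = "key1=[subKey1=[val1,val2=[k1,k2]],val3,val4,subKey2=[aaa,bbb]],key2=val5,key3,key4=[1,2,3]"
for item in parse(line):
    print(item)
```

Unlike the regex approaches, this keeps both keys and nesting, so the loop can inspect each top-level item directly.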

+2
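import regex  # third-party module: pip install regex

x = "key1=[subKey1=[val1,val2=[k1,k2]],val3,val4,subKey2=[aaa,bbb]],key2=val5,key3,key4=[1,2,3]"
# (?2) recurses into group 2, so nested [...] pairs are matched in full.
print([i for i, j in regex.findall(r"([^,=]+=(\[(?:[^\[\]]|(?2))+\])|[^,]*)", x) if i])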

This uses a recursive regex, which the third-party regex module supports.

Output: ['key1=[subKey1=[val1,val2=[k1,k2]],val3,val4,subKey2=[aaa,bbb]]', 'key2=val5', 'key3', 'key4=[1,2,3]']
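
If installing the regex module is not an option, the same top-level split can probably be done with a plain bracket-depth counter. This is a different technique from the recursive pattern above, and `split_top_level` is a hypothetical helper name; the sketch assumes the brackets in the input are balanced.

```python
def split_top_level(text):
    """Split text on commas that are not inside square brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            # A comma at depth 0 separates two top-level items.
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])  # the last item has no trailing comma
    return parts


x = "key1=[subKey1=[val1,val2=[k1,k2]],val3,val4,subKey2=[aaa,bbb]],key2=val5,key3,key4=[1,2,3]"
print(split_top_level(x))
```

Applying the function again to the text inside a bracketed value would split the next nesting level.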

+1

Here is a somewhat different approach that uses the Python function ast.literal_eval:

import ast, re

orig_text = """key1=[subKey1=[val1,val2=[k1,k2]],val3,val4,subKey2=[aaa,bbb]],key2=val5,key3,key4=[1,2,3]"""
quoted_values = re.sub(r'([a-zA-Z0-9]+)', r'"\1"', orig_text)
assignments_removed = re.sub(r'("[a-zA-Z0-9]+?"\s?=\s*)', '', quoted_values)

print(ast.literal_eval(assignments_removed))

This will at least give you all the values, as follows:

([['val1', ['k1', 'k2']], 'val3', 'val4', ['aaa', 'bbb']], 'val5', 'key3', ['1', '2', '3'])

This works by first quoting all values and then removing all assignments so that literal_eval can work. The structure is preserved.
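
To make the two rewriting steps easier to follow, here is a small sketch that prints the intermediate strings for a shorter input. The wrapper `values_only` and the sample input are made up for illustration; the two patterns are the ones from the answer.

```python
import ast
import re


def values_only(text):
    """Return the two intermediate strings and the parsed values for text."""
    # Step 1: wrap every alphanumeric run in double quotes.
    quoted = re.sub(r'([a-zA-Z0-9]+)', r'"\1"', text)
    # Step 2: drop each "key"= prefix so only list and string literals remain.
    stripped = re.sub(r'("[a-zA-Z0-9]+?"\s?=\s*)', '', quoted)
    # Step 3: what is left is valid Python literal syntax.
    return quoted, stripped, ast.literal_eval(stripped)


quoted, stripped, values = values_only("a=[b,c=[d]],e")
print(quoted)    # "a"=["b","c"=["d"]],"e"
print(stripped)  # ["b",["d"]],"e"
print(values)    # (['b', ['d']], 'e')
```

Note that the pattern only covers letters and digits, so a value containing - or _ would not be quoted as a single token and would need a wider character class.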

0

Source: https://habr.com/ru/post/1606065/

