Python split string in parentheses

I asked a small question earlier (Python, splitting an unknown string on spaces and parentheses), and the answer worked fine until my requirements changed. I still haven't got the hang of regex, so I need help with this.

If the user types this:

new test (test1 test2 test3) test "test5 test6"

I would like to end up with a variable that looks like this:

["new", "test", "test1 test2 test3", "test", "test5 test6"]

In other words: a single word separated by spaces should be split from the next word, while words inside parentheses should be kept together as one group, with the parentheses removed. The same goes for quotation marks.

I am currently using this code, which does not meet the requirement above (it comes from the answers to the question linked above):

 >>> import re
 >>> strs = "Hello (Test1 test2) (Hello1 hello2) other_stuff"
 >>> [", ".join(x.split()) for x in re.split(r'[()]',strs) if x.strip()]
 ['Hello', 'Test1, test2', 'Hello1, hello2', 'other_stuff']

This works well, but there is a problem if you have this:

strs = "Hello Test (Test1 test2) (Hello1 hello2) other_stuff"

It keeps Hello and Test together as one item instead of splitting them into two.

It also does not allow the use of parentheses and quotation marks at the same time.

4 answers

The answer was simple:

 re.findall(r'\[[^\]]*\]|\([^)]*\)|"[^"]*"|\S+', strs)
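Note that the pattern above returns each group with its surrounding brackets or quotes still attached. A minimal sketch of one way to strip them, assuming the same four alternatives, is to wrap the inner part of each alternative in a capture group; the names `TOKEN` and `tokenize` are made up for this illustration.

```python
import re

# Same alternatives as the pattern above, but each one captures only the
# text *inside* the delimiters. TOKEN and tokenize are hypothetical names.
TOKEN = re.compile(r'\[([^\]]*)\]|\(([^)]*)\)|"([^"]*)"|(\S+)')

def tokenize(text):
    tokens = []
    for match in TOKEN.finditer(text):
        # Exactly one of the four groups takes part in each match.
        tokens.append(next(g for g in match.groups() if g is not None))
    return tokens

print(tokenize('new test (test1 test2 test3) test "test5 test6"'))
# ['new', 'test', 'test1 test2 test3', 'test', 'test5 test6']
```

An opening bracket or quote without a matching close simply falls through to the `\S+` branch and is returned as part of an ordinary word.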

Your problem is not well defined.

Your description of the rules says:

In other words, if this is one word separated by a space, then split it from the next word; if it is in parentheses, then keep the whole group of words in the parentheses together and remove the parentheses. The same goes for commas.

I think by commas you mean inverted commas, i.e. quotation marks.

Then with this

 strs = "Hello (Test1 test2) (Hello1 hello2) other_stuff" 

you should get

 ["Hello (Test1 test2) (Hello1 hello2) other_stuff"] 

since everything is surrounded by quotation marks. Most likely, you want this to work without taking the outermost quotation marks into account.

I suggest this, although it is a bit ugly:

 import re, itertools

 strs = input("enter a string list ")
 # Split on the parenthesised group, then split each piece on the quoted group; drop empty strings.
 print([y for y in itertools.chain(*[re.split(r'\"(.*)\"', x) for x in re.split(r'\((.*)\)', strs)]) if y != ''])

which gives:

 enter a string list here there (xy ) thereagain "there there"
 ['here there ', 'xy ', ' thereagain ', 'there there']

This does what you expect:

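 import re, itertools

 strs = input("enter a string list ")
 # Split on the parenthesised group, then on the quoted group; drop empty strings.
 res1 = [y for y in itertools.chain(*[re.split(r'\"(.*)\"', x) for x in re.split(r'\((.*)\)', strs)]) if y != '']
 # Contents of the quoted and parenthesised groups (empty tuple when the input has none).
 m1 = re.search(r'\"(.*)\"', strs)
 m2 = re.search(r'\((.*)\)', strs)
 set1 = m1.groups() if m1 else ()
 set2 = m2.groups() if m2 else ()
 # Groups are kept whole; every other piece is split on whitespace.
 print([k for k in res1 if k in set1 or k in set2] + list(itertools.chain(*[k.split() for k in res1 if k not in set1 and k not in set2])))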
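One thing to watch with the snippet above: it prints the grouped items first and the single words afterwards, so the original order is lost, and the greedy `.*` merges several groups into one. A rough sketch of the same `re.split` idea that keeps the order might look like this; `split_groups` is a hypothetical name, and the sketch assumes groups are not nested.

```python
import re

def split_groups(text):
    # With two capturing groups, re.split returns a repeating triple:
    # plain text, parenthesised content (or None), quoted content (or None).
    parts = re.split(r'\(([^)]*)\)|"([^"]*)"', text)
    result = []
    for index, part in enumerate(parts):
        if part is None:
            continue  # the alternative that did not match
        if index % 3 == 0:
            result.extend(part.split())  # plain text: split on whitespace
        else:
            result.append(part)          # group content: keep whole
    return result

print(split_groups('new test (test1 test2 test3) test "test5 test6"'))
# ['new', 'test', 'test1 test2 test3', 'test', 'test5 test6']
```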

This pushes the limits of what regular expressions can do. Use pyparsing instead; it is a recursive-descent parsing library. For this task you can use:

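 from pyparsing import *
 import string, re

 # Sample input taken from the question.
 s = 'new test (test1 test2 test3) test "test5 test6"'

 # A bare word: any printable character except parentheses, quotes and spaces.
 RawWord = Word(re.sub('[()" ]', '', string.printable))
 Token = Forward()
 Token << ( RawWord
          | Group('"' + OneOrMore(RawWord) + '"')
          | Group('(' + OneOrMore(Token) + ')') )
 Phrase = ZeroOrMore(Token)
 Phrase.parseString(s, parseAll=True)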

It is robust against stray whitespace and handles nested brackets. It is also a little more readable than a large regex, and therefore easier to customize.

I realize that you solved your problem a long time ago, but this page is one of the top Google results for this kind of problem, and pyparsing is an underrated library.
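To illustrate why nesting is the hard part for a single regex, here is a small stack-based sketch that uses only the standard library. It is not pyparsing and not a recursive-descent parser, just an illustration under the assumption that nested parentheses should become nested lists; `parse_nested` and `_TOKEN` are hypothetical names.

```python
import re

# A quoted string, a single parenthesis, or a bare word (hypothetical helper).
_TOKEN = re.compile(r'"([^"]*)"|([()])|([^\s()"]+)')

def parse_nested(text):
    """Return words and quoted strings as str, parenthesised groups as sub-lists."""
    stack = [[]]  # stack[-1] is the list currently being filled
    for match in _TOKEN.finditer(text):
        quoted, paren, word = match.groups()
        if quoted is not None:
            stack[-1].append(quoted)
        elif paren == '(':
            stack.append([])             # open a new nested group
        elif paren == ')':
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)      # attach the finished group to its parent
        else:
            stack[-1].append(word)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]

print(parse_nested('a (b (c d) e) "f g"'))
# ['a', ['b', ['c', 'd'], 'e'], 'f g']
```

In this sketch a stray quotation mark without a partner is silently skipped, which a real grammar would probably want to report as an error.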


Source: https://habr.com/ru/post/1488577/

