How to split a string on commas, except inside parentheses, using a regular expression?

I want to split a string on commas:

"a,s".split ',' # => ['a', 's'] 

I do not want to split inside a substring that is wrapped in parentheses:

 "a,s(d,f),g,h" 

should give:

 ['a', 's(d,f)', 'g', 'h'] 

Any suggestion?

2 answers

To handle nested parentheses, you can use:

 txt = "a,s(d,f(4,5)),g,h"
 pattern = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)')
 puts txt.scan(pattern).map(&:first)

In more detail:

 (                 # open the first capturing group
   (?:             # open a non-capturing group
     [^,(]+        # one or more characters other than , and (
   |               # or
     (             # open the second capturing group
       \(          # a literal (
       (?>         # open an atomic group
         [^()]+    # one or more characters other than parentheses
       |           # or
         \g<-1>    # the last capturing group (you can also write \g<2>)
       )*          # close the atomic group and repeat it zero or more times
       \)          # a literal )
     )             # close the second capturing group
   )+              # close the non-capturing group and repeat it
 )                 # close the first capturing group

The second capturing group describes a parenthesized expression whose contents may be non-parenthesis characters or the capturing group itself, which makes the pattern recursive.

Inside the pattern, you can refer to a capturing group by its number (\g<2> for the second capturing group), by its relative position (\g<-1> is the first group to the left of the current position in the pattern), or by its name if you use named capturing groups.
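The named-group option can be sketched as follows. This is only an illustration, not part of the original answer: the group name `paren`, the constant `NAMED_PATTERN`, and the helper `split_top_level` are hypothetical names chosen for the example. Because a named group is a capture, `scan` would return only that capture, so the sketch collects the whole match inside the block instead.

```ruby
# Hypothetical names: paren, NAMED_PATTERN, split_top_level.
# Same idea as the numbered pattern, but the recursive group is called by name.
NAMED_PATTERN = /(?:[^,(]+|(?<paren>\((?>[^()]+|\g<paren>)*\)))+/

def split_top_level(text)
  parts = []
  # scan would yield only the captures, so take the full match (group 0)
  # from the last match data inside the block.
  text.scan(NAMED_PATTERN) { parts << Regexp.last_match(0) }
  parts
end

p split_top_level("a,s(d,f(4,5)),g,h") # => ["a", "s(d,f(4,5))", "g", "h"]
```

A name tends to stay correct when groups are added or removed, whereas \g<2> or \g<-1> has to be rechecked after such a change.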

Note: you can allow a single unbalanced parenthesis by adding |[()] as the last alternative of the non-capturing group. Then a,b(,c gives ['a', 'b(', 'c'].
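A minimal sketch of that variant, assuming the extra alternative is placed last so that balanced groups are still tried first. `LENIENT_PATTERN` and `lenient_split` are hypothetical names, and the sketch uses the \g<2> spelling of the recursive call.

```ruby
# Hypothetical names: LENIENT_PATTERN, lenient_split.
# The recursive pattern with |[()] appended as the last alternative.
LENIENT_PATTERN = /((?:[^,(]+|(\((?>[^()]+|\g<2>)*\))|[()])+)/

def lenient_split(text)
  # scan returns [group1, group2] pairs; group 1 holds the whole field.
  text.scan(LENIENT_PATTERN).map(&:first)
end

p lenient_split("a,b(,c")            # => ["a", "b(", "c"]
p lenient_split("a,s(d,f(4,5)),g,h") # => ["a", "s(d,f(4,5))", "g", "h"]
```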


Assuming the parentheses are not nested:

 "a,s(d,f),g,h".scan(/(?:\([^()]*\)|[^,])+/) # => ["a", "s(d,f)", "g", "h"]
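To make the non-nesting assumption concrete, here is a small sketch that runs the same pattern on a flat input and on a nested one. `FLAT_PATTERN` and `flat_split` are hypothetical names used only for this illustration.

```ruby
# Hypothetical names: FLAT_PATTERN, flat_split.
# The non-recursive pattern: a parenthesized run without inner parentheses,
# or any single character that is not a comma.
FLAT_PATTERN = /(?:\([^()]*\)|[^,])+/

def flat_split(text)
  # No capturing groups, so scan returns the full matches directly.
  text.scan(FLAT_PATTERN)
end

p flat_split("a,s(d,f),g,h")      # => ["a", "s(d,f)", "g", "h"]
p flat_split("a,s(d,f(4,5)),g,h") # => ["a", "s(d", "f(4,5))", "g", "h"]
```

With nested input, the outer parenthesis is not recognized as a group, so the comma inside it is treated as a separator.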

Source: https://habr.com/ru/post/952441/

