Splitting on spaces except inside certain characters

Question

I am parsing a file with lines like

type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")

And I want to break it down into separate fields.

In my example, there are four fields: type, title, pages and comments.

Desired result after splitting:

['type("book")', 'title("golden apples")', 'pages(10-35 70 200-234)', 'comments("good read")']

Obviously, a simple str.split() will not work, because it splits on every space. I want to split on spaces, but keep anything inside parentheses or quotation marks intact.

How can I split this?

+3

python string-parsing

That Umbrella Guy Mar 10 '12 at 7:37

3 answers

I would try a positive lookbehind assertion, so that the split happens only on whitespace that follows a closing parenthesis.

r'(?<=\))\s+'

Example:

>>> import re
>>> result = re.split(r'(?<=\))\s+', 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")')
>>> result
['type("book")', 'title("golden apples")', 'pages(10-35 70 200-234)', 'comments("good read")']

+1

dave Mar 10 '12 at 7:51

Split on ") " and add ")" back to each item except the last.
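
A minimal sketch of that idea, assuming every field ends with ")" and fields are separated by exactly one space. The function name split_fields is hypothetical, and the sample line is written in the form used in the other answer's example.

```python
def split_fields(line):
    # Splitting on ") " consumes the closing parenthesis of every field
    # except the last one, so it has to be put back afterwards.
    parts = line.split(") ")
    return [part + ")" for part in parts[:-1]] + parts[-1:]


line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'
fields = split_fields(line)
for field in fields:
    print(field)
```

This avoids regular expressions entirely, but it would also split on a ") " that appears inside a quoted value, so it only suits input where that cannot happen.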

+1

Karl Knechtel Mar 10 '12 at 7:53

Narendra yadala · Accepted Answer · 2012-03-10T07:43:25+0000

This regex should work for you: \s+(?=[^()]*(?:\(|$))

import re
result = re.split(r"\s+(?=[^()]*(?:\(|$))", subject)  # subject is the line to split
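
For illustration, here is the same pattern applied to the sample line, written without a space before each opening parenthesis as in the other answer's example. The names FIELD_SEP and fields are only placeholders for this sketch.

```python
import re

# Whitespace counts as a separator only when the next parenthesis ahead
# is an opening one, or when no parenthesis remains before the end.
FIELD_SEP = re.compile(r"\s+(?=[^()]*(?:\(|$))")

subject = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'
fields = FIELD_SEP.split(subject)
for field in fields:
    print(field)
```

The lookahead only inspects the next parenthesis, so this assumes parentheses are never nested. Whitespace between a field name and its "(" would be treated as a separator too.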

Explanation

r"""
\s             # Match a single character that is a "whitespace character" (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?=            # Assert that the regex below can be matched, starting at this position (positive lookahead)
   [^()]          # Match a single character NOT present in the list "()"
      *              # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (?:              # Match the regular expression below
                     # Match either the regular expression below (attempting the next alternative only if this one fails)
         \(             # Match the character "(" literally
      |              # Or match regular expression number 2 below (the entire group fails if this one fails to match)
         $              # Assert position at the end of a line (at the end of the string or before a line break character)
   )
)
"""
