Regex - replace specific characters except a specific string

I'm having trouble working out how to remove every whitespace character except those that appear between double quotes (" ").

For example:

a = c + d;

should become

a=c+d;

and

foo ("hi bye",        "bye    hi");

should become

foo("hi bye","bye    hi");

I tried something like

re.sub('^(\"[^\"\n]*\")|\s|\\n', '', line)

but obviously this does not work.

2 answers

Search:

r'(".*?")|(\s+)'

Replace:

r'\1'

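The idea is to leave everything inside quotes untouched: the first alternative matches each quoted string (".*?") and the replacement puts it back unchanged (\1).

Any whitespace matched by the second alternative (\s+) cannot be inside quotes, because the first alternative would have consumed it as part of a quoted string. Group 1 is empty for those matches, so that whitespace is replaced with nothing. Note that in Python versions before 3.5, referencing an unmatched group in the replacement string raises an error; from 3.5 on it is substituted with an empty string.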
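As a rough sketch, the search/replace pair above could be wired into Python like this. The helper name strip_unquoted_whitespace is made up for illustration, and a replacement function is used instead of r'\1' so the code does not depend on how a given Python version treats unmatched groups.

```python
import re

# Group 1: a double-quoted string (kept). Group 2: whitespace outside quotes (dropped).
PATTERN = re.compile(r'(".*?")|(\s+)')


def strip_unquoted_whitespace(line):
    # Hypothetical helper name, not part of the original answer.
    # m.group(1) is None when the whitespace alternative matched,
    # so "or ''" turns that case into an empty replacement.
    return PATTERN.sub(lambda m: m.group(1) or "", line)


print(strip_unquoted_whitespace('a = c + d;'))
print(strip_unquoted_whitespace('foo ("hi bye",        "bye    hi");'))
```

This sketch assumes quotes are balanced and never escaped; an unterminated quote is simply treated as ordinary text.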



A regex is not the only option: you can also walk the string character by character, tracking whether you are currently inside quotes.

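def kill_spaces(test_str):
    """Remove whitespace that is not inside double quotes."""
    inside_quote = False
    result = []
    for character in test_str:
        # Keep everything inside quotes; drop whitespace outside them.
        if inside_quote or not character.isspace():
            result.append(character)
        # A double quote toggles the inside/outside state.
        if character == '"':
            inside_quote = not inside_quote
    return "".join(result)

test = 'foo ("hi bye",       "bye     hi");'
print(kill_spaces(test))
# foo("hi bye","bye     hi");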
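One case the loop above does not cover is a backslash-escaped quote inside a string, such as "a \" b". The following is a minimal sketch of how the same state-tracking idea might be extended, assuming C-style escapes; the function name kill_spaces_escaped and the escape rule are assumptions for illustration, not part of the original answer.

```python
def kill_spaces_escaped(text):
    """Drop whitespace outside double quotes, honoring backslash escapes inside them."""
    inside_quote = False
    escaped = False  # True when the previous character was a backslash inside quotes
    result = []
    for character in text:
        # Keep everything inside quotes; drop whitespace outside them.
        if inside_quote or not character.isspace():
            result.append(character)
        if escaped:
            # This character was escaped, so it cannot toggle the quote state.
            escaped = False
        elif character == "\\" and inside_quote:
            escaped = True
        elif character == '"':
            inside_quote = not inside_quote
    return "".join(result)


print(kill_spaces_escaped(r'say ( "a \" b" ,  x )'))
```

Tracking the escape state only inside quotes keeps the behavior outside strings identical to the simpler loop.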

Source: https://habr.com/ru/post/1663245/

