Python - How to clear spaces from text

In Python, I have many lines that contain spaces. I would like to remove all spaces from the text except those inside quotation marks.

Input Example:

This is "an example text" containing spaces. 

And I want to get:

 Thisis"an example text"containingspaces. 

line.split() does not seem to work here, because it removes every space from the text, including the ones inside the quotes.
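For illustration, this is what str.split() followed by a join does to the example line. The quoted phrase is split apart and loses its spaces too:

```python
line = 'This is "an example text" containing spaces.'

# With no argument, str.split() splits on every run of whitespace,
# so it cannot tell spaces inside the quotes from the others.
words = line.split()
joined = "".join(words)
print(words)
print(joined)
```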

What do you recommend?

7 answers

For the simple case where only " is used as a quotation mark:

    >>> import re
    >>> s = 'This is "an example text" containing spaces.'
    >>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
    'Thisis"an example text"containingspaces.'

Explanation:

    [ ]       # Match a space
    (?=       # only if an even number of quotes follows --> lookahead
     (?:      # This is true when the following can be matched:
      [^"]*"  # Any number of non-quote characters, then a quote, then
      [^"]*"  # the same thing again to get an even number of quotes.
     )*       # Repeat zero or more times.
     [^"]*    # Match any remaining non-quote characters
     $        # and then the end of the string.
    )         # End of lookahead.
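The same pattern can be compiled with re.VERBOSE, so the explanation sits next to the regex itself. This is a sketch; the name SPACE_OUTSIDE_QUOTES is only illustrative, and like the answer it assumes straight double quotes that are balanced.

```python
import re

# A space followed by an even number of double quotes up to the end of
# the string, i.e. a space that is not inside a quoted section.
SPACE_OUTSIDE_QUOTES = re.compile(r'''
    [ ]             # match a space (kept literally inside a character class)
    (?=             # only if an even number of quotes follows (lookahead)
      (?:           # a pair of quotes:
        [^"]*"      #   any non-quote characters, then a quote,
        [^"]*"      #   and the same again
      )*            # repeated zero or more times,
      [^"]*         # then any remaining non-quote characters
      $             # up to the end of the string
    )
''', re.VERBOSE)

s = 'This is "an example text" containing spaces.'
result = SPACE_OUTSIDE_QUOTES.sub("", s)
print(result)
```

Because the lookahead rescans the rest of the string at every space, the work grows roughly quadratically with line length; for ordinary lines this does not matter.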
---

There may be a more elegant solution, but:

    >>> test = "This is \"an example text\" containing spaces."
    >>> '"'.join([x if i % 2 else "".join(x.split())
    ...           for i, x in enumerate(test.split('"'))])
    'Thisis"an example text"containingspaces.'

We split the text on the quotation marks and then walk over the pieces with a list comprehension. If a piece's index is even (outside the quotes), we remove its spaces by splitting it and joining it back together; if the index is odd (inside the quotes), we leave it unchanged. Finally, we join all pieces back together with quotation marks.
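To make the even/odd rule concrete, here are the pieces that split('"') produces for the example line, together with their indices:

```python
test = 'This is "an example text" containing spaces.'

# Splitting on '"' alternates between outside and inside the quotes,
# starting outside, so odd indices are the quoted parts.
pieces = test.split('"')
for i, piece in enumerate(pieces):
    where = "inside quotes" if i % 2 else "outside quotes"
    print(i, repr(piece), where)
```

Note that "".join(x.split()) removes all whitespace (tabs and newlines too) from the outside pieces, whereas x.replace(' ', '') would remove only spaces.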

---

Using re.findall is probably a more readable and flexible approach:

    >>> import re
    >>> s = 'This is "an example text" containing spaces.'
    >>> ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))
    'Thisis"an example text"containingspaces.'

You can also (ab)use csv.reader:

    >>> import csv
    >>> ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))
    'Thisis"an example text"containingspaces.'
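The trick is that tripling each quote makes the reader treat the quoted phrase as a single field and still keep a literal quote at each end, since a doubled quote inside a quoted field is read as one quote character. Printing the fields shows this:

```python
import csv

s = 'This is "an example text" containing spaces.'

# '"' -> '"""': one quote opens or closes the quoted field, and the
# remaining '""' is read back as a single literal quote character.
fields = next(csv.reader([s.replace('"', '"""')], delimiter=' '))
print(fields)
```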

Or use re.split:

    >>> ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))
    'Thisis"an example text"containingspaces.'
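re.split() includes the text matched by a capturing group in its result, and None when that group did not take part in the match (here, whenever the [ ] alternative matched). That is why the answer wraps the result in filter(None, ...), which drops the None entries and any empty strings. The raw result for the example line:

```python
import re

s = 'This is "an example text" containing spaces.'

# The quoted phrase comes from the capturing group; plain spaces give None.
parts = re.split(r'(?:\s*(".*?")\s*)|[ ]', s)
print(parts)
print(''.join(filter(None, parts)))
```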
---

Use regular expressions!

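    import io
    import re

    # The capturing group makes re.split() keep the quoted parts in its output.
    regex = re.compile(r'("[^"]*")')

    def remove_spaces_outside_quotes(text):
        # io.StringIO is the Python 3 replacement for cStringIO.StringIO.
        result = io.StringIO()
        for part in regex.split(text):
            if part and part[0] == '"':
                result.write(part)                   # quoted part: keep as is
            else:
                result.write(part.replace(" ", ""))  # outside quotes: drop spaces
        return result.getvalue()

    print(remove_spaces_outside_quotes('This is "an example text" containing spaces.'))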
---

You can also do this with csv:

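    import csv

    text = 'This is "an example text" containing spaces. '
    out = []
    # Passing a string makes csv.reader read it one character per "line";
    # a quoted field keeps reading until its closing quote.
    for row in csv.reader(text):
        e = ''.join(row)
        if e == ' ':
            continue                     # a space outside quotes: drop it
        if ' ' in e:
            out.append('"' + e + '"')    # a quoted field: put the quotes back
        else:
            out.append(e)
    print(''.join(out))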

Output: Thisis"an example text"containingspaces.
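Why this works: iterating over a string yields single characters, so csv.reader receives each character as a separate line, and only a quoted field spans several of those "lines" until its closing quote. A small input, chosen just for illustration, makes the rows visible:

```python
import csv

# Each character is fed to the reader as its own "line";
# the quoted part "b c" comes back as one row.
rows = list(csv.reader('a "b c" d'))
print(rows)
```

Because the loop recognizes quoted fields by ' ' in e, a quoted field without spaces, such as "word", comes out without its quotes.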

---
 '"'.join(v if i%2 else v.replace(' ', '') for i, v in enumerate(line.split('"'))) 
---
    quotation_mark = '"'
    space = " "
    example = 'foo choo boo "blaee blahhh" didneid ei did '
    formatted_example = ""
    inside_quotes = False  # the text starts outside any quotes
    for character in example:
        if character == quotation_mark:
            inside_quotes = not inside_quotes  # entering or leaving a quoted part
        # Keep every character except spaces that sit outside quotes.
        if inside_quotes or character != space:
            formatted_example += character
    print(formatted_example)

Source: https://habr.com/ru/post/1482989/

