Python - How to clear spaces from text

Question

In Python, I have many lines containing spaces. I would like to clear all spaces from the text, except when they are in quotation marks.

Input Example:

This is "an example text" containing spaces.

And I want to get:

 Thisis"an example text"containingspaces.

line.split() does not seem suitable, because it splits on every space in the text, including those inside the quotation marks.

What do you recommend?

python string

Tűzálló Földgolyó May 27, '13 at 15:23

7 answers

Tim Pietzcker · Answer 1 · 2013-05-27T15:30:43+0000

For the simple case where only " is used as a quotation mark:

    >>> import re
    >>> s = 'This is "an example text" containing spaces.'
    >>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
    'Thisis"an example text"containingspaces.'

Explanation:

    [ ]              # Match a space
    (?=              # only if an even number of quotes follows --> lookahead
     (?:             # This is true when the following can be matched:
      [^"]*"         # Any number of non-quote characters, then a quote, then
      [^"]*"         # the same thing again to get an even number of quotes.
     )*              # Repeat zero or more times.
     [^"]*           # Match any remaining non-quote characters
     $               # and then the end of the string.
    )                # End of lookahead.
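The commented pattern above can be compiled almost as written by passing re.VERBOSE. The following is only a minimal sketch and not part of the original answer: the names UNQUOTED_SPACE and remove_unquoted_spaces are made up for illustration, and the function assumes that the quotation marks in the input are balanced.

```python
import re

# Same pattern as above, compiled with re.VERBOSE so the comments can stay.
# In verbose mode a bare space is ignored, so the space is written as [ ].
UNQUOTED_SPACE = re.compile(r'''
    [ ]              # Match a space
    (?=              # only if an even number of quotes follows:
      (?:
        [^"]*"       # non-quote characters, then a quote,
        [^"]*"       # then the same again, so quotes come in pairs.
      )*             # Zero or more such pairs,
      [^"]*          # then any remaining non-quote characters
      $              # up to the end of the string.
    )
''', re.VERBOSE)


def remove_unquoted_spaces(text):  # hypothetical helper name
    """Remove spaces outside double quotes; assumes balanced quotes."""
    return UNQUOTED_SPACE.sub('', text)


print(remove_unquoted_spaces('This is "an example text" containing spaces.'))
```

A space inside a quoted part is followed by an odd number of quotes, so the lookahead fails there and the space is kept.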

Gareth Latty · Answer 2 · 2013-05-27T15:30:49+0000

There may be a more elegant solution, but this works:

    >>> test = "This is \"an example text\" containing spaces."
    >>> '"'.join([x if i % 2 else "".join(x.split()) for i, x in enumerate(test.split('"'))])
    'Thisis"an example text"containingspaces.'

We split the text on the quotation marks, then loop over the pieces in a list comprehension. Pieces at even indices lie outside the quotes, so we remove their whitespace by splitting and rejoining; pieces at odd indices lie inside the quotes and are left untouched. Finally, we rejoin everything with quotation marks.
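To make the index parity concrete, here is the same idea unrolled into separate steps. This is an illustrative sketch rather than the answer's own code; the variable names parts, cleaned and result are chosen here for readability.

```python
test = 'This is "an example text" containing spaces.'

# Splitting on the quote character alternates between text outside quotes
# (even indices) and text inside quotes (odd indices).
parts = test.split('"')
print(parts)  # ['This is ', 'an example text', ' containing spaces.']

cleaned = []
for i, part in enumerate(parts):
    if i % 2:
        cleaned.append(part)                   # inside quotes: keep as is
    else:
        cleaned.append("".join(part.split()))  # outside quotes: drop whitespace

# Put the quotation marks back between the pieces.
result = '"'.join(cleaned)
print(result)  # Thisis"an example text"containingspaces.
```

Note that "".join(part.split()) drops every kind of whitespace (tabs and newlines too), not only the space character.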

Jon Clements · Answer 3 · 2013-05-27T15:57:12+0000

Using re.findall is probably a more understandable / flexible method:

    >>> import re
    >>> s = 'This is "an example text" containing spaces.'
    >>> ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))
    'Thisis"an example text"containingspaces.'

You can (ab)use csv.reader:

    >>> import csv
    >>> ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))
    'Thisis"an example text"containingspaces.'

Or using re.split:

    >>> ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))
    'Thisis"an example text"containingspaces.'
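As a rough sanity check, the three variants can be wrapped in functions and compared on a couple of inputs. This is only a sketch: the function names and the second sample string are invented here, and the comparison only covers inputs where each quoted part is separated from its neighbours by single spaces.

```python
import csv
import re


def with_findall(s):
    # Keep quoted runs and runs of non-whitespace, drop everything else.
    return ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))


def with_csv(s):
    # Tripling the quotes makes csv treat each quoted run as one field
    # whose value still contains its surrounding quotes.
    return ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))


def with_split(s):
    # Split on quoted runs (kept via the capturing group) or single spaces.
    return ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))


samples = [
    'This is "an example text" containing spaces.',
    'a "b c" d "e  f" g',
]
for s in samples:
    results = {with_findall(s), with_csv(s), with_split(s)}
    print(s, '->', results)
```

Each printed set should contain a single string if the variants agree on that input.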

werehuman · Answer 4 · 2013-05-27T15:32:17+0000

Use regular expressions!

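    import io  # io.StringIO replaces the Python 2 cStringIO module
    import re

    def remove_unquoted_spaces(text):
        result = io.StringIO()
        # The capturing group makes split() keep the quoted parts.
        regex = re.compile('("[^"]*")')
        for part in regex.split(text):
            if part and part[0] == '"':
                result.write(part)  # quoted part: copy unchanged
            else:
                result.write(part.replace(" ", ""))
        return result.getvalue()

    text = 'This is "an example text" containing spaces.'
    print(remove_unquoted_spaces(text))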
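The loop relies on how re.split treats a capturing group. The short sketch below, which is not from the original answer, prints the list that the split produces with and without the group.

```python
import re

text = 'This is "an example text" containing spaces.'

# Because the pattern is wrapped in a capturing group, re.split returns the
# quoted pieces as well as the text between them.
parts = re.split(r'("[^"]*")', text)
print(parts)  # ['This is ', '"an example text"', ' containing spaces.']

# Without the group, the quoted pieces are dropped from the result.
without_group = re.split(r'"[^"]*"', text)
print(without_group)  # ['This is ', ' containing spaces.']
```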

dawg · Answer 5 · 2013-05-27T15:59:50+0000

You can also do this with csv:

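    import csv

    out = []
    # Iterating over a string feeds csv.reader one character per "line", so
    # each row is a single character or the whole text of a quoted field.
    for e in csv.reader('This is "an example text" containing spaces. '):
        e = ''.join(e)
        if e == ' ':
            continue  # a space outside quotes: drop it
        if ' ' in e:
            # csv strips the quotes from a quoted field, so put them back.
            # Quoted text without a space would lose its quotes here.
            out.extend('"' + e + '"')
        else:
            out.extend(e)
    print(''.join(out))

Prints Thisis"an example text"containingspaces.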
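It may help to see what csv.reader actually yields when it is handed a plain string. The sketch below uses a shorter, made-up input so the rows are easy to read.

```python
import csv

# A string is an iterable of one-character "lines", so csv.reader yields one
# row per character, except that a quoted field is read across "lines" until
# the closing quote is seen.
rows = list(csv.reader('a "b c" d'))
print(rows)  # [['a'], [' '], ['b c'], [' '], ['d']]
```

The quoted field arrives as one row with its quotes stripped, which is why the loop above has to add them back.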

Elazar · Answer 6 · 2013-05-27T15:42:59+0000

 '"'.join(v if i%2 else v.replace(' ', '') for i, v in enumerate(line.split('"')))
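The one-liner above expects a variable named line. A possible way to package it is sketched below; the function name strip_unquoted_spaces and the check for unbalanced quotes are additions made here for illustration, not part of the answer.

```python
def strip_unquoted_spaces(line):  # hypothetical helper name
    """Remove spaces outside double quotes using the split/join idea above."""
    parts = line.split('"')
    # n quote characters produce n + 1 parts, so an even number of parts
    # means an odd number of quotes, i.e. one quote is never closed.
    if len(parts) % 2 == 0:
        raise ValueError('unbalanced quotation marks: %r' % line)
    return '"'.join(
        v if i % 2 else v.replace(' ', '')
        for i, v in enumerate(parts)
    )


print(strip_unquoted_spaces('This is "an example text" containing spaces.'))
```

Whether an unclosed quote should raise an error or be tolerated depends on the data; raising is just one option.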

eblahm · Answer 7 · 2013-05-27T15:51:58+0000

    quotation_mark = '"'
    space = " "
    example = 'foo choo boo "blaee blahhh" didneid ei did '
    formatted_example = ''

    # Track whether the current character lies between a pair of quotation marks.
    inside_quotes = False

    for character in example:
        if character == quotation_mark:
            # Each quotation mark toggles the state.
            inside_quotes = not inside_quotes
        # Keep every character inside quotes; outside quotes, drop the spaces.
        if inside_quotes or character != space:
            formatted_example += character

    print(formatted_example)
