Python: splitting text on quotation marks and spaces

I have the following text

text = 'This is "a simple" test' 

I need to split it in two ways, first on quotation marks and then on spaces, so that the result is:

 res = ['This', 'is', '"a simple"', 'test'] 

But with str.split() I can use either quotation marks or spaces as the delimiter, not both. Is there a built-in function that handles multiple delimiters?

+5
6 answers

You can use shlex.split, which is convenient for parsing quoted strings:

    >>> import shlex
    >>> text = 'This is "a simple" test'
    >>> shlex.split(text, posix=False)
    ['This', 'is', '"a simple"', 'test']

Non-POSIX mode keeps the quotes in the split result. posix is True by default, which strips them:

    >>> shlex.split(text)
    ['This', 'is', 'a simple', 'test']
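As a minimal sketch of the two modes side by side, the helper below wraps shlex.split behind a flag. The name split_quoted and its keep_quotes parameter are made up for illustration and are not part of shlex. The last lines show that an unbalanced quote makes shlex.split raise ValueError, which may matter for free-form input.

```python
import shlex


def split_quoted(text, keep_quotes=True):
    """Split on whitespace, treating a quoted substring as one token.

    Hypothetical helper: keep_quotes=True maps to posix=False (quotes stay
    in the tokens), keep_quotes=False maps to posix=True (quotes stripped).
    """
    return shlex.split(text, posix=not keep_quotes)


text = 'This is "a simple" test'
print(split_quoted(text))                     # quoted token keeps its quotes
print(split_quoted(text, keep_quotes=False))  # quoted token without quotes

# An unbalanced quote cannot be tokenized and raises ValueError.
try:
    shlex.split('This is "a simple test')
except ValueError as exc:
    print("cannot split:", exc)
```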

If you have several lines of this kind of text, or you are reading from a stream, you can split them efficiently (without the quotes in the output) using csv.reader:

    import io
    import csv

    s = io.StringIO(text)  # in-memory stream
    f = csv.reader(s, delimiter=' ', quotechar='"')
    print(list(f))
    # [['This', 'is', 'a simple', 'test']]

In Python 3 there is no need to decode the string, since every str is already Unicode. In Python 2, pass text.decode('utf8') to io.StringIO and use the print statement instead.
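To illustrate the multi-line case mentioned above, here is a small sketch that feeds several lines through one csv.reader. The sample data and the variable names are invented for the example; any file-like object or iterable of lines should work the same way.

```python
import csv
import io

# Hypothetical multi-line input standing in for a file or stream.
data = 'This is "a simple" test\nAnother "quoted phrase" here\n'

# newline='' lets the csv module handle line endings itself.
stream = io.StringIO(data, newline='')

# Each input line becomes one list of fields; quotes are stripped.
rows = list(csv.reader(stream, delimiter=' ', quotechar='"'))
for row in rows:
    print(row)
```

Note that csv.reader treats every single space as a delimiter, so two consecutive spaces produce an empty field.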

+8

If I understood correctly, you can use a regex:

    >>> import re
    >>> text = 'This is "a simple" test'
    >>> re.split(r'\s|"', text)
    ['This', 'is', '', 'a', 'simple', '', 'test']
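The split above leaves empty strings and breaks the quoted phrase apart, which is not the result the question asks for. One possible regex alternative, offered here as a sketch rather than as part of the original answer, is to match tokens with re.findall instead of splitting on delimiters. The pattern name TOKEN is arbitrary, and the pattern assumes double quotes are balanced and never escaped.

```python
import re

text = 'This is "a simple" test'

# Match either a double-quoted run (quotes included) or a run of
# non-whitespace characters. Escaped quotes are not handled.
TOKEN = re.compile(r'"[^"]*"|\S+')

tokens = TOKEN.findall(text)
print(tokens)
```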

+2

For your case, shlex.split will be just fine.

As for the general question about multiple delimiters:

    import re
    re.split(r'"|\s', string)
+1

Using csv.reader:

    import csv

    text = 'This is "a simple" test'
    list_text = [text]
    for row in csv.reader(list_text, delimiter=" "):
        print(row)

You can also learn more about it here.

0

Try using re:

    import re

    text = 'This is "a simple" test'
    print(re.split(r'"|\s', text))

Result:

 ['This', 'is', '', 'a', 'simple', '', 'test'] 
0

You can look into the shlex library.

    from shlex import split

    a = 'This is "a simple" text'
    split(a)

['This', 'is', 'a simple', 'text']

I don't think regex is what you are looking for

0

Source: https://habr.com/ru/post/1269713/

