re.split with spaces in Python

I have a line of text that looks like this:

' 19,301 14,856 18,554' 

Each gap between the numbers is a run of spaces.

I am trying to split it on the spaces, but I need to keep each run of whitespace as its own item in the new list, like this:

 [' ', '19,301',' ', '14,856', ' ', '18,554'] 

I used the following code:

 re.split(r'( +)(?=[0-9])', item) 

and it returns:

 ['', ' ', '19,301', ' ', '14,856', ' ', '18,554'] 

Note that it always adds an empty element at the start of my list. It's easy enough to remove, but I really want to understand what is happening here so that I can get the code to handle things consistently. Thanks.

2 answers

When using re.split, if the separator matches at the very beginning of the string, the result starts with an empty string. The reason for this is so that join can act as the inverse of split.

This may not make much sense in your case, where the separator matches have different lengths, but suppose the separator were the | character and you wanted to join the pieces back together. With the extra empty string, that works:

 >>> import re
 >>> item = '|19,301|14,856|18,554'
 >>> items = re.split(r'\|', item)
 >>> print(items)
 ['', '19,301', '14,856', '18,554']
 >>> '|'.join(items)
 '|19,301|14,856|18,554'

Without the empty string, the leading pipe is lost:

 >>> items = ['19,301', '14,856', '18,554']
 >>> '|'.join(items)
 '19,301|14,856|18,554'
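To connect this back to the pattern from the question, here is a small sketch using a single-spaced version of the sample line (the variable names are just for illustration). Because the separator pattern contains a capturing group, the separators end up at the odd indices of the result and the text pieces at the even indices, and joining everything with an empty string restores the input. A separator matched at the very end of the string produces a trailing empty string in the same way; the lookahead in the question's pattern can never match at the end, so a plain ( +) is used for that case.

```python
import re

item = ' 19,301 14,856 18,554'
parts = re.split(r'( +)(?=[0-9])', item)
print(parts)  # ['', ' ', '19,301', ' ', '14,856', ' ', '18,554']

# Captured separators sit at odd indices, text pieces at even indices.
print(parts[1::2])  # [' ', ' ', ' ']
print(parts[0::2])  # ['', '19,301', '14,856', '18,554']

# Joining with '' undoes the split; the leading '' is what makes this work.
assert ''.join(parts) == item

# A separator matched at the very end gives a trailing empty string.
print(re.split(r'( +)', 'a b '))  # ['a', ' ', 'b', ' ', '']

# If the round trip is not needed, drop the empty strings explicitly.
print([p for p in parts if p])  # [' ', '19,301', ' ', '14,856', ' ', '18,554']
```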

You can do this with re.findall():

 >>> import re
 >>> s = r'\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s19,301\s\s\s\s\s\s\s\s\s14,856\s\s\s\s\s\s\s\s18,554'.replace(r'\s', ' ')
 >>> re.findall(r' +|[^ ]+', s)
 [' ', '19,301', ' ', '14,856', ' ', '18,554']

You said "space" in the question, so this pattern matches only the space character. To match any whitespace character, you can use:

 >>> re.findall(r'\s+|\S+', s)
 [' ', '19,301', ' ', '14,856', ' ', '18,554']

This pattern matches either a run of one or more whitespace characters or a run of one or more non-whitespace characters. For example:

 >>> s = ' \t\t ab\ncd\tef g '
 >>> re.findall(r'\s+|\S+', s)
 [' \t\t ', 'ab', '\n', 'cd', '\t', 'ef', ' ', 'g', ' ']
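Since every character is either whitespace or not, \s+|\S+ leaves nothing unmatched, so the tokens returned by re.findall() cover the whole string and join back into the original. Neither alternative can match an empty string, so no empty items appear either. A minimal sketch, using a single-spaced version of the question's sample line; the helper name tokenize is hypothetical, made up for this illustration:

```python
import re

def tokenize(text):
    """Split text into alternating runs of whitespace and non-whitespace.

    Hypothetical helper: \\s+ and \\S+ together match every character,
    and each needs at least one character, so no empty tokens appear.
    """
    return re.findall(r'\s+|\S+', text)

line = ' 19,301 14,856 18,554'
tokens = tokenize(line)
print(tokens)  # [' ', '19,301', ' ', '14,856', ' ', '18,554']

# The tokens reassemble into the original line with nothing dropped.
assert ''.join(tokens) == line

# Whitespace runs can be told apart from values with str.isspace().
print([t for t in tokens if not t.isspace()])  # ['19,301', '14,856', '18,554']
```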

Source: https://habr.com/ru/post/1241453/

