How do I use the re module in Python to split a line of text into words only?

Here is what I am working with:

import re

string1 = "Dog,cat,mouse,bird. Human."

def string_count(text):
    text = re.split(r'\W+', text)
    count = 0
    for x in text:
        count += 1
        print(count)
        print(x)

    return text

print(string_count(string1))

... and here is the output:

1
Dog
2
cat
3
mouse
4
bird
5
Human
6

['Dog', 'cat', 'mouse', 'bird', 'Human', '']

Why am I getting 6 when there are only 5 words? I cannot get rid of the '' (empty string)! It is driving me crazy.

2 answers

Because the pattern also splits on the final period, and re.split returns the empty part that comes after it.

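\W+ matches one or more non-word characters, so the trailing "." is treated as a delimiter. Nothing follows it, so the last element of the result is an empty string.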
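To illustrate where the empty string comes from, here is a minimal sketch that splits three inputs with the same pattern. Only the first string is taken from the question; the other two are made-up examples, and the variable names are chosen for illustration.

```python
import re

# Delimiter at the end: an empty string follows the final period.
trailing = re.split(r'\W+', "Dog,cat,mouse,bird. Human.")

# Delimiter at the start (made-up example): an empty string comes first.
leading = re.split(r'\W+', ".Dog,cat")

# No delimiter at either end (made-up example): no empty strings.
clean = re.split(r'\W+', "Dog,cat")

print(trailing)  # ['Dog', 'cat', 'mouse', 'bird', 'Human', '']
print(leading)   # ['', 'Dog', 'cat']
print(clean)     # ['Dog', 'cat']
```

In other words, re.split always returns the text on both sides of every match, even when one side is empty.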


You can filter out the empty string with a list comprehension:

import re

string1 = "Dog,cat,mouse,bird. Human."
the_list = [word for word in re.split(r'\W+', string1) if word]
# keep a word only if it is not the empty string

Or (better still) use re.findall to match the words directly:

import re

string1 = "Dog,cat,mouse,bird. Human."
the_list = re.findall(r'\w+', string1)
# find all runs of word characters in string1
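
As a rough sketch, the function from the question could be rewritten around re.findall so that it returns both the words and their count. Returning a tuple is an assumption made here for illustration; the original function returned only the list.

```python
import re

def string_count(text):
    """Return the words found in text and how many there are."""
    # \w+ matches runs of letters, digits and underscores.
    words = re.findall(r'\w+', text)
    return words, len(words)

words, count = string_count("Dog,cat,mouse,bird. Human.")
print(count)  # 5
print(words)  # ['Dog', 'cat', 'mouse', 'bird', 'Human']
```

Note that \w also matches digits and underscores, and that an apostrophe counts as a non-word character, so "don't" would come back as two items, "don" and "t".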

Source: https://habr.com/ru/post/1668050/

