There are several ways to solve this problem. I have one that works, but I think it is suboptimal. Hopefully someone who knows regex better will come along and improve this answer or suggest a better one.
Your question is tagged python-3.x, but your code is Python 2.x, so my code is 2.x as well. I have included a commented-out line that works in 3.x.
```python
#!/usr/bin/env python
import re

tweet = "I am tired! I like fruit...and milk"
# print(tweet)

clean_words = tweet.translate(None, ",.;@#?!&$")  # Python 2
# clean_words = tweet.translate(str.maketrans("", "", ",.;@#?!&$"))  # Python 3
print(clean_words)  # Does not handle fruit...and: it becomes "fruitand"

regex_sub = re.sub(r"[,.;@#?!&$]+", ' ', tweet)  # + means match one or more
print(regex_sub)  # extra space between tired and I

regex_sub = re.sub(r"\s+", ' ', regex_sub)  # Replaces any run of whitespace with one space
print(regex_sub)  # looks good
```
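If you want to avoid the second `re.sub`, one possible variant (a sketch, not the only way) is to put `\s` into the same character class, so a run of punctuation and spaces is collapsed into a single space in one pass. The names `clean_tweet` and `PUNCT_OR_SPACE` are just illustrative, and this is written for Python 3:

```python
import re

# Illustrative pattern: the same punctuation set as above plus any whitespace,
# so "! " or "..." or "  " all count as one separator run.
PUNCT_OR_SPACE = re.compile(r"[,.;@#?!&$\s]+")


def clean_tweet(text):
    """Replace each run of punctuation and/or whitespace with one space."""
    # strip() drops the space left behind by leading or trailing punctuation
    return PUNCT_OR_SPACE.sub(" ", text).strip()


tweet = "I am tired! I like fruit...and milk"
print(clean_tweet(tweet))  # I am tired I like fruit and milk
```

Like the two-step version, this turns tabs and newlines into plain spaces, since `\s` matches them too.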
Bryan