Use a collections.Counter() object and split each sentence on whitespace. You probably also want to lowercase the words and strip punctuation from them:
    from collections import Counter

    counts = Counter()
    for sentence in sequence_of_sentences:
        counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
or perhaps use a regular expression that matches only word characters:
    from collections import Counter
    import re

    counts = Counter()
    words = re.compile(r'\w+')
    for sentence in sequence_of_sentences:
        counts.update(words.findall(sentence.lower()))
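The two approaches tokenize differently when a word contains internal punctuation. A minimal sketch, using a made-up sample sentence that is not from the question, of how each one treats an apostrophe and a hyphen:

```python
import re
from collections import Counter

# Hypothetical sample input, chosen only to illustrate the difference.
sentence = "Don't panic, it's a well-known phrase!"

# Approach 1: split on whitespace, then strip punctuation from the ends only.
stripped = Counter(word.strip('.,?!"\'').lower() for word in sentence.split())

# Approach 2: keep only runs of word characters (letters, digits, underscore).
words = re.compile(r'\w+')
matched = Counter(words.findall(sentence.lower()))

print(sorted(stripped))  # "don't" and "well-known" stay whole
print(sorted(matched))   # they are split at the apostrophe and the hyphen
```

str.strip() only touches the ends of each token, so contractions and hyphenated words survive intact, while `\w+` breaks them into separate pieces. Which behaviour is preferable depends on the text being counted.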
You now have a counts dictionary that maps each word to the number of times it occurs.
Demo:
    >>> sequence_of_sentences = ['This is a sentence', 'This is another sentence']
    >>> from collections import Counter
    >>> counts = Counter()
    >>> for sentence in sequence_of_sentences:
    ...     counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
    ...
    >>> counts
    Counter({'this': 2, 'is': 2, 'sentence': 2, 'a': 1, 'another': 1})
    >>> counts['sentence']
    2
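Since Counter is a dict subclass with a few extras, the result can also be ranked or queried for words that never appeared. A short sketch reusing the demo data:

```python
from collections import Counter

sequence_of_sentences = ['This is a sentence', 'This is another sentence']

counts = Counter()
for sentence in sequence_of_sentences:
    counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())

# The two most frequent words, as (word, count) pairs.
top_two = counts.most_common(2)
print(top_two)

# A word that never occurred reports 0 instead of raising KeyError,
# and looking it up does not add it to the counter.
print(counts['missing'])
```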