Since there are very few words in the dictionary, and the words themselves are quite small, I’ll just try to look for all possible substrings of each word in the dictionary. Of course, it makes no sense to look for substrings of size 0 or 1, you probably want to have a lower threshold for the size of the word.
For each substring, you can simply search for it in the sentence, and if that happens, you can mark it as a possible part of the sentence. For speed, you can do a search in a sentence in O (n) (for example, using KMP or Rabin Karp )
Here is a simple hacking idea in Python (using brute force string matching):
d=["constitution","abracadabra","refrigerator","stackoverflow"] def substring_match(word,sentence,min_length): for start in xrange(0,len(word)): for end in xrange(start+min_length,len(word)): substr=word[start:end+1] if substr in sentence: return True return False def look_for_words(word_dict,sent_word): return [word for word in word_dict if substring_match(word,sent_word,5)] def look(word_dict,sentence): ret=[] for word in sentence.split(): ret.extend(look_for_words(word_dict,word)) return ret if __name__=='__main__': print "\n".join(look(d,"xbracadabrqbonstitution ibracadabrefrigeratos obracadabri xtackoverflotefrigeratos"))
source share