I have a file called wordlist that contains 1,876 KB of alphabetized words, all of which are longer than 4 letters. The words are separated by line breaks, with one extra blank line between each new two-letter group (ab, ac, ad, etc.):
    wfile = open("wordlist.txt", "r+")
I want to create a new file containing only words that are not derived from other, smaller words. For example, suppose the word list contains the words "abuser, abused, abusers, abuse, abuses, etc.". The new file should contain only the word "abuse", because it is the "lowest common denominator" (if you will) of all these words. Similarly, the word "rodeo" would be removed because it contains the word "rode".
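To make the rule concrete, here is a minimal sketch of the result I am after. It assumes that "derived from" means "starts with a shorter word that is also in the list"; the function name `keep_roots` and the sample list are made up for illustration only.

```python
def keep_roots(words):
    """Return the words that do not start with another, shorter listed word.

    Assumption: "derived from" means "has a shorter listed word as a prefix".
    """
    roots = []
    # Sorting places every word directly after the words it extends,
    # and set() drops duplicates; empty strings (blank lines) are skipped.
    for word in sorted(set(w for w in words if w)):
        # In sorted order, only the most recent root can be a prefix
        # of the current word, so one comparison per word is enough.
        if roots and word.startswith(roots[-1]):
            continue
        roots.append(word)
    return roots


# Hypothetical sample input, not taken from the real file.
sample = ["abuser", "abused", "abusers", "abuse", "abuses", "rode", "rodeo"]
print(keep_roots(sample))  # ['abuse', 'rode']
```

Because each word is compared against a single candidate root, the work after sorting is one pass over the list, which should be manageable for a file of this size.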
I tried this implementation:
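    def root_words(wordlist):
        # Relies on wordlist being sorted, so derived words follow their root.
        result = []
        base = wordlist[0]  # start from the first word (was wordlist[1])
        for word in wordlist:
            if not word.startswith(base):
                result.append(base)
                print(base)
                base = word
        result.append(base)
        return result

    def main():
        wordlist = []
        wfile = open("wordlist.txt", "r")
        for line in wfile:
            word = line.strip()
            if word:  # skip the blank lines between two-letter groups
                wordlist.append(word)
        wfile.close()
        wordlist = root_words(wordlist)
        # "w" creates the output file; write() needs a string, not a list
        newfile = open("newwordlist.txt", "w")
        newfile.write("\n".join(wordlist) + "\n")
        newfile.close()

    main()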
But it always freezes on my computer. Any solutions?