I am new to Python and I'm trying to understand the answer here to a question about counting the unique words in a document. The answer:
print len(set(w.lower() for w in open('filename.dat').read().split()))
Reads the entire file into memory, breaks it into words on whitespace, converts each word to lower case, creates a set of the unique lowercase words, counts them, and prints the result.
To understand this, I am trying to implement it in Python step by step. I can read a text file in using open and read, divide it into separate words using split, and make them all lowercase using lower. I can also create a set of the unique words in the list. However, I can't work out how to do the last part: counting the number of unique words.
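What I have so far looks roughly like this (a sketch with my own variable names, not code taken from the answer):

with open('filename.dat') as f:               # open the file and read it all into memory
    text = f.read()
words = text.split()                          # break it into words on whitespace
lower_words = [w.lower() for w in words]      # make every word lowercase
unique_words = set(lower_words)               # collapse duplicates into a set of unique words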
I thought I could finish by iterating over the elements in the set of unique words and counting how many times each one occurs in the original lowercase list, but I find that the construct I built is not indexable.
So, in natural language, I think what I'm trying to do is: for every element in the set, tell me how many times it occurs in the lowercase list. But I can't figure out how to do this, and I suspect some misunderstanding of Python is holding me back.
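In rough terms, what I'm imagining is something like this (a sketch of the intent, reusing lower_words and unique_words from the sketch above):

for word in unique_words:                     # every distinct word
    print(word, lower_words.count(word))      # how often it appears in the full lowercase list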
Thanks, everyone, for the answers. I just realized that I did not explain myself correctly: I wanted to find not only the total number of unique words (which, as I understand it, is the length of the set), but also the number of times each individual word was used; for example, "the" was used 14 times, "and" was used 9 times, "it" was used 20 times, and so on. Apologies for the confusion.
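To illustrate the kind of result I mean, it is essentially a mapping from each word to its count. For example, collections.Counter from the standard library expresses this (the numbers in the comments are just the made-up examples from my question, and this is not something from the original answer):

from collections import Counter

words = open('filename.dat').read().lower().split()
counts = Counter(words)              # maps each word to how many times it appears
print(counts.most_common(3))         # e.g. [('it', 20), ('the', 14), ('and', 9)]
print(len(counts))                   # the same unique-word total as the one-liner above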