I am processing a large amount of text data in sklearn. First I need to vectorize the text content (word counts), and then apply TfidfTransformer. The following code does not seem to pass the output of CountVectorizer to the input of TfidfTransformer:
TEXT = [data[i].values()[3] for i in range(len(data))]
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
vectorizer = CountVectorizer(min_df=0.01,max_df = 2.5, lowercase = False, stop_words = 'english')
X = vectorizer(TEXT)
transformer = TfidfTransformer(X)
X = transformer.fit_transform()
When I run this code, I get this error:
Traceback (most recent call last):
  File "nlpQ2.py", line 27, in <module>
    X = vectorizer(TEXT)
TypeError: 'CountVectorizer' object is not callable
I thought I had vectorized the text and that it was now in a matrix. Is there an intermediate step that I skipped? Thanks!
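For reference, here is a minimal sketch of what the intended pipeline might look like, assuming the usual scikit-learn estimator pattern: the vectorizer is not called directly but used through `fit_transform`, and the count matrix is passed to the transformer's `fit_transform` rather than to its constructor. The small `TEXT` list is a hypothetical stand-in for the real data, and `max_df=1.0` is an assumption on my part, since a float `max_df` is documented as a proportion in the range [0.0, 1.0].

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical stand-in for the TEXT list built from `data` in the question.
TEXT = [
    "The quick brown fox jumps over the lazy dog",
    "A quick brown dog outpaces a quick fox",
    "Lazy dogs sleep all day",
]

# Step 1: learn the vocabulary and build the sparse document-term count matrix.
# max_df=1.0 is an assumed value (a float max_df is a proportion of documents).
vectorizer = CountVectorizer(min_df=0.01, max_df=1.0,
                             lowercase=False, stop_words="english")
counts = vectorizer.fit_transform(TEXT)

# Step 2: feed the count matrix into the tf-idf transformer.
# The data goes to fit_transform, not to the constructor.
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(counts)

# Both matrices have one row per document and one column per vocabulary term.
print(counts.shape, tfidf.shape)
```

If the intermediate counts are not needed, `TfidfVectorizer` from the same module combines both steps into one estimator.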