Passing CountVectorizer output to TfidfTransformer in sklearn

I am processing a large amount of text data in sklearn. First I need to vectorize the text (word counts) and then apply TfidfTransformer. I have the following code, which does not seem to pass the output of CountVectorizer to the input of TfidfTransformer.

TEXT = [data[i].values()[3] for i in range(len(data))]

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

vectorizer = CountVectorizer(min_df=0.01,max_df = 2.5, lowercase = False, stop_words = 'english')

X = vectorizer(TEXT)
transformer = TfidfTransformer(X)
X = transformer.fit_transform()

When I run this code, I get this error:

Traceback (most recent call last):
  File "nlpQ2.py", line 27, in <module>
    X = vectorizer(TEXT)
TypeError: 'CountVectorizer' object is not callable

I thought I had vectorized the text and that it was now in a matrix. Is there an intermediate step that I skipped? Thanks!

+4
2 answers

This line

X = vectorizer(TEXT)

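does not produce the vectorizer's output (and it is what raises the exception; it has nothing to do with TfIdf itself). You need to call fit_transform. The next call is wrong as well: the data must be passed as an argument to fit_transform, not to the constructor.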

X = vectorizer.fit_transform(TEXT)
transformer = TfidfTransformer()
X = transformer.fit_transform(X)
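
To check the corrected flow end to end, here is a minimal sketch on a toy corpus. The `docs` list and the variable names are made up for illustration, and the vectorizer is left at its default settings rather than the parameters from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus standing in for TEXT from the question.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Step 1: learn the vocabulary and build the sparse term-count matrix.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Step 2: reweight the counts. The data is passed to fit_transform,
# not to the TfidfTransformer constructor.
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(counts)

# Both matrices have one row per document and one column per term.
print(counts.shape, tfidf.shape)
```

Both results are sparse matrices of the same shape; only the values differ, since the second step replaces raw counts with TF-IDF weights.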
+6

You are probably looking for a pipeline, something like this:

from sklearn.pipeline import Pipeline, make_pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
])

or:

pipeline = make_pipeline(CountVectorizer(), TfidfTransformer())

Then use this pipeline like any other model (that is, call fit, fit_transform, etc.).
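
As a rough illustration of what using the pipeline "like any other model" can look like, the sketch below fits it on a few documents and then transforms an unseen one. The corpus and variable names are hypothetical, and both steps use default settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, used only for illustration.
train_docs = ["red apples and green apples", "green pears", "red cherries"]
new_docs = ["green apples"]

pipeline = make_pipeline(CountVectorizer(), TfidfTransformer())

# fit_transform runs both steps in order on the training documents.
train_tfidf = pipeline.fit_transform(train_docs)

# transform reuses the fitted vocabulary and IDF weights on unseen documents.
new_tfidf = pipeline.transform(new_docs)

# make_pipeline names each step after its lowercased class name.
vocabulary = pipeline.named_steps["countvectorizer"].vocabulary_
print(train_tfidf.shape, new_tfidf.shape, sorted(vocabulary))
```

Keeping both steps in one object means new text always goes through the same vocabulary and weighting that were learned during fitting.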


+2

Source: https://habr.com/ru/post/1649643/

