What is the difference between fit_transform and transform in sklearn's CountVectorizer?

I have only just started learning about random forests, so I apologize if this sounds stupid.

I was recently working through the bag-of-words introduction on Kaggle, and I want to clarify a few things:

Using vectorizer.fit_transform("the list of cleaned reviews"):

When we prepared the bag-of-words array for the train reviews, we called fit_transform on the list of train reviews. I know that fit_transform does two things: first it fits to the data and learns the vocabulary, and then it builds a vector for each review.

So when we then used vectorizer.transform("the list of cleaned test reviews"), it simply converted the list of test reviews into a vector for each review.
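To make the two calls concrete, here is a minimal sketch of what I understand them to do. The review strings are made up for illustration and are not from the Kaggle data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical cleaned reviews, invented for this sketch
train_reviews = ["great movie great cast", "boring movie"]
test_reviews = ["great plot", "boring cast"]

vectorizer = CountVectorizer()

# fit_transform: learn the vocabulary from the train reviews,
# then count those words in each train review
train_features = vectorizer.fit_transform(train_reviews)

# transform: only count words, reusing the vocabulary learned above
test_features = vectorizer.transform(test_reviews)

# Column order follows the alphabetically sorted vocabulary
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)                # ['boring', 'cast', 'great', 'movie']
print(test_features.toarray())   # "plot" was never seen in train, so it is dropped
```

Both matrices end up with the same four columns, and a test-only word such as "plot" gets no column at all.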

My question is: why not use fit_transform on the test list too? The documentation says that this leads to overfitting, but it still makes sense to me to use it anyway. Let me explain my perspective:

When we do not use fit_transform, we are essentially saying: build the feature vectors for the test reviews using the most frequent words from the train reviews. Why not build the test feature array using the most frequent words in the test set instead?


+4
1

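You do not call fit_transform on the test data because, when you train the Random Forest, it learns its classification rules from the values of the features you give it. For those rules to apply to the test set, the test features have to be computed in the same way, with the same vocabulary. If the train and test vocabularies differ, the feature columns will be misaligned and you will get wrong classifications.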
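A minimal sketch of that misalignment, assuming made-up reviews and labels (none of this is from the Kaggle tutorial). The forest is trained on train-vocabulary features, then asked to predict on test features built both ways.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical data, invented for this sketch
train_reviews = ["good film", "great acting", "bad film", "awful acting"]
train_labels = [1, 1, 0, 0]
test_reviews = ["good acting", "bad plot"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)   # 6 columns

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_train, train_labels)

# Correct: test features use the vocabulary learned from train
X_test = vectorizer.transform(test_reviews)          # also 6 columns
predictions = forest.predict(X_test)

# Incorrect: refitting on test builds a different vocabulary (4 columns here)
X_test_refit = CountVectorizer().fit_transform(test_reviews)
try:
    forest.predict(X_test_refit)
    mismatch_error = None
except ValueError as error:
    mismatch_error = error   # feature count no longer matches the trained model

print(X_train.shape[1], X_test.shape[1], X_test_refit.shape[1])
print(mismatch_error)
```

Here the mismatch is loud because the column counts differ. If the two vocabularies happened to have the same size, predict would run without an error, but each column would stand for a different word than the one the forest was trained on.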

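For example, say you use CountVectorizer on 3 training documents:

  • Dog is black.
  • Sky is blue.
  • Dog is dancing.

The vocabulary is {Dog, is, black, sky, blue, dancing}. The model you train will classify data using these words as features, so there are 6 features and every input vector has length 6. Now take the test set:

  • Dog is white.
  • Sky is black.

If you call fit_transform on the test set, the vocabulary becomes {Dog, white, is, Sky, black}. The number of features is now 5, which does not match what the model was trained on, so the model cannot use these vectors. So call fit only on the training data.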
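The example above can be checked with a short sketch. Note that CountVectorizer lowercases by default, so the vocabulary comes out as "dog" rather than "Dog".

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["Dog is black", "Sky is blue", "Dog is dancing"]
test_docs = ["Dog is white", "Sky is black"]

# Fit on the training documents only
train_vectorizer = CountVectorizer()
train_vectorizer.fit(train_docs)
train_vocab = sorted(train_vectorizer.vocabulary_)

# What you would get by (wrongly) refitting on the test documents
refit_vectorizer = CountVectorizer()
refit_vectorizer.fit(test_docs)
refit_vocab = sorted(refit_vectorizer.vocabulary_)

# Test vectors built with the training vocabulary keep all 6 columns
aligned = train_vectorizer.transform(test_docs)

print(len(train_vocab), train_vocab)   # 6 features
print(len(refit_vocab), refit_vocab)   # 5 features
print(aligned.toarray())               # "white" has no column and is ignored
```

With transform, "Dog is white" still becomes a length-6 vector in which only the "dog" and "is" columns are set.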


+5

Source: https://habr.com/ru/post/1649747/

