CountVectorizer and Out-Of-Vocabulary (OOV) Tokens?

Question

I am currently using CountVectorizer to extract features. However, I need to count the words that were not seen during fitting.

During transform, the default behavior of CountVectorizer is to ignore words that were not seen during fitting. But I need to keep track of how many times this happens!

How can I do this?

Thanks!

+4

python scikit-learn

Jose G Oct 25 '16 at 3:25

1 answer

vumaasha · Answer 1 · 2018-02-23T11:56:28+0000

There is no built-in way to do this in scikit-learn; you need to write some additional code. However, you can use the vocabulary_ attribute of CountVectorizer to achieve this:

Cache the current vocabulary (vocabulary_)
Call fit_transform on the new documents
Diff the new vocabulary against the cached one
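
The three steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the sample documents and the names new_vectorizer and oov_counts are made up for the example, and a second vectorizer is assumed for the new data so that the originally fitted one is not overwritten. Both vectorizers use the default tokenizer, which drops single-character tokens such as "a".

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample data, for illustration only.
train_docs = ["the cat sat", "the dog sat"]
new_docs = ["the cat ran", "a bird ran away"]

# Step 1: fit on the training data and cache the learned vocabulary.
vectorizer = CountVectorizer()
vectorizer.fit(train_docs)
cached_vocab = set(vectorizer.vocabulary_)

# Step 2: call fit_transform on the new data with a separate vectorizer
# (an assumption of this sketch, so the fitted vectorizer stays intact).
new_vectorizer = CountVectorizer()
new_counts = new_vectorizer.fit_transform(new_docs)

# Step 3: diff the new vocabulary against the cached one and sum the
# column totals of the terms that were never seen during fitting.
totals = np.asarray(new_counts.sum(axis=0)).ravel()
oov_counts = {
    term: int(totals[idx])
    for term, idx in new_vectorizer.vocabulary_.items()
    if term not in cached_vocab
}

print(sorted(oov_counts.items()))  # [('away', 1), ('bird', 1), ('ran', 2)]
print(sum(oov_counts.values()))    # 4
```

The original vectorizer can still be used to transform new_docs as usual; the OOV counts are kept alongside it rather than inside the feature matrix.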
