How to keep punctuation marks in Scikit-Learn's CountVectorizer or TfidfVectorizer?

Is there a way to keep punctuation marks such as !, ?, " and ' from my text documents using the parameters of CountVectorizer or TfidfVectorizer in Scikit-Learn?

Thanks in advance.

1 answer

You need to set the token_pattern parameter when instantiating the vectorizer. By default, the pattern keeps only words of two or more characters, so punctuation is dropped. For instance:

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(token_pattern=r"(?u)\b\w\w+\b|!|\?|\"|\'")
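To see which tokens this pattern keeps, it can be tried out with the standard re module before passing it to a vectorizer. The sketch below assumes the vectorizer's default behaviour of lowercasing the text and then collecting every regex match; the tokenize helper is a hypothetical name used only for illustration and is not part of Scikit-Learn.

```python
import re

# Same pattern as in the answer: words of 2+ characters, or one of ! ? " '
TOKEN_PATTERN = r"(?u)\b\w\w+\b|!|\?|\"|\'"


def tokenize(text):
    """Hypothetical helper: lowercase the text, then return all pattern matches."""
    return re.findall(TOKEN_PATTERN, text.lower())


print(tokenize("Hello world! How are you?"))
# ['hello', 'world', '!', 'how', 'are', 'you', '?']
```

Single-character words are still dropped, because the word branch of the pattern requires at least two characters. Other punctuation marks can be kept by adding further alternatives to the pattern, for example |, for a comma.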

Source: https://habr.com/ru/post/1653187/

