For anyone who finds this later, here is what solved my problem:
from sklearn.feature_extraction.text import CountVectorizer

# Each document is already tokenized: a list of strings
corpus = [["this is spam, 'SPAM'"], ["this is ham, 'HAM'"], ["this is nothing, 'NOTHING'"]]

# Pass the pre-tokenized lists through unchanged; lowercase=False is needed
# because preprocessing would otherwise call .lower() on a list and fail
bag_of_words = CountVectorizer(tokenizer=lambda doc: doc, lowercase=False).fit_transform(corpus)
And this is the result of calling .toarray() on it:
[[0 0 1]
[1 0 0]
[0 1 0]]
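The columns are ordered alphabetically by token, so to see which column is which you can print the learned vocabulary as well. A small self-contained sketch (it assumes scikit-learn >= 1.0, where the method is called get_feature_names_out; on older versions use get_feature_names):

from sklearn.feature_extraction.text import CountVectorizer

corpus = [["this is spam, 'SPAM'"], ["this is ham, 'HAM'"], ["this is nothing, 'NOTHING'"]]

vectorizer = CountVectorizer(tokenizer=lambda doc: doc, lowercase=False)
bag_of_words = vectorizer.fit_transform(corpus)

# Columns sorted alphabetically: ham, nothing, spam
print(vectorizer.get_feature_names_out())
print(bag_of_words.toarray())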
Thanks guys,