Insert the result of sklearn CountVectorizer into a pandas DataFrame

I have a set of 14784 text documents that I'm trying to vectorize so I can run some analysis. I used CountVectorizer in sklearn to convert the documents to feature vectors. I did this by calling:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(examples)

where examples is an array of all the text documents.

Now I am trying to use additional features. To do this, I store the features in a pandas DataFrame. Currently my DataFrame (without the text features inserted) has shape (14784, 5). The shape of my feature matrix is (14784, 21343).

What would be a good way to insert the vectorized features into the pandas DataFrame?

2 answers

fit_transform learns the vocabulary dictionary from the raw documents and returns the document-term matrix.

X = vect.fit_transform(docs) 

Convert the sparse CSR matrix to a dense format, and use as column names the array that maps integer feature indices to feature names.

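# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() (available since 1.0)
count_vect_df = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out())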

Concatenate the original df and count_vect_df column-wise.

pd.concat([df, count_vect_df], axis=1)

If your base DataFrame is df, all you have to do is:

import pandas as pd
# features is a SciPy sparse matrix, so convert it to a dense array first
features_df = pd.DataFrame(features.toarray())
combined_df = pd.concat([df, features_df], axis=1)

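You may also want to limit the number of features, for example with the max_features parameter. Note that max_features is an argument of the CountVectorizer constructor rather than of fit_transform: vectorizer = CountVectorizer(max_features=1000), followed by features = vectorizer.fit_transform(examples).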
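
A minimal sketch of the max_features option, which is set on the CountVectorizer constructor and keeps only the most frequent terms across the corpus. The toy documents are hypothetical and chosen so that the two most frequent terms are unambiguous:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents: "apple" appears 4 times, "banana" 2 times, "cherry" once.
examples = ["apple banana apple", "banana apple", "apple cherry"]

# max_features is a constructor argument; fit_transform does not accept it.
vectorizer = CountVectorizer(max_features=2)
features = vectorizer.fit_transform(examples)

print(features.shape)                  # one column per retained term
print(sorted(vectorizer.vocabulary_))  # the terms that were kept
```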


Source: https://habr.com/ru/post/1659567/
