Adding sklearn CountVectorizer results to a pandas DataFrame

Question


I have a set of 14784 text documents that I'm trying to vectorize so I can run some analysis. I used CountVectorizer in sklearn to convert the documents to feature vectors. I did this by calling:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(examples)

where examples is a list of all the text documents.

Now I am trying to add additional features. To do this, I am storing the features in a pandas DataFrame. Currently my DataFrame (without the text features) has shape (14784, 5). The shape of my feature matrix is (14784, 21343).

What would be a good way to insert the vectorized features into the pandas DataFrame?


python pandas scikit-learn machine-learning

Saurabh food Nov 02 '16 at 0:46


2 answers

Nickil maveli · Answer 1 · 2016-11-02T09:40:35+0000

fit_transform learns the vocabulary dictionary from the raw documents and returns the document-term matrix.

vect = CountVectorizer()
X = vect.fit_transform(docs)

Convert the sparse CSR matrix to a dense array, and use the mapping from integer feature indices to feature names as the column labels.

# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2
count_vect_df = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out())

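Concatenate the original df and count_vect_df column-wise. pd.concat aligns rows by index label, so this assumes df has a default RangeIndex; otherwise call df.reset_index(drop=True) first.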

pd.concat([df, count_vect_df], axis=1)

Tchotchke · Answer 2 · 2016-11-02T01:39:26+0000

If your base DataFrame is df, all you need to do is:

import pandas as pd

# features is a SciPy sparse matrix, so convert it to a dense array first
features_df = pd.DataFrame(features.toarray())
combined_df = pd.concat([df, features_df], axis=1)

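I would caution, though, that with 21343 columns the combined DataFrame will be very large: a dense 14784 × 21343 array of int64 counts takes roughly 2.5 GB. To keep fewer columns, limit the vocabulary with max_features. Note that max_features is an argument to the CountVectorizer constructor, not to fit_transform: vectorizer = CountVectorizer(max_features=1000), then features = vectorizer.fit_transform(examples).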

