Python - how to add a numpy array to a pandas DataFrame

Question

I trained a logistic regression classifier to predict whether a review is positive or negative. Now I want to add the predicted probabilities returned by the predict_proba function to my pandas DataFrame containing the reviews. I tried something like:

test_data['prediction'] = sentiment_model.predict_proba(test_matrix)

Obviously this does not work, because predict_proba returns a 2D numpy array. So what is the most efficient way to do this? I created test_matrix using scikit-learn's CountVectorizer:

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))

Sample data is as follows:

| Review                                     | Prediction         |                      
| ------------------------------------------ | ------------------ |
| "Toy was great! Our six-year old loved it!"|   0.986            |

python numpy pandas scikit-learn machine-learning

DBE7 Feb 18 '17 at 11:28

1 answer

Karthik Arumugham · Accepted Answer · 2017-02-18T12:50:41+0000

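Assign the predictions to a variable, then extract its columns and assign them to columns of the pandas DataFrame. If x is the 2D numpy array of predictions,

x = sentiment_model.predict_proba(test_matrix)

then you can do: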

test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
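
As a rough, self-contained sketch of the same idea: the snippet below uses a hand-written array in place of sentiment_model.predict_proba(test_matrix), and the review texts, the class labels -1 and 1, and the proba_ column names are assumptions made up for illustration. In scikit-learn, the columns of the predict_proba output follow the order of the model's classes_ attribute, so it is worth checking which column is the positive class before picking one.

```python
import numpy as np
import pandas as pd

# Stand-in for test_data; the review texts are invented for this example.
test_data = pd.DataFrame({
    "review_clean": ["toy was great", "broke after one day", "ok for the price"],
})

# Stand-in for sentiment_model.predict_proba(test_matrix):
# one row per review, one column per class, each row summing to 1.
proba = np.array([
    [0.014, 0.986],
    [0.910, 0.090],
    [0.450, 0.550],
])

# Hypothetical class labels, in the order sentiment_model.classes_ would give them.
classes = [-1, 1]

# Option 1: keep only the probability of one class as a single column.
test_data["prediction"] = proba[:, 1]

# Option 2: wrap all class probabilities in a DataFrame that shares
# test_data's index, then join it column-wise.
proba_df = pd.DataFrame(
    proba,
    columns=[f"proba_{c}" for c in classes],
    index=test_data.index,
)
result = test_data.join(proba_df)

print(result)
```

Passing index=test_data.index matters when test_data came from a train/test split and no longer has a 0..n-1 index; without it the join would align on mismatched labels and could produce NaN values.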