I trained a logistic regression classifier to predict whether a review is positive or negative. Now I want to add the predicted probabilities returned by the predict_proba function to my Pandas data frame containing the reviews. I tried something like:
test_data['prediction'] = sentiment_model.predict_proba(test_matrix)
Obviously this does not work, because predict_proba returns a 2D NumPy array with one column per class. So what is the most efficient way to do this? I created test_matrix using scikit-learn's CountVectorizer:
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))
Sample data is as follows:
| Review | Prediction |
| ------------------------------------------ | ------------------ |
| "Toy was great! Our six-year old loved it!"| 0.986 |
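
For reference, here is a minimal sketch of one way this might be done: select the column of the predict_proba output that corresponds to the positive class and assign that 1D slice to the data frame. The toy reviews, the `sentiment` label column, and the use of 1 / -1 as class labels are assumptions made only for illustration; they are not taken from my actual data.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data, invented for illustration (assumption: labels are 1 = positive, -1 = negative)
train_data = pd.DataFrame({
    "review_clean": ["great toy loved it", "terrible broke quickly",
                     "loved it great", "terrible waste"],
    "sentiment": [1, -1, 1, -1],
})
test_data = pd.DataFrame({"review_clean": ["great toy", "terrible toy"]})

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))

sentiment_model = LogisticRegression()
sentiment_model.fit(train_matrix, train_data['sentiment'])

# predict_proba returns an array of shape (n_samples, n_classes);
# the column order follows sentiment_model.classes_
proba = sentiment_model.predict_proba(test_matrix)

# Look up which column belongs to the positive class instead of hard-coding it
positive_col = list(sentiment_model.classes_).index(1)

# A 1D slice has one value per row, so it can be assigned as a single column
test_data['prediction'] = proba[:, positive_col]
```

Looking up the column through `classes_` avoids relying on the positive class always being the last column. If the probabilities for every class are needed, each column of `proba` could be assigned to its own data frame column in the same way.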