How to change feature weights when training a model with sklearn?

I want to classify text with sklearn. At first I used a bag of words as the training data; the bag-of-words feature space is really large, more than 10,000 features, so I reduced it to 100 dimensions using SVD.

But now I want to add some other features, such as the number of words, the number of positive words, the number of pronouns, etc. There are only about 10 of these additional features, which is small compared with the 100 bag-of-words features.
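The combination described above can be sketched like this (a toy corpus, 2 SVD components instead of 100, and made-up values for the hand-crafted features):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["good great words here", "bad terrible words", "good good pronouns he she"]
bow = CountVectorizer().fit_transform(docs)          # sparse bag-of-words matrix
svd = TruncatedSVD(n_components=2, random_state=0)   # 100 in the question; 2 for this toy corpus
reduced = svd.fit_transform(bow)                     # dense (n_docs, n_components)

# Hypothetical hand-crafted features: word count, positive-word count, pronoun count
extra = np.array([[4.0, 2.0, 0.0],
                  [3.0, 0.0, 0.0],
                  [5.0, 2.0, 2.0]])

X = np.hstack([reduced, extra])                      # combined feature matrix
print(X.shape)                                       # (n_docs, n_components + n_extra)
```

`TruncatedSVD` is used instead of dense SVD because it works directly on the sparse matrix produced by `CountVectorizer`.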

Given this situation, I have 2 questions:

  • Is there any function in sklearn that can increase the weight of the additional features to make them more important?
  • How can I check how useful an additional feature is to the classifier?
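For the first question, one common workaround (not an sklearn built-in, and not part of the answer below) is to manually scale the extra-feature columns before training. Note this only influences scale-sensitive models such as regularized linear models, SVMs, or k-NN; tree ensembles are invariant to column scaling. The weight value here is an assumption you would tune, e.g. via cross-validation:

```python
import numpy as np

# Toy combined matrix: first 3 columns = SVD components, last 2 = extra features
X = np.array([[0.5, 0.1, 0.2, 3.0, 1.0],
              [0.4, 0.3, 0.1, 5.0, 0.0]])

extra_weight = 2.0                   # assumed weight, tune via cross-validation
X_weighted = X.copy()
X_weighted[:, 3:] *= extra_weight    # boost only the extra-feature columns
```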
1 answer

Although the question is interesting, I do not know the answer to the first part. However, I can help with the second.

After fitting the model, you can access the feature importances through the attribute model.feature_importances_ (available on tree-based estimators such as RandomForestClassifier and GradientBoostingClassifier).

I use the following function to normalize the importances and plot them in a nicer way.

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # optional, only changes plot styling

def showFeatureImportance(model):
    # FEATURE IMPORTANCE
    # Get feature importances from the fitted classifier
    feature_importance = model.feature_importances_

    # Normalize the importances relative to the largest one
    feature_importance = 100.0 * (feature_importance / feature_importance.max())
    sorted_idx = np.argsort(feature_importance)
    pos = np.arange(sorted_idx.shape[0]) + .5

    # Plot relative feature importance
    plt.figure(figsize=(12, 12))
    plt.barh(pos, feature_importance[sorted_idx], align='center', color='#7A68A6')
    # X_cols must be a list/array of feature names, in the same order as the columns of X
    plt.yticks(pos, np.asanyarray(X_cols)[sorted_idx])
    plt.xlabel('Relative Importance')
    plt.title('Feature Importance')
    plt.show()
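To apply this check to the question's setup without any plotting, you can rank the columns by feature_importances_ directly. This is a self-contained sketch: the data is synthetic, the column names are invented, and the "extra" feature is constructed to drive the label so that it should rank first:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
n = 200
svd_cols = rng.randn(n, 4)            # stand-ins for 4 uninformative SVD components
extra = rng.randn(n, 1)               # one hand-crafted extra feature
y = (extra[:, 0] > 0).astype(int)     # label depends only on the extra feature
X = np.hstack([svd_cols, extra])

names = ["svd_0", "svd_1", "svd_2", "svd_3", "n_positive_words"]  # hypothetical names
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Sort features from most to least important
ranking = sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])
print(ranking[0][0])  # the informative extra feature should rank first
```

If an additional feature ends up near the bottom of this ranking, the classifier is barely using it.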

Source: https://habr.com/ru/post/1617833/
