Random Forest Feature Importance Chart Using Python

I am working with RandomForestRegressor in Python and I want to create a chart that illustrates the ranking of feature importances. This is the code I used:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

MT = pd.read_csv("MT_reduced.csv")
df = MT.reset_index(drop=False)
columns2 = df.columns.tolist()

# Filter the columns to remove the ones we don't want.
columns2 = [c for c in columns2 if c not in ["Violent_crime_rate", "Change_Property_crime_rate", "State", "Year"]]

# Store the variable we'll be predicting on.
target = "Property_crime_rate"

# Randomly split the data, 80% train and 20% test.
# Generate the training set. Set random_state to be able to replicate results.
train2 = df.sample(frac=0.8, random_state=1)
# Exclude all observations whose index appears in the training set.
test2 = df.loc[~df.index.isin(train2.index)]
print(train2.shape)  # both sets must have the same features; only the number of observations differs
print(test2.shape)

# Initialize the model with some parameters.
# n_estimators = number of trees in the forest
# min_samples_leaf = minimum number of samples at each leaf
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=8, random_state=1)

# Fit the model to the data.
model.fit(train2[columns2], train2[target])

# Make predictions.
predictions_rf = model.predict(test2[columns2])

# Compute the error.
mean_squared_error(predictions_rf, test2[target])  # 650.4928

Feature importance chart:

import numpy as np
import matplotlib.pyplot as plt

features = df.columns[[3, 4, 6, 8, 9, 10]]
importances = model.feature_importances_
indices = np.argsort(importances)

plt.figure(1)
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), features[indices])
plt.xlabel('Relative Importance')

The feature importance code was adapted from an example found at http://www.agcross.com/2015/02/random-forests-in-python-with-scikit-learn/

When I run this code on my data, I get the following error:

  IndexError: index 6 is out of bounds for axis 1 with size 6 

In addition, the chart shows only one feature with 100% importance, and there are no labels.

Any help in solving this problem so that I can create the chart would be greatly appreciated.

2 Answers

Here is an example using the iris dataset.

>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import RandomForestClassifier
>>> iris = load_iris()
>>> rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)
>>> rnd_clf.fit(iris["data"], iris["target"])
>>> for name, importance in zip(iris["feature_names"], rnd_clf.feature_importances_):
...     print(name, "=", importance)
sepal length (cm) = 0.112492250999
sepal width (cm) = 0.0231192882825
petal length (cm) = 0.441030464364
petal width (cm) = 0.423357996355
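If you want an explicit ranking rather than the raw printout, one option (a minimal sketch, not part of the original answer) is to put the importances into a pandas Series and sort it:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])

# Pair each importance with its feature name and sort in descending order,
# so the most influential feature comes first.
ranking = pd.Series(rnd_clf.feature_importances_, index=iris["feature_names"])
print(ranking.sort_values(ascending=False))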

Plot the feature importances:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> features = np.array(iris['feature_names'])  # convert the list so it can be indexed by the argsort result
>>> importances = rnd_clf.feature_importances_
>>> indices = np.argsort(importances)
>>> plt.title('Feature Importances')
>>> plt.barh(range(len(indices)), importances[indices], color='b', align='center')
>>> plt.yticks(range(len(indices)), features[indices])
>>> plt.xlabel('Relative Importance')
>>> plt.show()
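Since np.argsort returns the indices in ascending order of importance, barh draws the least important feature at the bottom of the chart and the most important one at the top.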

[Resulting chart: horizontal bar plot of the iris feature importances]


In the above code from spies006, "feature_names" does not work for me. A common solution would be to use name_of_the_dataframe.columns.
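Applied to the question's setup, that would look roughly like the sketch below. This is only a guess based on the code in the question: columns2 and model are the names used there, and columns2 is assumed to be exactly the list of columns passed to model.fit(), so it lines up one-to-one with model.feature_importances_.

import numpy as np
import matplotlib.pyplot as plt

# Use the same columns that were passed to model.fit(), converted to an
# array so it can be indexed with the argsort result.
features = np.array(columns2)
importances = model.feature_importances_
indices = np.argsort(importances)

plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), features[indices])
plt.xlabel('Relative Importance')
plt.show()

Using the columns the model was actually trained on avoids the IndexError from hard-coded column positions, and it also gives every bar its label.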

