Feature Selection Using MRMR

I found two ways to implement mRMR feature selection in Python. The paper describing this method is available here:

https://www.dropbox.com/s/tr7wjpc2ik5xpxs/doc.pdf?dl=0

This is my dataset code.

    import numpy as np
    import pandas as pd
    from sklearn.datasets import make_classification
    from IPython.core.interactiveshell import InteractiveShell

    InteractiveShell.ast_node_interactivity = "all"

    X, y = make_classification(n_samples=10000, n_features=6, n_informative=3,
                               n_classes=2, random_state=0, shuffle=False)

    # Creating a DataFrame
    df = pd.DataFrame({'Feature 1': X[:, 0], 'Feature 2': X[:, 1],
                       'Feature 3': X[:, 2], 'Feature 4': X[:, 3],
                       'Feature 5': X[:, 4], 'Feature 6': X[:, 5],
                       'Class': y})

    y_train = df['Class']
    X_train = df.drop('Class', axis=1)

Method 1: mRMR using pymrmr

It supports both the MID and MIQ criteria and is published by its author at https://github.com/fbrundu/pymrmr

    import pymrmr

    pymrmr.mRMR(df, 'MIQ', 6)

['Feature 4', 'Feature 5', 'Feature 2', 'Feature 6', 'Feature 1', 'Feature 3']

or, using the second criterion:

    pymrmr.mRMR(df, 'MID', 6)

['Feature 4', 'Feature 6', 'Feature 5', 'Feature 2', 'Feature 1', 'Feature 3']

On the dataset above, these two criteria give the two outputs shown. Another GitHub author provides an implementation that is also claimed to apply the mRMR method. However, when I run it on the same dataset, I get a different result.

Method 2: mRMR using mifs

GitHub link:

https://github.com/danielhomola/mifs

    import mifs

    for i in range(1, 11):
        feat_selector = mifs.MutualInformationFeatureSelector('MRMR', k=i)
        feat_selector.fit(X_train, y_train)
        # call transform() on X to filter it down to selected features
        X_filtered = feat_selector.transform(X_train.values)
        # Create list of features
        feature_name = X_train.columns[feat_selector.ranking_]
        print(feature_name)

And if you run the iteration above for all the different values of i, there is no value for which the two methods actually give the same feature selection output.
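One way to make the comparison concrete is to compare the top-k *sets* rather than the full orderings, since greedy mRMR variants often agree on which features matter while disagreeing on rank order. A small sketch, using the two pymrmr outputs quoted above (the helper `topk_overlap` is mine, not part of any library):

```python
def topk_overlap(ranking_a, ranking_b, k):
    """Fraction of the top-k features shared by both rankings."""
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k

# The two rankings produced by pymrmr on the dataset above
miq = ['Feature 4', 'Feature 5', 'Feature 2', 'Feature 6', 'Feature 1', 'Feature 3']
mid = ['Feature 4', 'Feature 6', 'Feature 5', 'Feature 2', 'Feature 1', 'Feature 3']

for k in range(1, 7):
    print(k, topk_overlap(miq, mid, k))
```

For these two rankings the top-1 and top-4 (and larger) sets coincide even though the orderings differ, which is a useful sanity check before concluding that two implementations disagree fundamentally.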

What could be the problem here?

1 answer

You may need to contact the authors of the original paper and/or the owner of the GitHub repository for a definitive answer, but most likely the differences arise because you are comparing three different algorithms (despite the shared name).

Minimum Redundancy Maximum Relevance is actually a family of feature selection algorithms whose common goal is to select features that are mutually far apart from each other (low redundancy) while still having "high" correlation with the classification variable (high relevance).

You can measure this goal using mutual information, but the specific procedure (what do you do with the computed estimates? In what order? What other post-processing steps are applied? ...) differs from one author to another. Even the paper itself actually gives you two different implementations: MIQ and MID.
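To make concrete how MID and MIQ can diverge, here is a simplified greedy mRMR sketch built on scikit-learn's mutual-information estimators. This is an illustration of the two criteria only: the real pymrmr and mifs packages discretize the data and estimate MI differently, so their rankings will not match this sketch exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def greedy_mrmr(X, y, n_selected, scheme="MID"):
    """Toy greedy mRMR. scheme='MID' uses relevance - redundancy,
    scheme='MIQ' uses relevance / redundancy."""
    n_features = X.shape[1]
    # Relevance: MI between each feature and the class label
    relevance = mutual_info_classif(X, y, random_state=0)
    # Redundancy: pairwise MI between features, precomputed
    redundancy = np.zeros((n_features, n_features))
    for j in range(n_features):
        redundancy[:, j] = mutual_info_regression(X, X[:, j], random_state=0)

    selected, remaining = [], list(range(n_features))
    while len(selected) < n_selected:
        best, best_score = None, -np.inf
        for f in remaining:
            # Mean MI between candidate f and already-selected features
            red = redundancy[f, selected].mean() if selected else 0.0
            if scheme == "MID":                    # difference criterion
                score = relevance[f] - red
            else:                                  # "MIQ": quotient criterion
                score = relevance[f] / (red + 1e-12)
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
print("MID order:", greedy_mrmr(X, y, 6, "MID"))
print("MIQ order:", greedy_mrmr(X, y, 6, "MIQ"))
```

Both criteria pick the same first feature (the most relevant one, since redundancy is zero at that point), but from the second step onward the subtraction and the division can trade off relevance against redundancy differently, which is exactly why two implementations of "mRMR" can return different orderings.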

So my suggestion would be simply to pick the implementation that is most convenient for you (or, even better, the one that gives the best results in your pipeline after proper validation) and report which specific source you chose and why.


Source: https://habr.com/ru/post/1275842/
