I don't think the Mann-Whitney U test is a good way to do feature selection. Mann-Whitney checks whether two samples come from the same distribution; it says nothing about how the variables are correlated. For instance:
    >>> import numpy as np
    >>> from scipy.stats import mannwhitneyu
    >>> a = np.arange(100)
    >>> b = np.arange(100)
    >>> np.random.shuffle(b)
    >>> np.corrcoef(a,b)
    array([[ 1.        , -0.07155116],
           [-0.07155116,  1.        ]])
    >>> mannwhitneyu(a, b)
    (5000.0, 0.49951259627554112)
Since a and b have the same distribution, we cannot reject the null hypothesis that the distributions are identical, even though the two variables are essentially uncorrelated.
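To make this concrete, here is a minimal sketch (assuming a SciPy version whose `mannwhitneyu` accepts the `alternative` argument; the names `b_shuffled` and `b_identical` are just illustrative) that compares a against a shuffled copy and against an exact copy of itself. The test only looks at the pooled ranks, not at which value of a is paired with which value of b, so it should give the same answer in both cases even though the correlations are completely different:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible

a = np.arange(100)
b_shuffled = rng.permutation(a)  # same values as a, pairing destroyed
b_identical = a.copy()           # same values as a, perfectly paired

# Mann-Whitney only sees the two sets of values, not the pairing.
u_shuffled, p_shuffled = mannwhitneyu(a, b_shuffled, alternative="two-sided")
u_identical, p_identical = mannwhitneyu(a, b_identical, alternative="two-sided")

# Pearson correlation, by contrast, depends entirely on the pairing.
corr_shuffled = np.corrcoef(a, b_shuffled)[0, 1]
corr_identical = np.corrcoef(a, b_identical)[0, 1]

print("shuffled : U =", u_shuffled, " p =", p_shuffled, " r =", corr_shuffled)
print("identical: U =", u_identical, " p =", p_identical, " r =", corr_identical)
```

Both calls report U = 5000 with the same p-value, while the correlation is close to 0 in one case and 1 in the other. So the test result tells you nothing about whether one variable is related to the other.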
And since feature selection is about finding the features that best explain Y, Mann-Whitney U will not help you with this.