How to use the Mann-Whitney U test in training

I have a dataset (X, Y), where X is the feature matrix and Y is the vector of class labels. Here is an example:

    X = 0 0 1 0 1    and Y = 1
        0 1 0 0 0            1
        1 1 1 0 1            0

I want to use the Mann-Whitney U test to calculate the importance of each feature (feature selection):

    import numpy as np
    from scipy.stats import mannwhitneyu

    results = np.zeros((X.shape[1], 2))
    for i in range(X.shape[1]):
        u, prob = mannwhitneyu(X[:, i], Y)
        results[i, :] = u, prob

I'm not sure whether this is correct. For a large table I got large values, for example u = 990 for some columns.

1 answer

I don't think the Mann-Whitney U test is a good way to do feature selection. Mann-Whitney tests whether the distributions of two variables are the same; it says nothing about how the variables are correlated. For instance:

    >>> import numpy as np
    >>> from scipy.stats import mannwhitneyu
    >>> a = np.arange(100)
    >>> b = np.arange(100)
    >>> np.random.shuffle(b)
    >>> np.corrcoef(a, b)
    array([[ 1.        , -0.07155116],
           [-0.07155116,  1.        ]])
    >>> mannwhitneyu(a, b)  # result for almost uncorrelated samples
    (5000.0, 0.49951259627554112)
    >>> mannwhitneyu(a, a)  # result for perfectly correlated samples
    (5000.0, 0.49951259627554112)

Since a and b have the same distribution, we cannot reject the null hypothesis that the distributions are identical, whether or not the two samples are correlated.
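For readers on a recent SciPy, here is a minimal sketch of the same experiment. It assumes a SciPy version where `mannwhitneyu` returns a result object with `statistic` and `pvalue` attributes. The seed, the explicit `alternative="two-sided"` argument and the variable names are my own choices for illustration, so the exact p-value will differ from the older one-sided output shown above.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Seed chosen arbitrarily so the shuffle is reproducible (an assumption,
# not part of the original session).
rng = np.random.default_rng(0)

a = np.arange(100)
b = rng.permutation(a)  # same values as a, in random order

# Correlation between a and its shuffled copy is expected to be near zero.
corr = np.corrcoef(a, b)[0, 1]

# The test only sees the pooled ranks, so the pairing between a and b
# (and therefore their correlation) does not affect the result.
res_shuffled = mannwhitneyu(a, b, alternative="two-sided")
res_same = mannwhitneyu(a, a, alternative="two-sided")

print("correlation(a, b):", corr)
print("a vs shuffled a:", res_shuffled.statistic, res_shuffled.pvalue)
print("a vs a:         ", res_same.statistic, res_same.pvalue)
```

Both calls should report the same U and the same p-value, because the two samples contain exactly the same values in each case. As a side note on scale, U always lies between 0 and n1 * n2 (here 100 * 100 = 10000, with 5000 as the midpoint), so a large U on a large table mostly reflects the sample sizes.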

And since feature selection is about finding the features that best explain Y, the Mann-Whitney U test will not help you here.


Source: https://habr.com/ru/post/1483981/

