Setting feature weights for KNN

I am implementing KNN with sklearn. My input data has about 20 features, and I believe that some of these features are more important than others. Is there any way to:

  • set a weight for each feature when the classifier is "trained" by KNN;
  • find out what the optimal weight values are, with or without pre-processing the data.

On a related note: I understand that KNN usually does not require training, but since sklearn implements it using KD-trees, the tree must be built from the training data. However, this sounds like it turns KNN into a binary-tree problem. Is that the case?

Thanks.

+6
2 answers

kNN is simply based on a distance function. When you say "feature two is more important than the others", it usually means that a difference in feature 2 should count for, say, 10 times a difference in the other features. A simple way to achieve this is to multiply coordinate #2 by its weight. So you put into the tree not the original coordinates, but the coordinates multiplied by their respective weights.
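A minimal sketch of this idea, using synthetic data and a made-up weight vector (the 10x weight on feature 2 and all dataset parameters are illustrative assumptions, not from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's 20-feature data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hypothetical per-feature weights: make feature 2 count 10x as much.
weights = np.ones(X.shape[1])
weights[2] = 10.0

# Build the tree on the weighted coordinates, not the originals.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X * weights, y)

# The same weights must be applied to every query point.
pred = knn.predict(X[:5] * weights)
print(pred)
```

Scaling each column before fitting is exactly equivalent to using a weighted Euclidean distance, but lets the KD-tree work unchanged.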

In case your features are combinations of the coordinates, you may need to apply an appropriate matrix transformation to the coordinates before applying the weights; see PCA (principal component analysis). PCA will probably also help you with question 2.
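As a sketch of the PCA suggestion (the data and the choice of 10 components are assumptions for illustration), the transformation can be chained before kNN with a pipeline so train and query points are projected consistently:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the asker's 20-feature data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# PCA rotates the data onto uncorrelated axes; kNN then measures
# distances in that transformed space. Per-axis weights could be
# applied to the PCA output in the same way as above.
model = make_pipeline(PCA(n_components=10),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
score = model.score(X, y)
print(score)
```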

+5

What you are asking about is called "metric learning", and it is not currently implemented in scikit-learn. Using the popular Mahalanobis distance amounts to rescaling the data with StandardScaler. Ideally, you would want your metric to take the labels into account.
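A sketch of that rescaling (synthetic data assumed): StandardScaler gives every feature unit variance, which corresponds to a Mahalanobis distance with a diagonal covariance matrix:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asker's 20-feature data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Standardizing each feature to zero mean and unit variance makes
# Euclidean distance behave like a diagonal Mahalanobis distance.
model = make_pipeline(StandardScaler(),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
score = model.score(X, y)
print(score)
```

Note that this rescaling is unsupervised; a learned metric that uses the labels would need a dedicated metric-learning method on top.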

0

Source: https://habr.com/ru/post/957782/
