Setting feature weights for KNN

I am implementing KNN with sklearn. My input data has about 20 features, and I believe that some of these features are more important than others. Is there any way to:

  • set a weight for each feature when the classifier is "trained" by KNN;
  • find out what the optimal weight values are, with or without pre-processing the data.

On a related note: I understand that KNN usually does not require training, but since sklearn implements it using KD-trees, the tree must be built from the training data. However, this sounds like it turns KNN into a binary-tree problem. Is that the case?

Thanks.

+6
2 answers

kNN is simply based on a distance function. When you say "feature two is more important than the others", it usually means that a difference in feature 2 should count for, say, 10 times a difference in the other features. A simple way to achieve this is to multiply coordinate #2 by its weight. So you put into the tree not the original coordinates, but the coordinates multiplied by their respective weights.
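A minimal sketch of this idea, using synthetic data and a made-up weight vector (the 10x weight on feature 2 and all dataset parameters are illustrative assumptions, not from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's 20-feature data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hypothetical per-feature weights: make feature 2 count 10x as much.
weights = np.ones(X.shape[1])
weights[2] = 10.0

# Build the tree on the weighted coordinates, not the originals.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X * weights, y)

# The same weights must be applied to every query point.
pred = knn.predict(X[:5] * weights)
print(pred)
```

Scaling each column before fitting is exactly equivalent to using a weighted Euclidean distance, but lets the KD-tree work unchanged.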

In case your features are combinations of the coordinates, you may need to apply an appropriate matrix transformation to the coordinates before applying the weights; see PCA (principal component analysis). PCA will probably also help you with question 2.
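As a sketch of the PCA suggestion (the data and the choice of 10 components are assumptions for illustration), the transformation can be chained before kNN with a pipeline so train and query points are projected consistently:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the asker's 20-feature data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# PCA rotates the data onto uncorrelated axes; kNN then measures
# distances in that transformed space. Per-axis weights could be
# applied to the PCA output in the same way as above.
model = make_pipeline(PCA(n_components=10),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
score = model.score(X, y)
print(score)
```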

+5

What you are asking about is called "metric learning", and it is not currently implemented in scikit-learn. Using the popular Mahalanobis distance amounts to rescaling the data with StandardScaler. Ideally, you would want your metric to take the labels into account.
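A sketch of that rescaling (synthetic data assumed): StandardScaler gives every feature unit variance, which corresponds to a Mahalanobis distance with a diagonal covariance matrix:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asker's 20-feature data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Standardizing each feature to zero mean and unit variance makes
# Euclidean distance behave like a diagonal Mahalanobis distance.
model = make_pipeline(StandardScaler(),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
score = model.score(X, y)
print(score)
```

Note that this rescaling is unsupervised; a learned metric that uses the labels would need a dedicated metric-learning method on top.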

0

Source: https://habr.com/ru/post/957782/
