Why does decreasing K in K-nearest neighbors increase complexity?

An excerpt from my textbook says that decreasing the value of K when running this algorithm actually increases complexity, since a larger K performs more "smoothing".

Can someone explain this to me?

I understand that in 1NN you feed the algorithm your training set, then evaluate it on your test set. Suppose you have one point in your test set. The algorithm finds the point closest to it in the training set and returns that point's value.

Surely this is less complex than finding the 3 nearest points in 3NN, adding up their values and dividing by three?
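To make the comparison concrete, here is a minimal sketch of the two computations I mean, on an invented toy training set of (x, y) pairs:

```python
# Toy 1-D training set (hypothetical, for illustration only).
train = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0), (3.0, 8.0)]

def knn_predict(x, k):
    # Take the k training points closest to the query x...
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    # ...and average their values (for k=1 this just returns one value).
    return sum(y for _, y in nearest) / k

print(knn_predict(1.2, 1))  # 1NN: value of the single closest point -> 2.0
print(knn_predict(1.2, 3))  # 3NN: mean of the three closest values
```

In the computational sense the question describes, 3NN clearly does more work per query; the textbook's "complexity", as the answers below explain, refers to model complexity instead.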

What did I misunderstand or forget?

2 answers

I had the same moment of disbelief when I read this claim; a higher parameter value reducing complexity seems counterintuitive at first.

To build intuition, compare a 1-nearest-neighbor model with an N >> 1-nearest-neighbors model. Let us use a simplified 2-D plot (a two-feature data set) with binary classification (each "point" has a label of either class A or class B).

In a 1-nearest-neighbor model, each training example is potentially the center of a region predicting class A or B, even when most of its neighbors are centers of regions predicting the other class. Your plot may look like one of those maps of ethnicity, language, or religion in regions of the world where they are deeply intertwined (the Balkans or the Middle East come to mind): small spots of complex shape and alternating colors, with no discernible logic, and thus "high complexity".

1-nearest neighbor

If you increase k, the regions predicting each class will be more "smoothed out", since it is the majority vote of the k nearest neighbors that decides the class of any point. Thus the regions will be fewer, larger, and probably simpler in shape, like the political map of country borders in those same regions of the world. Hence "less complexity".

k-nearest neighbors
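To see that smoothing numerically, here is a toy sketch (the data set and query point are invented for illustration): with k=1, a lone class-B point sitting inside a cluster of A's carves out its own tiny B region, while with k=3 it is outvoted by its A neighbors.

```python
from collections import Counter

# Hypothetical 2-D toy set: a cluster of class "A" with one "B" outlier
# inside it, plus a genuine "B" cluster far away.
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((1, 1), "B"), ((5, 5), "B"), ((6, 5), "B")]

def knn_classify(q, k):
    # Squared Euclidean distance is enough for ranking neighbors.
    dist = lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = sorted(train, key=lambda t: dist(t[0]))[:k]
    # Majority vote among the k nearest labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

query = (1.1, 1.1)             # right next to the lone "B" outlier
print(knn_classify(query, 1))  # -> "B": the outlier defines its own region
print(knn_classify(query, 3))  # -> "A": the outlier is outvoted; the boundary smooths
```

This is exactly the "small spots vs. large regions" picture above: raising k erases the one-point islands.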

(Intuition and source from this course.)


Q: is k-NN faster than NN ?

A: no

See below for more details.

In general, NN search is simpler and therefore requires less effort than k-NN search, when, of course, k is not equal to 1.

Take a look at my answer here, where I explain the concept of NNS (Nearest Neighbor Search) in general.

In the case of k-NN, a general algorithm can, for example, find the top NN, then the second NN, and so on, until k NNs are found.

Another likely approach would be to maintain a priority_queue that holds the k current NNs, ordered by their distance from the query point.
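That priority-queue idea can be sketched with Python's heapq (a brute-force illustration of the bookkeeping, not the actual tree-based search; the point set and query are invented):

```python
import heapq

def knn_with_heap(points, query, k):
    """Keep the k nearest points seen so far in a size-k heap.
    heapq is a min-heap, so we store negated distances: the root is
    then the *farthest* of the current k candidates, and it is the
    one evicted when a closer point arrives."""
    heap = []  # entries are (-squared_distance, point)
    for p in points:
        d = sum((a - b) ** 2 for a, b in zip(p, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif -d > heap[0][0]:            # p is closer than the current farthest
            heapq.heapreplace(heap, (-d, p))
    # Return the neighbors ordered from nearest to farthest.
    return [p for _, p in sorted(heap, reverse=True)]

pts = [(0, 0), (1, 1), (2, 2), (5, 5)]
print(knn_with_heap(pts, (0.9, 0.9), 2))  # -> [(1, 1), (0, 0)]
```

Each candidate costs at most O(log k) heap work, which is the extra bookkeeping that plain 1-NN search avoids entirely.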

For the general algorithm to find more than one neighbor, it must visit more nodes/leaves, which means a longer traversal, which increases the time complexity.

It is clear that accuracy can increase with increasing k, but the computational cost increases as well, as noted in this blog.

I suspect you have a specific algorithm in mind in your question, but without knowing which one, I do not think a better answer can be given.


Source: https://habr.com/ru/post/969600/

