Intuition about the kernel trick in machine learning

I have successfully implemented the kernel perceptron classifier, which uses the RBF kernel. I understand that the kernel trick maps points to a higher-dimensional space, so that you can construct a linear hyperplane to separate the points. For example, if you have a 2-dimensional point (x1, x2), you can map it to a 3-dimensional space with: phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2).

If you plug this into the perceptron decision function w'x + b = 0, you get w1*x1^2 + w2*sqrt(2)*x1*x2 + w3*x2^2 + b = 0, which gives you a circular decision boundary in the original space.
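The mapping described above can be checked numerically. The sketch below (an illustration, not part of the question) uses the standard degree-2 feature map with the sqrt(2) cross term, and shows that the dot product of the explicit 3-dimensional features equals the kernel value (x . z)^2 computed directly on the original 2-dimensional points:

```python
import math

def phi(x1, x2):
    # Explicit degree-2 feature map with the usual sqrt(2) cross term,
    # chosen so the dot products match the kernel exactly.
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    # Same inner product computed without forming the features:
    # K(x, z) = (x . z)^2
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(*x), phi(*z)))
implicit = poly2_kernel(x, z)
print(explicit, implicit)  # both 121.0
```

Both numbers agree, even though the second one never touches the 3-dimensional representation.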

While the kernel trick itself is quite intuitive, I cannot understand the linear algebra aspect of it. Can someone help me understand how we can use all these additional features without explicitly computing them, using only the inner product?

Thanks!

+4
3 answers

Simple

Give me the numerical result of (x + y)^10 for some values of x and y.

What would you rather do: use the "trick" and sum x + y, then raise that value to the 10th power, or expand out the exact expression

x^10+10 x^9 y+45 x^8 y^2+120 x^7 y^3+210 x^6 y^4+252 x^5 y^5+210 x^4 y^6+120 x^3 y^7+45 x^2 y^8+10 x y^9+y^10

and then evaluate each term and add them all together? Clearly we can evaluate the dot product between degree-10 polynomials without forming them explicitly.
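The analogy above is easy to verify: both routes give the same number, but the "trick" does one addition and one exponentiation while the expansion evaluates all 11 binomial terms. A minimal check (the values of x and y are arbitrary):

```python
from math import comb

x, y = 1.3, 0.7

# The "trick": sum first, then take the 10th power -- one cheap step.
direct = (x + y) ** 10

# The explicit expansion: evaluate all 11 binomial terms and add them up.
expanded = sum(comb(10, k) * x ** (10 - k) * y ** k for k in range(11))

print(direct, expanded)  # same value, up to floating-point rounding
```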


+5

The point of the kernel trick is that you never perform the mapping explicitly. The algorithm does not need the high-dimensional representation phi(x) of your points; it only ever needs inner products between mapped points.

A kernel is a function K such that K(x, y) = <phi(x), phi(y)>, where <., .> denotes the inner product. So instead of mapping the data with phi() and then taking dot products in the high-dimensional space, you evaluate K directly on the original low-dimensional inputs and get the same number.

Therefore, any algorithm that can be written purely in terms of dot products between data points can be "kernelized": replace every dot product with a kernel evaluation, and you implicitly work in the high-dimensional space without ever representing it.
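The RBF kernel mentioned in the question is a good illustration of this: its feature map phi is infinite-dimensional, so it could never be computed explicitly, yet evaluating K itself only needs the squared distance between the two original points. A small sketch (the gamma value is an arbitrary illustrative choice):

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    # The RBF kernel corresponds to an infinite-dimensional feature map,
    # yet computing it only requires the squared Euclidean distance
    # between the two original low-dimensional points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))  # 1.0: identical points
print(rbf_kernel((0.0, 0.0), (3.0, 4.0)))  # exp(-12.5): distant points, near 0
```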

+2


The weight vector in the higher-dimensional space is w = sum_i{a_i * Phi(x_i)}

and the input vector mapped into that higher space is Phi(x),

so the linear classifier in the higher space is

w^t * Phi(x) + c > 0

so if you put them together

sum_i{a_i * Phi(x_i)^t} * Phi(x) + c = sum_i{a_i * Phi(x_i)^t * Phi(x)} + c > 0

The computational complexity of the last dot product is linear in the number of dimensions of the feature space, which is often huge or even infinite, so computing it explicitly is intractable (and, it turns out, not needed).

We solve this by using a kernel, which directly gives the answer to that dot product:

K(x_i, x) = Phi(x_i)^t * Phi(x)

which gives

sum_i{a_i * K(x_i, x)} + c > 0
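The derivation above can be turned into a short kernel perceptron. This is a minimal sketch, not the questioner's implementation: it uses a degree-2 polynomial kernel whose constant term folds in the bias c, keeps the labels y_i separate from the coefficients a_i, and trains on a tiny made-up "circular" dataset that is not linearly separable in 2D but is separable in the implicit feature space:

```python
def poly2_kernel(x, z):
    # Degree-2 polynomial kernel: the dot product in the implicit
    # feature space, computed without ever forming Phi explicitly.
    # The "+ 1.0" adds a constant feature that plays the role of c.
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2

def train_kernel_perceptron(X, y, kernel, epochs=100):
    # a[i] plays the role of a_i above: the decision function is
    # sign(sum_i a[i] * y[i] * K(x_i, x)).
    a = [0.0] * len(X)
    for _ in range(epochs):
        for j, (xj, yj) in enumerate(zip(X, y)):
            score = sum(a[i] * y[i] * kernel(xi, xj) for i, xi in enumerate(X))
            if yj * score <= 0:      # misclassified: bump its coefficient
                a[j] += 1.0
    return a

def predict(X, y, a, kernel, x):
    score = sum(a[i] * y[i] * kernel(xi, x) for i, xi in enumerate(X))
    return 1 if score > 0 else -1

# Toy data: points near the origin labeled +1, points far away -1.
X = [(0.1, 0.2), (-0.2, 0.1), (0.0, -0.3),
     (2.0, 0.0), (0.0, 2.0), (-2.0, -1.5)]
y = [1, 1, 1, -1, -1, -1]
a = train_kernel_perceptron(X, y, poly2_kernel)
print([predict(X, y, a, poly2_kernel, x) for x in X])
```

Note that both training and prediction touch the data only through kernel evaluations, exactly as the final inequality requires.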

+2

Source: https://habr.com/ru/post/1524796/

