Single Perceptron - non-linear activation function

In the case of a single perceptron, the literature states that it cannot be used to separate classes that are not linearly separable, such as the XOR function. This makes sense, because the VC dimension of a line (in 2-D) is 3, and therefore a single 2-D line cannot discriminate outputs such as XOR's.

However, my question is: why must the activation function in a single perceptron be a linear step function? Clearly, if we had a non-linear activation function like a sigmoid, this perceptron could distinguish between the 1s and 0s of XOR. So, am I missing something?

+6
3 answers

if we had a non-linear activation function like a sigmoid, this perceptron could distinguish between the 1s and 0s of XOR

This is not entirely true. The criterion for discrimination is not the shape of the line (or hyperplane, in higher dimensions), but whether the function allows linear separability.

There is no single function that produces a hyperplane capable of separating the points of the XOR function. A curve can separate the points, as in the figure below, but such a curve is not a function.

(Figure: the blue and red points cannot be separated with a single hyperplane by any single function.)

To separate the points of XOR, you will need at least two lines (or functions of any other shape). This requires two separate perceptrons. You can then use a third perceptron to separate the intermediate results on the basis of their sign.

(Figure: using two lines, you can create two discriminants and then merge the results.)
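To make this concrete, here is a minimal sketch in plain Python (my own illustration; the weights are hand-picked for clarity, not learned) of two step-activation perceptrons whose outputs are combined by a third one to compute XOR:

    def step(z):
        # Heaviside step activation: 1 if z > 0, else 0
        return 1 if z > 0 else 0

    def perceptron(x, w, b):
        # a single linear-threshold unit: step(w . x + b)
        return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

    def xor(x1, x2):
        h1 = perceptron((x1, x2), (1, 1), -0.5)      # line 1: acts as OR
        h2 = perceptron((x1, x2), (1, 1), -1.5)      # line 2: acts as AND
        return perceptron((h1, h2), (1, -1), -0.5)   # combines: OR and not AND

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print((x1, x2), xor(x1, x2))   # prints 0, 1, 1, 0

Each hidden unit draws one of the two lines, and the output unit separates the intermediate results, exactly as described above.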

+11

I assume that by "sigmoid" you don't really mean a sigmoid, but rather something with a local maximum. Whereas the usual binary perceptron classifier has the form:

    f(x) = (1 if w.x + b > 0 else 0)

you may have a function:

    f(x) = (1 if |w.x + b| < 0.5 else 0)

This would certainly work, but it would be quite artificial, since you are effectively tailoring your model to your dataset, which is bad.
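As a quick sanity check of that claim, here is a small sketch (my own illustration) of such a "band" classifier with hand-picked weights w = (1, 1) and b = -1, so that the region |w.x + b| < 0.5 contains exactly the two XOR-positive points:

    def band_classifier(x, w, b):
        # fires when the point lies inside a band around the line w . x + b = 0
        return 1 if abs(sum(wi * xi for wi, xi in zip(w, x)) + b) < 0.5 else 0

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, band_classifier(x, (1, 1), -1))   # prints 0, 1, 1, 0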

The normal perceptron training algorithm would almost certainly no longer converge, though I could be wrong; see http://en.wikipedia.org/wiki/Perceptron#Separability_and_convergence. You might have to come up with a whole new way to fit this function, which defeats the purpose.

Or you could just use a support vector machine, which is similar to a perceptron but is capable of handling more complex cases by using the kernel trick.
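As a sketch (assuming scikit-learn, which the answer above does not name), an SVM with an RBF kernel handles XOR out of the box:

    from sklearn.svm import SVC

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]

    # the RBF kernel implicitly maps the points into a space
    # where they become linearly separable
    clf = SVC(kernel="rbf", C=10.0, gamma=1.0)
    clf.fit(X, y)
    print(clf.predict(X))   # should print [0 1 1 0]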

+2

An old question, but I want to leave my thoughts (someone correct me if I'm wrong).

I think you are mixing up the concepts of a linear model and a loss (or error) function. The perceptron is by definition a linear model, so it defines a line/plane/hyperplane that you can use to separate your classes.

The standard perceptron algorithm takes the sign of your output, giving -1 or 1:

    yhat = sign(w * X + w0)

This is fine and will converge over time if your data is linearly separable.
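For reference, a minimal sketch of that training loop (my own illustration, assuming labels in {-1, 1} and a fixed learning rate):

    import numpy as np

    def perceptron_train(X, Y, lr=0.1, epochs=100):
        # X: (n_samples, n_features), Y: labels in {-1, 1}
        w = np.zeros(X.shape[1])
        w0 = 0.0
        for _ in range(epochs):
            for x, y in zip(X, Y):
                yhat = np.sign(w @ x + w0) or 1.0   # treat sign(0) as +1
                if yhat != y:                       # update only on mistakes
                    w += lr * y * x
                    w0 += lr * y
        return w, w0

    # AND is linearly separable, so this converges
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([-1, -1, -1, 1], dtype=float)
    w, w0 = perceptron_train(X, Y)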

To improve on this, you can use a sigmoid to smooth the loss function, squashing the output into the range [-1, 1]:

    yhat = -1 + 2*sigmoid(w * X + w0)
    mean_squared_error = (Y - yhat)^2

Then use a numerical optimizer such as gradient descent to minimize the error over your entire dataset. Here, w0, w1, w2, ..., wn are your variables.
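A rough numpy sketch of that optimization (my own illustration; the learning rate and epoch count are arbitrary choices):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(X, Y, lr=0.5, epochs=5000):
        # X: (n_samples, n_features), Y: labels in {-1, 1}
        w = np.zeros(X.shape[1])
        w0 = 0.0
        n = len(Y)
        for _ in range(epochs):
            s = sigmoid(X @ w + w0)
            yhat = -1 + 2 * s                  # output squashed into (-1, 1)
            d = -2 * (Y - yhat) / n            # d(MSE)/d(yhat)
            dz = d * 2 * s * (1 - s)           # chain rule through the sigmoid
            w -= lr * (X.T @ dz)               # gradient step on the weights
            w0 -= lr * dz.sum()
        return w, w0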

Now, if your original data is not linearly separable, you can transform it in a way that makes it linearly separable, and then apply any linear model. This works because the model is linear in the weights.
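For XOR specifically, one such transformation (my own example) is adding the product feature x1*x2, after which a single hyperplane in the new feature space separates the classes:

    # map (x1, x2) -> (x1, x2, x1*x2); XOR becomes linearly separable
    def transform(x1, x2):
        return (x1, x2, x1 * x2)

    # hand-picked separator in the transformed space:
    # x1 + x2 - 2*x1*x2 - 0.5 > 0 exactly for the XOR-positive points
    for (x1, x2), y in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
        f1, f2, f3 = transform(x1, x2)
        pred = 1 if f1 + f2 - 2 * f3 - 0.5 > 0 else 0
        print((x1, x2), pred, y)   # pred matches y on all four points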

This is basically what models like the SVM do under the hood to classify your non-linear data.

PS: I'm learning this stuff too, so experts, don't be mad at me if I said something wrong.

0

Source: https://habr.com/ru/post/910165/

