This is an old question, but I want to leave my thoughts (someone will correct me if I'm wrong).
I think you are mixing up the concepts of a linear model and a loss (or error) function. The perceptron is by definition a linear model: it defines a line / plane / hyperplane that you can use to separate your classes.
The standard perceptron algorithm thresholds the output signal with the sign function, giving -1 or +1:

    yhat = sign(w * X + w0)
This works, and the perceptron learning rule is guaranteed to converge in a finite number of updates if your data is linearly separable.
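A minimal sketch of that learning rule in plain NumPy (my own illustration, not from the question; the AND data, learning rate, and epoch count are just placeholders):

    import numpy as np

    def perceptron_train(X, Y, epochs=100, lr=1.0):
        # Classic perceptron rule; labels Y must be -1 or +1
        w = np.zeros(X.shape[1])
        w0 = 0.0
        for _ in range(epochs):
            mistakes = 0
            for x, y in zip(X, Y):
                yhat = 1.0 if w @ x + w0 >= 0 else -1.0
                if yhat != y:
                    # Nudge the hyperplane toward the misclassified point
                    w += lr * y * x
                    w0 += lr * y
                    mistakes += 1
            if mistakes == 0:  # converged: everything classified correctly
                break
        return w, w0

    # Linearly separable toy example: logical AND with -1/+1 labels
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([-1.0, -1.0, -1.0, 1.0])
    w, w0 = perceptron_train(X, Y)
    print(np.where(X @ w + w0 >= 0, 1.0, -1.0))  # [-1. -1. -1.  1.]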
To improve this, you can use a sigmoid to smooth the output into the range (-1, 1) and define a differentiable loss:

    yhat = -1 + 2*sigmoid(w * X + w0)
    mean_squared_error = mean((Y - yhat)^2)
Then use a numerical optimizer like gradient descent to minimize this error over your entire dataset. Here w0, w1, w2, ..., wn are the variables you are optimizing.
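Here is a sketch of that setup with plain NumPy gradient descent on the squared error of the sigmoid-squashed model (again my own toy code; the learning rate and epoch count are arbitrary):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_gd(X, Y, epochs=2000, lr=0.5):
        # Minimize mean((Y - yhat)^2) with yhat = -1 + 2*sigmoid(X.w + w0)
        n, d = X.shape
        w = np.zeros(d)
        w0 = 0.0
        for _ in range(epochs):
            s = sigmoid(X @ w + w0)
            yhat = 2.0 * s - 1.0  # squashed into (-1, 1)
            # Chain rule: dL/dz = -2*(Y - yhat)/n * dyhat/dz, with dyhat/dz = 2*s*(1-s)
            dz = -4.0 * (Y - yhat) * s * (1.0 - s) / n
            w -= lr * (X.T @ dz)
            w0 -= lr * dz.sum()
        return w, w0

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([-1.0, -1.0, -1.0, 1.0])  # AND again
    w, w0 = train_gd(X, Y)
    print(np.where(X @ w + w0 >= 0, 1.0, -1.0))  # [-1. -1. -1.  1.]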
Now, if your source data is not linearly separable, you can transform it (for example, by adding non-linear features) so that it becomes linearly separable, and then apply any linear model. This works because the model is linear in the weights, not in the inputs.
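For example (my own toy illustration), XOR is the classic case that is not separable in 2D, but adding the product feature x1*x2 makes it separable by a plane:

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([-1, 1, 1, -1])  # XOR: not linearly separable in 2D

    # Lift to 3D by appending the product feature x1*x2
    X3 = np.column_stack([X, X[:, 0] * X[:, 1]])

    # e.g. w = (1, 1, -2), w0 = -0.5 separates XOR in the lifted space
    w, w0 = np.array([1.0, 1.0, -2.0]), -0.5
    print(np.sign(X3 @ w + w0))  # [-1.  1.  1. -1.]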
Basically, this is what kernel methods like SVM do under the hood: they implicitly transform your data so that a linear separator can classify a non-linear problem.
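A quick sketch with scikit-learn, assuming you have it installed (the gamma and C values here are arbitrary choices for this toy data):

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([-1, 1, 1, -1])

    # The RBF kernel implicitly maps the data to a space where it is separable
    clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, Y)
    print(clf.predict(X))  # recovers the XOR labels [-1  1  1 -1]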
PS: I'm learning this stuff too, so experts, don't be mad at me if I said something wrong.