Scikit-learn: The Role of Weights in Ridge Regression

Question


I use the scikit-learn library to perform ridge regression with weights on individual samples. This can be done with estimator.fit(X, y, sample_weight=some_array). Intuitively, I expect a larger weight to give the corresponding sample more importance.
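A quick way to check what sample_weight means is to compare a weighted fit with an unweighted fit in which the weighted sample is simply repeated. The sketch below assumes that Ridge minimizes a weighted sum of squared residuals plus the alpha penalty; under that assumption an integer weight of 2 should behave exactly like including the sample twice. The data are the three points from the example that follows, with y kept 1-D so that coef_ and intercept_ are easy to compare.

```python
import numpy as np
from sklearn.linear_model import Ridge

# The three points from the question (one feature, 1-D target).
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 2.0, 2.0])

# Fit 1: give the first sample a weight of 2.
weighted = Ridge(alpha=0.1).fit(X, y, sample_weight=np.array([2.0, 1.0, 1.0]))

# Fit 2: no weights, but the first sample appears twice.
X_dup = np.vstack([X[:1], X])
y_dup = np.concatenate([y[:1], y])
duplicated = Ridge(alpha=0.1).fit(X_dup, y_dup)

# Under a weighted squared loss both fits minimize the same objective,
# so slope and intercept should agree up to floating-point rounding.
print("weighted:  ", weighted.coef_, weighted.intercept_)
print("duplicated:", duplicated.coef_, duplicated.intercept_)
```

If the two printed lines agree, the weight is acting as a multiplicity: the weighted sample counts as though it appeared twice, so it should pull the fitted line toward itself rather than push it away.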

However, I tested this on the following example with a single feature, so the fit can be plotted in two dimensions:

    from sklearn import linear_model
    import numpy
    import matplotlib.pyplot as plt

    # Data
    x = numpy.array([[0], [1], [2]])
    y = numpy.array([[0], [2], [2]])
    sample_weight = numpy.array([1, 1, 1])

    # Ridge regression
    clf = linear_model.Ridge(alpha=0.1)
    clf.fit(x, y, sample_weight=sample_weight)

    # Plot the fitted line; predict() expects a 2-D array (n_samples, n_features)
    xp = numpy.linspace(-1, 3)
    yp = list()
    for x_i in xp:
        yp.append(clf.predict([[x_i]])[0, 0])
    plt.plot(xp, yp)
    # plt.hold(True) was removed in Matplotlib 3.0; later plot calls
    # draw on the same axes by default.
    plt.plot(x, y, 'or')
    plt.show()

I ran this code, then ran it again with the weight of the first sample doubled:

    sample_weight = numpy.array([2, 1, 1])

The resulting line moves away from the sample with the larger weight. This is counterintuitive, since I expect a sample with a larger weight to have more influence on the fit.

Am I using the library incorrectly, or is this a bug in scikit-learn?


python scikit-learn machine-learning regression scikits

Marco Jul 12 '13 at 7:40


1 answer

David dale · Answer 1 · 2017-11-23T07:25:00+0000

The weights are not inverted. Most likely there was a simple mistake in your code, or there was a bug in sklearn that has since been fixed. Consider this code:

    from sklearn import linear_model
    import numpy
    import matplotlib.pyplot as plt

    # Data
    x = numpy.array([[0], [1], [2]])
    y = numpy.array([[0], [2], [2]])
    sample_weight1 = numpy.array([1, 1, 1])
    sample_weight2 = numpy.array([2, 1, 1])

    # Ridge regressions with equal weights and with the first sample weighted more
    clf1 = linear_model.Ridge(alpha=0.1).fit(x, y, sample_weight=sample_weight1)
    clf2 = linear_model.Ridge(alpha=0.1).fit(x, y, sample_weight=sample_weight2)

    # Plot the data and both fitted lines; labels are attached to the lines
    # so the legend cannot pick up the scatter points by mistake
    plt.scatter(x, y)
    xp = numpy.linspace(-1, 3)
    plt.plot(xp, clf1.predict(xp.reshape(-1, 1)), label='equal weights')
    plt.plot(xp, clf2.predict(xp.reshape(-1, 1)), label='first obs weights more')
    plt.legend()
    plt.title('Increasing weight of the first obs moves the line closer to it')
    plt.show()

This produces a graph in which the second line (fitted with the increased first weight) lies closer to the first observation:

[Figure: Increasing weight of the first obs moves the line closer to it]
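For a single feature the same effect can be seen in closed form. The sketch below assumes the objective is the sum of w_i * (y_i - b - beta * x_i)^2 plus alpha * beta^2, with the intercept b left unpenalized; the helper weighted_ridge_1d is hypothetical and only illustrates the algebra. Centering on the weighted means x_bar and y_bar, the minimizer is beta = sum(w_i (x_i - x_bar)(y_i - y_bar)) / (sum(w_i (x_i - x_bar)^2) + alpha) and b = y_bar - beta * x_bar, so a heavier sample drags the weighted means, and hence the line, toward itself.

```python
import numpy as np


def weighted_ridge_1d(x, y, w, alpha):
    """Hypothetical helper: closed-form weighted ridge for one feature.

    Minimizes sum(w * (y - b - beta * x) ** 2) + alpha * beta ** 2,
    where the intercept b is not penalized.
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    # Weighted means: heavier samples pull the centre of the data toward them.
    x_bar = np.sum(w * x) / np.sum(w)
    y_bar = np.sum(w * y) / np.sum(w)
    # Weighted (co)variances around those means; alpha shrinks only the slope.
    sxy = np.sum(w * (x - x_bar) * (y - y_bar))
    sxx = np.sum(w * (x - x_bar) ** 2)
    beta = sxy / (sxx + alpha)
    b = y_bar - beta * x_bar
    return beta, b


x = [0, 1, 2]
y = [0, 2, 2]
for w in ([1, 1, 1], [2, 1, 1]):
    beta, b = weighted_ridge_1d(x, y, w, alpha=0.1)
    # The fitted value at x = 0 is the intercept; the first observation is y = 0.
    print(w, "slope:", round(beta, 3), "fit at x=0:", round(b, 3))
```

With equal weights the fitted value at x = 0 is about 0.381; doubling the first weight lowers it to about 0.211, closer to the observed y = 0, which matches the direction shown in the plot.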
