By default, all regularized linear regression methods in scikit-learn pull the model coefficients w towards 0 as alpha increases. Is it possible to shrink the coefficients towards some predefined values instead? In my application, such values were obtained from a previous analysis of a similar, but much larger, data set. In other words, can I transfer knowledge from one model to another?
The documentation of LassoCV reads:
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
In theory, it is easy to incorporate previously obtained coefficients w0 by changing the above to
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w - w0||_1
The problem is that the actual optimization is performed by the Cython function enet_coordinate_descent (called through lasso_path and enet_path). If I want to change it, do I need to unpack, modify and recompile the whole sklearn.linear_model package, or can I redefine the whole optimization procedure instead?
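In principle one could also sidestep the Cython code entirely by absorbing w0 through a change of variables: with v = w - w0 the residual becomes ||(y - X w0) - Xv||^2_2, so an ordinary Lasso fitted on the shifted target y - X w0 solves the modified objective, and adding w0 back recovers the coefficients. A minimal sketch of this idea (fit_lasso_towards is just an illustrative name, not an sklearn function):
import numpy as np
from sklearn.linear_model import Lasso

def fit_lasso_towards(X, y, w0, alpha=1.0):
    # Minimize (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w - w0||_1
    # by substituting v = w - w0: fit a plain Lasso on the shifted target
    # y - X.dot(w0), then add w0 back to the fitted coefficients.
    lasso = Lasso(alpha=alpha).fit(X, y - np.dot(X, w0))
    return lasso.coef_ + w0, lasso.intercept_
The intercept stays unpenalized here, exactly as in LassoCV.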
Toy example
The following code defines a data set X with 4 features and a consistent response vector y.
import numpy as np
from sklearn.linear_model import LassoCV

n = 50
x1 = np.random.normal(10, 8, n)
x2 = np.random.normal(8, 6, n)

# 4 features: x1, x1^2, x2, x2^2
X = np.column_stack([x1, x1 ** 2, x2, x2 ** 2])
# the true response depends on x1, x2 and x2^2 plus Gaussian noise
y = .8 * x1 + .2 * x2 + .7 * x2 ** 2 + np.random.normal(0, 3, n)

cv = LassoCV(cv=10).fit(X, y)
The resulting coefficients and alpha are:
>>> print(cv.coef_)
[ 0.46262115 0.01245427 0. 0.70642803]
>>> print(cv.alpha_)
7.63613474003
If we had prior knowledge about two of the coefficients, say w0 = np.array([.8, 0, .2, 0]), how could this be incorporated?
My final solution, based on @lejlot's answer
I implemented the loss with autograd and minimize it by gradient descent with the Adam optimizer.
Note that the lasso-style alpha used here is not on the same scale as the alpha_ selected by LassoCV (no cross-validation is performed), so the two values are not directly comparable.
from autograd import numpy as np
from autograd import grad
from autograd.optimizers import adam

def fit_lasso(X, y, alpha=0, W0=None):
    y = y.reshape(-1, 1)            # column vector so the residual broadcasts correctly
    if W0 is None:
        W0 = np.zeros((X.shape[1], 1))
    else:
        W0 = W0.reshape(-1, 1)      # reference coefficients as a column vector

    def l1_loss(W, i):
        # mean squared error plus an L1 penalty on the deviation from W0
        return np.mean((np.dot(X, W) - y) ** 2) + alpha * np.sum(np.abs(W - W0))

    gradient = grad(l1_loss)

    def print_w(w, i, g):
        if (i + 1) % 250 == 0:
            print("After %i step: w = %s" % (i + 1, np.array2string(w.T)))

    W_init = np.random.normal(size=(X.shape[1], 1))
    W = adam(gradient, W_init, step_size=.1, num_iters=1000, callback=print_w)
    return W
n = 50
x1 = np.random.normal(10, 8, n)
x2 = np.random.normal(8, 6, n)
X = np.column_stack([x1, x1 ** 2, x2, x2 ** 2])
y = .8 * x1 + .2 * x2 + .7 * x2 ** 2 + np.random.normal(0, 3, n)
fit_lasso(X, y, alpha=30)
fit_lasso(X, y, alpha=30, W0=np.array([.8, 0, .2, 0]))
After 250 step: w = [[ 0.886 0.131 0.005 0.291]]
After 500 step: w = [[ 0.886 0.131 0.003 0.291]]
After 750 step: w = [[ 0.886 0.131 0.013 0.291]]
After 1000 step: w = [[ 0.887 0.131 0.013 0.292]]
After 250 step: w = [[ 0.868 0.129 0.728 0.247]]
After 500 step: w = [[ 0.803 0.132 0.717 0.249]]
After 750 step: w = [[ 0.801 0.132 0.714 0.249]]
After 1000 step: w = [[ 0.801 0.132 0.714 0.249]]
As you can see, the coefficients are now shrunk towards the corresponding values in w0 instead of towards 0.
In my experiments the effect only becomes noticeable for alpha > 20 or so.
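A quick way to check where that threshold lies is to sweep alpha and compare the final coefficients (a rough sketch reusing fit_lasso and the toy data above; the per-250-step callback output is printed as well):
w0 = np.array([.8, 0, .2, 0])
for a in (1, 10, 20, 30, 50):
    W = fit_lasso(X, y, alpha=a, W0=w0)
    print("alpha = %s -> w = %s" % (a, np.array2string(W.T)))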