sample_weight parameter shape error in scikit-learn GridSearchCV

Passing the sample_weight parameter to GridSearchCV raises an error because of a shape mismatch. My suspicion is that cross-validation does not split sample_weight along with the dataset.

Part one: using sample_weight as a model parameter works great

Consider a simple example, first without GridSearchCV:

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt


dataURL = 'https://raw.githubusercontent.com/mcasl/PAELLA/master/data/sinusoidal_data.csv'

x = pd.read_csv(dataURL, usecols=["x"]).x
y = pd.read_csv(dataURL, usecols=["y"]).y
occurrences = pd.read_csv(dataURL, usecols=["Occurrences"]).Occurrences
my_sample_weights = (1 - occurrences/10000)**3

my_sample_weights contains the weight that I assign to each observation in x, y, as shown in the following figure. The points on the sinusoidal curve receive higher weights than those that form the background noise.

plt.scatter(x, y, c=my_sample_weights>0.9, cmap="cool")

Color-coded dataset relative to my_sample_weights
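To get a feel for the weight formula above, here is a minimal sketch that applies it to a few occurrence counts. The counts and the helper name sample_weight are hypothetical and chosen only for illustration; the formula and the 0.9 threshold are the ones used in the question.

```python
def sample_weight(occurrences, scale=10000, power=3):
    """Weight formula from the question: (1 - occurrences / 10000) ** 3."""
    return (1 - occurrences / scale) ** power

# Hypothetical occurrence counts, not taken from the dataset.
for occ in (0, 100, 500, 5000):
    w = sample_weight(occ)
    # The scatter plot colors points by whether the weight exceeds 0.9.
    print(occ, round(w, 4), w > 0.9)
```

Lower Occurrences values map to weights closer to 1, and the cubic exponent makes the weight fall off quickly as the count grows.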

Let's train a neural network, first without using the information contained in my_sample_weights:

def make_model(number_of_hidden_neurons=1):
    model = Sequential()
    model.add(Dense(number_of_hidden_neurons, input_shape=(1,), activation='tanh'))
    model.add(Dense(1, activation='linear'))
    model.compile(optimizer='sgd', loss='mse')
    return model

net_Not_using_sample_weight = make_model(number_of_hidden_neurons=6)
net_Not_using_sample_weight.fit(x,y, epochs=1000)

plt.scatter(x, y)
plt.scatter(x, net_Not_using_sample_weight.predict(x), c="green")

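[Figure: predictions of the network trained without sample weights, plotted over the data]

Now, using the information in my_sample_weights, the prediction is much better.

[Figure: predictions of the network trained with my_sample_weights, plotted over the data]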

Part two: using sample_weight as a GridSearchCV parameter

my_Regressor = KerasRegressor(make_model)

validator = GridSearchCV(my_Regressor,
                     param_grid={'number_of_hidden_neurons': range(4, 5),
                                 'epochs': [500],
                                },
                     fit_params={'sample_weight': [ my_sample_weights ]},
                     n_jobs=1,
                    )
validator.fit(x, y)

Passing the sample weights as a parameter gives the following error:

...
ValueError: Found a sample_weight array with shape (1000,) for an input with shape (666, 1). sample_weight cannot be broadcast.

It seems that the sample_weight vector has not been split in the same way as the input array.

For what it's worth, these are the versions in use:

import sklearn
print(sklearn.__version__)
0.18.1

import keras
print(keras.__version__)
2.0.5
2 answers

PipeGraph, an extension of the Scikit-Learn Pipeline, is one way to handle this case (see http://mcasl.imtqy.com/PipeGraph).


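The problem is that, by default, GridSearch uses 3-fold cross-validation unless told otherwise. This means that 2/3 of the data is used for training and 1/3 for testing. Hence the error: 1000 values were passed in fit_params, but the training set is smaller (666). You have to adjust the number of sample weights to the size of the training set.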

my_sample_weights = np.random.uniform(size=666)
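The fold arithmetic behind the 666 can be checked with a short pure-Python sketch. It assumes contiguous, unshuffled folds in which the first n_samples % n_splits folds get one extra sample, which is how scikit-learn documents KFold. The helper kfold_indices and the placeholder weights are hypothetical and are not part of scikit-learn or of the question's code.

```python
def kfold_indices(n_samples, n_splits=3):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    The first n_samples % n_splits folds receive one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    indices = list(range(n_samples))
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

n = 1000                 # same length as the question's dataset
weights = [1.0] * n      # placeholder standing in for my_sample_weights

train_sizes = []
for train, test in kfold_indices(n, n_splits=3):
    # Slice the weights with the same indices as the training rows,
    # so their length always matches the training fold.
    fold_weights = [weights[i] for i in train]
    assert len(fold_weights) == len(train)
    train_sizes.append(len(train))

print(train_sizes)  # [666, 667, 667]
```

Under these assumptions the training folds do not all have the same size, so a weight vector of fixed length 666 would only match the first split. Slicing the weights with each fold's training indices keeps them aligned with the data in every split.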

Source: https://habr.com/ru/post/1679746/

