I am trying to evaluate a model based on its performance in a historical competition.
I have a dataset consisting of the following columns:
feature1 | ... | featureX | oddsPlayerA | oddsPlayerB | winner
The model performs a regression; its output, prediction_player_A_win_odds, is the predicted (fair) odds that player A wins the match.
As far as I understand, I can supply my own scoring function that returns the "money" the model would make if it bet every time a condition holds, and use that value to measure the model's suitability. The condition is something like:
    if prediction_player_A_win_odds < oddsPlayerA:
        money += bet_playerA(oddsPlayerA, winner)
    if inverse_odd(prediction_player_A_win_odds) < oddsPlayerB:
        money += bet_playerB(oddsPlayerB, winner)
In the custom scoring function I receive the usual arguments, `ground_truth` and `predictions` (where `ground_truth` is the winner array and `predictions` is the prediction_player_A_win_odds array), but I also need the columns "oddsPlayerA" and "oddsPlayerB" from the dataset (and here is the problem!).
If the custom scoring function were called with the data in the same order as the original dataset, it would be trivial to fetch the extra columns. But in reality, when cross-validation is used, the data it receives is shuffled relative to the original.
I tried the most obvious approach, which was to pass y as [oddsA, oddsB, winner] (shape [n, 3]), but scikit-learn did not allow a multi-column y here.
So, how can I pass data that is neither X nor y into a user-defined scoring function while keeping it aligned (in the same row order) with the data being scored?
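For what it's worth, one workaround I have seen (sketched below under assumed data; the column names match the question, everything else is made up) relies on scikit-learn preserving the pandas index when it slices a Series during cross-validation: keep the odds in the full DataFrame, pass y as a Series, and let the scorer look the extra columns up via `y_true.index`:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset described in the question.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "feature1": rng.normal(size=n),
    "oddsPlayerA": rng.uniform(1.2, 3.0, size=n),
    "oddsPlayerB": rng.uniform(1.2, 3.0, size=n),
    "winner": rng.integers(0, 2, size=n),  # 1 = player A won
})

X = df[["feature1"]]
# Regression target: assumed here to be the "fair" odds for player A.
y = pd.Series(rng.uniform(1.2, 3.0, size=n), index=df.index)

def profit_score(y_true, y_pred):
    # When y is a pandas Series, scikit-learn slices it per fold while
    # keeping its original index, so the matching odds rows can be
    # recovered from the full DataFrame in the correct order.
    rows = df.loc[y_true.index]
    odds_a = rows["oddsPlayerA"].to_numpy()
    bet_a = y_pred < odds_a              # "value bet" condition on A
    win_a = rows["winner"].to_numpy() == 1
    profit = np.where(win_a, odds_a - 1.0, -1.0)  # unit stake payoff
    return float(np.sum(profit[bet_a]))

scorer = make_scorer(profit_score)
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring=scorer)
```

This only covers the player-A side of the condition; the player-B branch would be added the same way. It also assumes the scorer is always invoked on subsets of this one DataFrame, so it is a sketch of the idea rather than a general solution.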