I use logistic regression (in scikit-learn) for a binary classification problem, and I am interested in being able to explain each individual prediction. More precisely, I want to predict the probability of the positive class and assess the importance of each feature for that prediction.
Using the raw coefficients (betas, i.e. the odds ratios) as an importance measure is usually a bad idea, as mentioned here, but I have yet to find a good alternative.
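To make the sketches further down concrete, here is a minimal toy setup. The data, the variable names (`model`, `X_train`, `x_new`) and all parameters are illustrative assumptions only, not my actual problem; the later sketches reuse these names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the real problem.
X_train, y_train = make_classification(n_samples=1000, n_features=5, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

x_new = X_train[0]  # a single observation whose prediction we want to explain
p_original = model.predict_proba(x_new.reshape(1, -1))[0, 1]
print("P(positive) =", p_original)
```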
So far, the best I've found are the following 3 options:
- Monte Carlo variant: holding all other features fixed, re-run the prediction many times, each time replacing the feature we want to evaluate with a random sample drawn from the training set. This gives a baseline probability for the positive class, which we then compare with the probability from the original prediction. The difference is an indicator of the feature's importance (see the first sketch after this list).
- "Leave-one-out" classifiers: to evaluate the importance of a feature, first train a model that uses all features, then another that uses all features except the one under test. Predict a new observation with both models; the difference between the two predictions is the importance of that feature (see the second sketch after this list).
- Adjusted betas: based on this answer, rate the importance of each feature as the value of its coefficient times the standard deviation of the corresponding feature in the data (see the third sketch after this list).
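A rough sketch of the Monte Carlo option, reusing `model`, `X_train` and `x_new` from the toy setup above (the function name and the number of draws are arbitrary choices of mine):

```python
import numpy as np

def monte_carlo_importance(model, X_train, x_new, feature_idx, n_draws=1000, seed=None):
    """Per-prediction importance of one feature via random substitution.

    Hold every other feature at its value in x_new, replace feature_idx with
    random draws from the training data, and compare the average predicted
    probability against the original prediction.
    """
    rng = np.random.default_rng(seed)
    samples = np.tile(x_new, (n_draws, 1))
    samples[:, feature_idx] = rng.choice(X_train[:, feature_idx], size=n_draws)
    baseline = model.predict_proba(samples)[:, 1].mean()
    original = model.predict_proba(x_new.reshape(1, -1))[0, 1]
    return original - baseline

# importance of feature 2 for this particular observation
print(monte_carlo_importance(model, X_train, x_new, feature_idx=2))
```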
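A sketch of the leave-one-out option under the same assumptions (a full model and a model retrained without the feature under test; again, names are illustrative):

```python
from sklearn.linear_model import LogisticRegression

def leave_one_out_importance(X_train, y_train, x_new, feature_idx):
    """Difference in predicted probability between the full model and a model
    retrained without the feature under test."""
    full = LogisticRegression().fit(X_train, y_train)
    p_full = full.predict_proba(x_new.reshape(1, -1))[0, 1]

    keep = [j for j in range(X_train.shape[1]) if j != feature_idx]
    reduced = LogisticRegression().fit(X_train[:, keep], y_train)
    p_reduced = reduced.predict_proba(x_new[keep].reshape(1, -1))[0, 1]

    return p_full - p_reduced

print(leave_one_out_importance(X_train, y_train, x_new, feature_idx=2))
```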
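And the adjusted-betas option, which in code is just the fitted coefficients scaled by each feature's standard deviation in the training data:

```python
# One importance value per feature (a global measure, not per-prediction).
importances = model.coef_[0] * X_train.std(axis=0)
print(importances)
```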
All of these options (raw betas, Monte Carlo and leave-one-out) seem like weak solutions to me:
- Monte Carlo depends on the distribution of the training set, and I cannot find any literature to support it.
- "Leave-one-out" is easily fooled by two correlated features (when one of them is removed, the other steps in to compensate, and both end up with an importance of 0).
- The adjusted betas sound plausible, but I can't find any literature to support them either.
Actual question: what is the best way to interpret the importance of each feature at the moment a decision is made with a linear classifier?
Quick note #1: this is trivial for random forests; we can just use the prediction + bias decomposition, as this blog post explains perfectly. The problem here is how to do something similar with linear classifiers such as logistic regression.
Quick note #2: there are a number of related questions on Stack Overflow (1 2 3 4 5), but I could not find an answer to this specific question.
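For reference, this is roughly what I mean for random forests. The sketch uses the `treeinterpreter` package, which may or may not be the one the blog post uses, and it reuses the names from the toy setup above:

```python
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti  # pip install treeinterpreter

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Each prediction decomposes as: prediction = bias + sum of per-feature contributions.
prediction, bias, contributions = ti.predict(rf, x_new.reshape(1, -1))
print(prediction[0], bias[0], contributions[0].sum(axis=0))
```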