Logistic regression with sklearn

I'm not sure this is the right place for this question, but I was told that CrossValidated was not. All of these questions relate to sklearn, but if you have insights into logistic regression in general, I'd like to hear those as well.

1) Should the data be standardized (mean 0, standard deviation 1)?
2) In sklearn, how can I specify which type of regularization I want (L1 vs. L2)? Note that this is different from the loss: the loss measures classification error, while the penalty regularizes the weights.
3) How can I do variable selection? I.e., something similar to the lasso for linear regression.
4) When using regularization, how do I optimize the regularization strength C? Is there something built in, or should I handle this myself?

An example would probably be most helpful, but I would appreciate any ideas on any of these issues.

This was my starting point: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Thank you in advance!

1 answer

1) Not strictly required for logistic regression, since it does not compute distances between instances. (That said, if you use regularization, scaling the features is a good idea so the penalty treats all coefficients on a comparable footing.)
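If you do decide to scale, a minimal sketch with `StandardScaler` (the toy matrix here is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0, std 1
```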

2) You can set the parameter penalty='l1' or penalty='l2'. See the LogisticRegression page. The default penalty is L2.
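For example (note that in recent scikit-learn versions the L1 penalty requires a solver that supports it, such as 'liblinear' or 'saga'; the synthetic dataset below is just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# L2 is the default penalty
clf_l2 = LogisticRegression(penalty='l2').fit(X, y)

# L1 needs a compatible solver
clf_l1 = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)
```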

3) scikit-learn provides various feature-selection methods, e.g. SelectKBest with the chi2 scoring function. (The L1 penalty from point 2 also drives some coefficients to exactly zero, which gives lasso-style selection for free.)
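A minimal SelectKBest sketch; note that chi2 requires non-negative feature values, which is why the iris dataset works here:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)   # all features non-negative, as chi2 requires
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)  # keeps the 2 highest-scoring features
```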

4) To find the optimal C you will need a grid search, e.g. GridSearchCV. (There is also LogisticRegressionCV, which does built-in cross-validation over C.)
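A sketch of the grid-search approach, assuming a modern scikit-learn where GridSearchCV lives under sklearn.model_selection (older versions used sklearn.grid_search); the candidate C values are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={'C': [0.01, 0.1, 1, 10, 100]},  # candidate regularization strengths
    cv=5,
)
grid.fit(X, y)
best_C = grid.best_params_['C']
```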



Source: https://habr.com/ru/post/1608566/

