I'm not sure this is the right place for this question, but I was told that CrossValidated was not. All of these questions relate to sklearn, but if you have insight into logistic regression in general, I would like to hear that too.
1) Should the data be standardized (mean 0, standard deviation 1) before fitting?
2) In sklearn, how can I specify which regularization I want (L1 vs L2)? Note that, as I understand it, this is different from the penalty option, which seems to refer to the classification error rather than to a penalty on the coefficients.
3) How can I do variable selection, i.e., something similar to the lasso for linear regression?
4) When using regularization, how do I tune C, the regularization strength? Is there something built in, or do I have to take care of this myself?
An example would probably be most helpful, but I would appreciate any ideas on any of these issues.
This was my starting point: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
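To make the questions concrete, here is a rough sketch of the kind of setup I have in mind. It is only an illustration under several assumptions that are not confirmed anywhere above: synthetic data from `make_classification`, `StandardScaler` for standardization, `penalty="l1"` with the `liblinear` solver for a lasso-like fit, and `GridSearchCV` over an arbitrary grid of `C` values. Whether this is the recommended way to do it is exactly what I am asking.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (assumption: 20 features, 5 informative).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Question 1: standardize inside a pipeline so scaling is refit on each CV fold.
# Questions 2 and 3: request an L1 penalty, which can drive coefficients to zero.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear"),
)

# Question 4: search over C by cross-validation (the grid values are arbitrary).
# In sklearn, smaller C means stronger regularization.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

best_C = search.best_params_["logisticregression__C"]
coef = search.best_estimator_.named_steps["logisticregression"].coef_.ravel()
n_selected = int((coef != 0).sum())  # features kept by the L1 penalty

print("best C:", best_C)
print("non-zero coefficients:", n_selected, "of", coef.size)
```

sklearn also provides `LogisticRegressionCV`, which looks like a possible built-in alternative to the grid search for choosing `C`.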
Thank you in advance!