I want to do some kind of feature selection using Python and the scikit-learn library.
As far as I know, Lasso regression can be used for feature selection, much like univariate selection.
My simple data set is as follows.
G1   G2   G3   ...  GN   Class
1.0  4.0  5.0  ...  1.0  X
4.0  5.0  9.0  ...  1.0  X
9.0  6.0  3.0  ...  2.0  Y
...
I want to find the top-N attributes (Gs) that most strongly affect the class, using Lasso regression. Although I still have to tune the parameters, Lasso regression can be applied like this:
from sklearn.linear_model import Lasso
lasso = Lasso()
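To make the idea concrete, here is a minimal sketch of the ranking I have in mind. It is not my real data: the random matrix, the 0/1 encoding of the classes X and Y, `alpha=0.05`, and `top_n` are all assumptions for illustration. As far as I understand, coefficient magnitudes are only comparable when the attributes are on the same scale, so the sketch standardizes them first.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for my data: 40 samples, 5 attributes (G1..G5).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
feature_names = [f"G{i + 1}" for i in range(X.shape[1])]

# Assumption: the two classes are encoded as 0.0 (X) and 1.0 (Y) so that
# Lasso, which is a regression model, can treat the class as a numeric target.
labels = np.where(X[:, 0] + 0.1 * rng.normal(size=40) > 0, "Y", "X")
y = (labels == "Y").astype(float)

# Put every attribute on the same scale before comparing coefficients.
X_scaled = StandardScaler().fit_transform(X)

# alpha is an assumed value; it would normally need tuning.
lasso = Lasso(alpha=0.05)
lasso.fit(X_scaled, y)

# Rank attributes by absolute coefficient, largest first, and keep the top N.
top_n = 2
ranking = np.argsort(-np.abs(lasso.coef_))
top_features = [feature_names[i] for i in ranking[:top_n]]
print(top_features)
```

The ranking uses the absolute value of `lasso.coef_`, since a large negative coefficient would indicate as strong a relationship as a large positive one.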
Is it correct to conclude that an attribute is more strongly associated with the class if it has a larger lasso.coef_ value? Also, is there any rule for selecting the top-N genes using Lasso regression? If I use PCC, a p-value threshold such as 0.05 can be used for selection, but I don't know what the equivalent would be for Lasso. Can someone give me an idea, please?