Preserving feature names after scikit-learn feature selection

After running scikit-learn's VarianceThreshold on a dataset, it removes a couple of features. I feel like I'm missing something simple, but I would like to keep the names of the remaining features. The following code:

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

def VarianceThreshold_selector(data):
    selector = VarianceThreshold(.5)
    selector.fit(data)
    selector = pd.DataFrame(selector.transform(data))
    return selector
x = VarianceThreshold_selector(data)
print(x)

transforms the following data (this is just a small subset of the rows):

Survived  Pclass  Sex  Age  SibSp  Parch  Nonsense
       0       3    1   22      1      0         0
       1       1    2   38      1      0         0
       1       3    2   26      0      0         0

into this (again, a small subset of the rows):

     0         1      2     3
0    3      22.0      1     0
1    1      38.0      1     0
2    3      26.0      0     0

Using the get_support method, I know that the kept columns are Pclass, Age, SibSp, and Parch, so I would prefer it to return something more like:

   Pclass   Age  SibSp  Parch
0       3  22.0      1      0
1       1  38.0      1      0
2       3  26.0      0      0
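For reference, get_support() can return either a boolean mask (the default) or the integer positions of the kept columns (indices=True), and either form can index data.columns directly. The sketch below assumes the three sample rows above are rebuilt by hand as a DataFrame named df (a hypothetical name). Note that on only these three rows just Pclass and Age pass the 0.5 threshold; SibSp and Parch only pass on the full dataset.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical reconstruction of the three sample rows shown above
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": [1, 2, 2],
    "Age": [22, 38, 26],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Nonsense": [0, 0, 0],
})

selector = VarianceThreshold(0.5)
selector.fit(df)

mask = selector.get_support()                   # boolean array, one entry per column
positions = selector.get_support(indices=True)  # integer positions of the kept columns

# Both forms can index the column labels directly
kept_from_mask = list(df.columns[mask])
kept_from_positions = list(df.columns[positions])
print(kept_from_mask)       # ['Pclass', 'Age']
print(kept_from_positions)  # ['Pclass', 'Age']
```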

Is there an easy way to do this? I'm very new to scikit-learn, so I'm probably just missing something obvious.


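Would something like this help? If you pass it a pandas DataFrame, it takes the column names and uses get_support, as you mentioned, to pick out by position only the names of the columns that met the variance threshold.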

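>>> import pandas as pd
>>> from sklearn.feature_selection import VarianceThreshold
>>> df
   Survived  Pclass  Sex  Age  SibSp  Parch  Nonsense
0         0       3    1   22      1      0         0
1         1       1    2   38      1      0         0
2         1       3    2   26      0      0         0
>>> def VarianceThreshold_selector(data):
    columns = data.columns
    selector = VarianceThreshold(.5)
    selector.fit(data)
    # get_support(indices=True) gives the positions of the kept columns;
    # use every one of them, since position 0 is a valid column too
    labels = [columns[x] for x in selector.get_support(indices=True)]
    return pd.DataFrame(selector.transform(data), columns=labels)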

>>> VarianceThreshold_selector(df)
   Pclass  Age
0       3   22
1       1   38
2       3   26
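One more detail worth considering: pd.DataFrame(selector.transform(data)) gets a fresh 0..n-1 index, so if data has a non-default index (for example after filtering rows), the row labels are lost. A minimal sketch, assuming you also want to keep the original index, passes index=data.index; the helper name variance_threshold_keep_labels is hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

def variance_threshold_keep_labels(data, threshold=0.5):
    """Hypothetical helper: drop low-variance columns, keep column and row labels."""
    selector = VarianceThreshold(threshold)
    selector.fit(data)
    # Map the integer positions of the kept columns back to their names
    labels = data.columns[selector.get_support(indices=True)]
    # Reuse the original row index instead of a fresh 0..n-1 index
    return pd.DataFrame(selector.transform(data), columns=labels, index=data.index)

# Rows with a non-default index, e.g. after filtering
df = pd.DataFrame(
    {"Survived": [0, 1, 1], "Pclass": [3, 1, 3], "Age": [22, 38, 26]},
    index=[10, 20, 30],
)
result = variance_threshold_keep_labels(df)
print(result)
```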

There are probably better ways to do this, but for those interested, here's how I did it:

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

def VarianceThreshold_selector(data):

    #Select Model
    selector = VarianceThreshold(0) #Defaults to 0.0, i.e. only remove features with the same value in all samples

    #Fit the Model
    selector.fit(data)
    features = selector.get_support(indices = True) #returns an array of integers corresponding to nonremoved features
    features = list(data.columns[features]) #List of all nonremoved features' names

    #Format and Return
    selector = pd.DataFrame(selector.transform(data))
    selector.columns = features
    return selector

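I came here looking for a way to get transform() or fit_transform() to return a DataFrame, but I suspect that is not supported.

However, you can subset the data a bit more cleanly like this: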

data_transformed = data.loc[:, selector.get_support()]

Building on Jarad's and pteehan's answers, here is a version that also handles NA values, since VarianceThreshold cannot deal with NA values.

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

def variance_threshold_select(df, thresh=0.0, na_replacement=-999):
    df1 = df.copy(deep=True) # Make a deep copy of the dataframe
    selector = VarianceThreshold(thresh)
    selector.fit(df1.fillna(na_replacement)) # Fill NA values as VarianceThreshold cannot deal with those
    df2 = df.loc[:,selector.get_support(indices=False)] # Keep only columns whose variance (after filling NAs) is above thresh; NAs remain in the result

    return df2
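One caveat with the fill-based approach: the replacement value itself changes the variance, so a column that is constant apart from a few NAs will be kept, because -999 differs from its other values. If you would rather ignore missing values, a minimal sketch swaps in pandas' own variance (which skips NaN by default) instead of VarianceThreshold; ddof=0 gives the population variance that VarianceThreshold uses, and the function name variance_select_skipna is hypothetical.

```python
import numpy as np
import pandas as pd

def variance_select_skipna(df, thresh=0.0):
    """Hypothetical alternative: keep columns whose NaN-ignoring variance exceeds thresh."""
    # DataFrame.var skips NaN by default; ddof=0 gives the population variance
    variances = df.var(ddof=0)
    # Keep columns strictly above the threshold, as VarianceThreshold does;
    # an all-NaN column has NaN variance, which compares False and is dropped
    return df.loc[:, variances > thresh]

df = pd.DataFrame({
    "a": [1.0, 1.0, np.nan],   # constant apart from a missing value
    "b": [1.0, 2.0, np.nan],
    "c": [np.nan, np.nan, np.nan],
})
print(list(variance_select_skipna(df).columns))  # ['b']
```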

Source: https://habr.com/ru/post/1656482/
