Effectively perform regression analysis on multiple subsets of pandas columns

Question

I could have asked a shorter question focused on the core issue: list permutations. I bring up statsmodels and pandas because there may be stepwise-regression tools that keep the flexibility of storing whichever regression result I want, as shown below, while being much more efficient. At least I hope so.

Based on the data as shown below:

Code Snippet 1:

# Imports
import pandas as pd
import numpy as np
import itertools
import statsmodels.api as sm

# A dataframe with random numbers
np.random.seed(123)
rows = 12
listVars= ['y','x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df_1 = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars) 
df_1 = df_1.set_index(rng)

print(df_1)

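I would like to run several regression analyses of y on x1, x2 and x3. First y against each of x1, x2 and x3 consecutively, then y against x1 and x2 together, and so on, like this: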

['y', ['x1']]
['y', ['x2']]
['y', ['x3']]
['y', ['x1', 'x2']]
['y', ['x1', 'x2', 'x3']]

Here is what I have tried. The pairs above can be built by slicing listVars:

Code Snippet 2:

listExec = [[listVars[0], listVars[1:2]],
       [listVars[0], listVars[2:3]],
       [listVars[0], listVars[3:4]],
       [listVars[0], listVars[1:3]],
       [listVars[0], listVars[1:4]]]

for l in listExec:
    print(l)

With listExec I can run each regression and collect a result of my choice (rsquared here, or the full output of .summary()) in a list:

Code Snippet 3:

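allResults = []
for l in listExec:
    # l[0] is the dependent variable, l[1] the list of regressors
    x = sm.add_constant(df_1[l[1]])
    model = sm.OLS(df_1[l[0]], x).fit()
    # Store R-squared; model.summary() could be stored instead
    result = model.rsquared
    allResults.append(result)

print(allResults)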

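Writing listExec out by hand does not scale to more variables, so I tried generating the lists in Python with itertools.permutations: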

Code Snippet 4:

allTuples = list(itertools.permutations(listVars))
allCombos = [list(elem) for elem in allTuples]

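This returns every ordering of all four names, 24 lists in total, including ones where y is not first, rather than the subsets of regressors listed above.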

python list pandas regression

vestland 05 . '18 13:49

vestland · Accepted Answer · 2018-02-16T08:27:53+0000

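The core of the problem is building the list of variable subsets, which is not specific to pandas, and itertools.combinations handles it. The function below takes the column names and the name of the dependent variable, and returns every unique combination of the remaining (independent) variables, each paired with the dependent variable: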

# Imports
import pandas as pd
import numpy as np
import itertools

# A dataframe with random numbers
np.random.seed(123)
rows = 12
listVars= ['y','x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df_1 = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars) 
df_1 = df_1.set_index(rng)

# The function
def StepWise(columns, dependent):
    """ Takes the columns of a pandas dataframe, defines a dependent variable
        and returns all unique combinations of the remaining (independent) variables.

    """

    independent = columns.copy()
    independent.remove(dependent)

    # Collect combinations of every size, from single regressors up to all of them
    combos = []
    for i in range(1, len(independent) + 1):
        combos.extend(itertools.combinations(independent, i))

    # Pair the dependent variable with each combination: [dependent, [regressors]]
    combosAll = [[dependent, list(combo)] for combo in combos]
    return combosAll

lExec = StepWise(columns = list(df_1), dependent = 'y')
print(lExec)
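
To see why the function relies on itertools.combinations rather than the itertools.permutations call from the question, a small comparison may help. This is only an illustrative sketch; the names perms and subsets are mine and do not appear in the original post.

```python
import itertools

# Same variable names as in the question
listVars = ['y', 'x1', 'x2', 'x3']
regressors = listVars[1:]

# permutations: every ordering of all four names,
# including orderings where 'y' is not the first element
perms = list(itertools.permutations(listVars))

# combinations: unordered subsets of the regressors, one size at a time
subsets = [list(c)
           for size in range(1, len(regressors) + 1)
           for c in itertools.combinations(regressors, size)]

print(len(perms))    # 24
print(len(subsets))  # 7
print(subsets)
```

The seven subsets include the five pairs listed in the question, plus ['x1', 'x3'] and ['x2', 'x3'].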

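The resulting list can be used in place of listExec in the question's third code snippet to run the regressions on the columns of any pandas dataframe.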
