Collapse a multi-index DataFrame for regression

I have a MultiIndexed DataFrame df containing the explanatory variables and a DataFrame df_Y containing the response variable.

    import numpy as np
    import pandas as pd

    # Create DataFrame for explanatory variables
    arrays = [['foo', 'foo', 'foo', 'bar', 'bar', 'bar'], [1, 2, 3, 1, 2, 3]]
    df = pd.DataFrame(np.random.randn(6, 2),
                      index=pd.MultiIndex.from_tuples(list(zip(*arrays))),
                      columns=['X1', 'X2'])


    # Create DataFrame for response variables
    df_Y = pd.DataFrame([1, 2, 3], columns=['Y'])


So far I can only run the regression on a single first-level group of the index, for example foo:

    from sklearn import linear_model

    df_X = df.loc['foo']  # using only 'foo'
    reg = linear_model.Ridge().fit(df_X, df_Y)
    reg.coef_

Problem: the Y values are the same for both the foo and bar groups, so including bar as well would give twice as many samples for the regression.


What is the best way to reformat / collapse / expand a MultiIndexed DataFrame so that all the data can be used for the regression? Other groups may have fewer rows than df_Y.

Sorry for the confusing wording; I'm not sure about the correct terms / phrases.

1 answer

The first index level can be dropped, and then the join will work:

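    # droplevel(0) removes the outer 'foo'/'bar' level
    df.index = df.index.droplevel(0)
    # join aligns on index labels, so df_Y's index must match the remaining level
    df = df.join(df_Y)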
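For completeness, here is a minimal end-to-end sketch of this approach using the data from the question. It rests on a few assumptions that the question does not confirm: that row i of df_Y corresponds to inner index label i + 1 (so df_Y is re-labelled 1, 2, 3), that an inner join is acceptable for groups with fewer rows than df_Y, and that a fixed random seed is fine. The name stacked is only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Same shapes as in the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
arrays = [['foo', 'foo', 'foo', 'bar', 'bar', 'bar'], [1, 2, 3, 1, 2, 3]]
df = pd.DataFrame(rng.standard_normal((6, 2)),
                  index=pd.MultiIndex.from_arrays(arrays),
                  columns=['X1', 'X2'])

# Assumption: row i of df_Y belongs to label i + 1 of the inner index level,
# so df_Y is given the index 1, 2, 3 before joining.
df_Y = pd.DataFrame([1, 2, 3], columns=['Y'], index=[1, 2, 3])

# Drop the outer level ('foo'/'bar'); the inner labels then repeat once per group.
stacked = df.copy()
stacked.index = stacked.index.droplevel(0)

# how='inner' keeps only labels that exist in df_Y, so a group with fewer
# rows than df_Y simply contributes fewer samples.
stacked = stacked.join(df_Y, how='inner')

# Fit on all six rows instead of only the three 'foo' rows.
reg = linear_model.Ridge().fit(stacked[['X1', 'X2']], stacked['Y'])
print(stacked)
print(reg.coef_)
```

Working on a copy keeps the original MultiIndexed df intact, which may be useful if the group labels are needed later.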

Source: https://habr.com/ru/post/1236573/

