Collapse a multi-index DataFrame for regression

Question

I have a MultiIndexed DataFrame df containing explanatory variables and a DataFrame df_Y containing response variables.

    import numpy as np
    import pandas as pd

    # Create DataFrame for explanatory variables
    arrays = [['foo', 'foo', 'foo', 'bar', 'bar', 'bar'], [1, 2, 3, 1, 2, 3]]
    df = pd.DataFrame(np.random.randn(6, 2),
                      index=pd.MultiIndex.from_tuples(zip(*arrays)),
                      columns=['X1', 'X2'])

    # Create DataFrame for response variables
    df_Y = pd.DataFrame([1, 2, 3], columns=['Y'])

So far I can only run the regression on a single first-level group of the DataFrame, for example the rows indexed by foo:

    from sklearn import linear_model

    df_X = df.loc['foo']  # using only 'foo' (.ix is deprecated; .loc selects by label)
    reg = linear_model.Ridge().fit(df_X, df_Y)
    reg.coef_

Problem: since the Y values are the same for both the foo and bar groups, we could have twice as many regression samples if we also included bar.

What is the best way to reshape / collapse / expand a MultiIndexed DataFrame so that we can use all the data for our regression? Other groups may have fewer rows than df_Y.

Sorry for the confusing wording; I'm not sure of the correct terms / phrases.

python python-2.7 pandas

Nyxynyx Nov 22 '15 at 22:28

1 answer

Saquib · Answer 1 · 2017-12-23T14:42:32+0000

The first index level can be dropped, and then the join will work:

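    # droplevel() removes the first level ('foo' / 'bar') by default
    df.index = df.index.droplevel()
    # join aligns on the index, so df_Y's index must match the remaining level (1, 2, 3)
    df = df.join(df_Y)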
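For completeness, here is a minimal end-to-end sketch of the same idea, fitted on all six rows. It rests on one assumption that is not in the question: df_Y is indexed by the same observation ids (1, 2, 3) as the second index level, so that the join can align the rows. The names stacked, data and partial are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Explanatory variables, indexed by (group, observation id)
arrays = [['foo', 'foo', 'foo', 'bar', 'bar', 'bar'], [1, 2, 3, 1, 2, 3]]
df = pd.DataFrame(np.random.randn(6, 2),
                  index=pd.MultiIndex.from_arrays(arrays),
                  columns=['X1', 'X2'])

# Assumption: the response is indexed by the same observation ids (1, 2, 3)
df_Y = pd.DataFrame({'Y': [1, 2, 3]}, index=[1, 2, 3])

# Drop the group level so the remaining index lines up with df_Y
stacked = df.copy()
stacked.index = stacked.index.droplevel(0)

# Inner join keeps only observation ids present on both sides
data = stacked.join(df_Y, how='inner')

# Fit on the collapsed data: 6 samples instead of 3
reg = linear_model.Ridge().fit(data[['X1', 'X2']], data['Y'])
print(data.shape)
print(reg.coef_)

# A group with fewer rows simply contributes fewer samples
partial = stacked.iloc[:5].join(df_Y, how='inner')
print(len(partial))
```

The inner join is a design choice rather than a requirement: with the default left join, any observation id missing from df_Y would produce NaN in Y, which Ridge would reject, whereas the inner join drops those rows up front.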
