Changing the shape of a pandas data frame to as many columns as repeating rows

Question


I have this data frame:

    >>> df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
    ...                    'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
    ...                    'Values': [250, 30, 120, 12, 200, 112]})
    >>> df
      Place  Values      Var
    0     A     250      All
    1     A      30   French
    2     B     120      All
    3     B      12   German
    4     C     200      All
    5     C     112  Spanish

It has a repeating two-row pattern for each Place. I want to reshape it so that there is one row per Place, and the Var column is split into two columns: one for "All" and one for the other value.

Like this:

    Place  All  Language  Value
    A      250  French    30
    B      120  German    12
    C      200  Spanish   112

A pivot table would create a column for each unique value of Var, and I don't want that.

What is the right way to reshape the frame like this?


python pandas

robroc Apr 01 '16 at 14:56


2 answers

Yakym pirozhenko · Answer 1 · 2016-04-01T15:04:27+0000

Since the data appears in an alternating pattern, we can conceptualize the transformation in 2 steps.

Step 1: go from

    a,a,a
    b,b,b

to

    a,a,a,b,b,b

Step 2: drop the redundant columns.

The following solution applies reshape to the values of the DataFrame. The arguments to reshape are (-1, df.shape[1] * 2), which says: "Give me an array that has twice as many columns and as many rows as you can manage."
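To make the reshape step concrete, here is a minimal sketch of what it does to the question's data. The variable name `paired` is only illustrative, and the explicit column selection is an assumption added so the layout matches the one printed in the question (Place, Values, Var).

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                   'Values': [250, 30, 120, 12, 200, 112]})

# Assumption: fix the column order to the layout shown in the question.
df = df[['Place', 'Values', 'Var']]

# -1 lets NumPy infer the row count: 6 rows x 3 columns become 3 rows x 6 columns,
# so each new row holds two consecutive original rows side by side.
paired = df.values.reshape(-1, df.shape[1] * 2)

print(paired.shape)  # (3, 6)
print(paired[0].tolist())
```

The first row of `paired` holds the "All" row for place A followed by its "French" row, which is why positions 0, 1, 4 and 5 carry the place, the total, the language value and the language name.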

Then I picked the column indexes to keep, [0, 1, 4, 5], based on your data layout. The resulting numpy array has 4 columns, so we pass it to the DataFrame constructor along with the correct column names.

This is a hacky solution that depends on the layout of df and returns the columns in a different order from the one you asked for (Value before Language):

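    import pandas as pd

    df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                       'Values': [250, 30, 120, 12, 200, 112]})

    # The index filter below assumes the column order Place, Values, Var
    # (the order shown in the question's output), so pin it explicitly.
    df = df[['Place', 'Values', 'Var']]

    # Pair up consecutive rows, then keep: Place, All value, other value, other label.
    df = pd.DataFrame(df.values.reshape(-1, df.shape[1] * 2)[:, [0, 1, 4, 5]],
                      columns=['Place', 'All', 'Value', 'Language'])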
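Since the reshape trick depends on row and column positions, a label-based variant may be easier to maintain. The sketch below is not part of the original answer: it swaps in a boolean mask plus `merge`, and it assumes every Place has exactly one "All" row and exactly one other row. The names `is_all`, `totals`, `langs` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                   'Values': [250, 30, 120, 12, 200, 112]})

# Split on the label in Var rather than on row position.
is_all = df['Var'] == 'All'
totals = df.loc[is_all, ['Place', 'Values']].rename(columns={'Values': 'All'})
langs = df.loc[~is_all].rename(columns={'Var': 'Language', 'Values': 'Value'})

# Join the two halves on Place and set the column order explicitly.
result = totals.merge(langs, on='Place')[['Place', 'All', 'Language', 'Value']]
print(result)
```

Because the split uses the value of Var and the join uses Place, this version does not care how the rows or columns of df are ordered.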

Scott boston · Answer 2 · 2017-05-01T21:03:44+0000

Another approach:

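    import pandas as pd

    df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                       'Values': [250, 30, 120, 12, 200, 112]})

    # One column per Var value, one row per Place (missing combinations are NaN).
    df1 = df.set_index('Place').pivot(columns='Var')
    # Drop the outer 'Values' level of the column MultiIndex.
    df1.columns = df1.columns.droplevel()
    # Move 'All' into the index, then stack the language columns back into rows.
    # This relies on stack() dropping the NaN cells.
    df1 = df1.set_index('All', append=True).stack().reset_index()
    print(df1)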

Output:

      Place    All      Var      0
    0     A  250.0   French   30.0
    1     B  120.0   German   12.0
    2     C  200.0  Spanish  112.0
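The `.0` suffixes in this output appear because pivoting fills the missing Place/language combinations with NaN, which forces the columns to float. The following is a hedged sketch of one way to tidy that up, not the answerer's own code: it passes `index`, `columns` and `values` to `pivot` explicitly, calls `dropna()` so it does not rely on `stack()` discarding NaN by default, and then renames and casts the columns. The names `wide` and `tidy` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                   'Values': [250, 30, 120, 12, 200, 112]})

# Explicit arguments give single-level columns, so no droplevel() is needed.
wide = df.pivot(index='Place', columns='Var', values='Values')

# Keep 'All' next to Place, stack the language columns into rows,
# and drop the NaN cells explicitly.
tidy = (wide.set_index('All', append=True)
            .stack()
            .dropna()
            .reset_index()
            .rename(columns={'Var': 'Language', 0: 'Value'}))

# With the NaN placeholders gone, the counts can be integers again.
tidy = tidy.astype({'All': int, 'Value': int})
print(tidy)
```

The result has the columns Place, All, Language and Value, in the order requested in the question.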
