Reshaping a pandas DataFrame so that repeating rows become columns

I have this data frame:

    >>> df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
    ...                    'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
    ...                    'Values': [250, 30, 120, 12, 200, 112]})
    >>> df
      Place  Values      Var
    0     A     250      All
    1     A      30   French
    2     B     120      All
    3     B      12   German
    4     C     200      All
    5     C     112  Spanish

It has a repeating two-row pattern for each Place. I want to change it so that there is one row per Place, and the Var column is split into two columns: one for "All" and one for the other value.

Like this:

    Place  All  Language  Value
    A      250  French    30
    B      120  German    12
    C      200  Spanish   112

A pivot table would create a column for each unique value of Var, and I don't want that.

How can I reshape the frame this way?

2 answers

Since the data appears in an alternating pattern, we can think of the transformation as two steps.

Step 1:

Go from

    a,a,a
    b,b,b

to

    a,a,a,b,b,b

Step 2: drop the redundant columns.

The following solution applies reshape to the values of the DataFrame. The arguments to reshape are (-1, df.shape[1] * 2), which says: "Give me an array that has twice as many columns and as many rows as you can manage."

Then I select the column indexes to keep, [0, 1, 4, 5], based on your data layout. The resulting numpy array has 4 columns, so we pass it to the DataFrame constructor along with the correct column names.
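To make the two steps concrete, here is a minimal sketch that traces them on a plain numpy array. It assumes the rows are laid out as (Place, Values, Var), as in the frame printed in the question; the names values, wide and kept are only illustrative.

```python
import numpy as np

# Rows laid out as (Place, Values, Var), matching the frame printed in the question.
values = np.array(
    [
        ["A", 250, "All"],
        ["A", 30, "French"],
        ["B", 120, "All"],
        ["B", 12, "German"],
        ["C", 200, "All"],
        ["C", 112, "Spanish"],
    ],
    dtype=object,
)

# Step 1: put each pair of consecutive rows side by side (3 columns -> 6 columns).
wide = values.reshape(-1, values.shape[1] * 2)
# wide[0] is now ['A', 250, 'All', 'A', 30, 'French']

# Step 2: keep Place, the "All" value, the other value, and the language name.
kept = wide[:, [0, 1, 4, 5]]
```

Columns 2 and 3 of wide (the literal 'All' and the repeated Place) carry no new information, which is why they are the ones left out.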

Admittedly, this is not a very readable solution: it depends on the layout of df, and it produces the columns in a different order than requested (Value before Language).

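    import pandas as pd

    df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                       'Values': [250, 30, 120, 12, 200, 112]})

    # The index list below assumes the column order (Place, Values, Var) printed in
    # the question; recent pandas versions keep dict insertion order, so fix it explicitly.
    df = df[['Place', 'Values', 'Var']]

    # Put each pair of rows side by side, then keep Place, All, Value, Language.
    df = pd.DataFrame(df.values.reshape(-1, df.shape[1] * 2)[:, [0, 1, 4, 5]],
                      columns=['Place', 'All', 'Value', 'Language'])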
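If the dependence on row and column positions is a concern, one possible alternative is to split the frame with a boolean mask and merge the two halves on Place. This is a different technique from the reshape above, not the answer's method, and it is only a sketch: it assumes every Place has exactly one 'All' row and exactly one other row. The names is_all, totals, others and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Place": ["A", "A", "B", "B", "C", "C"],
    "Var": ["All", "French", "All", "German", "All", "Spanish"],
    "Values": [250, 30, 120, 12, 200, 112],
})

# Split the frame by row type instead of relying on row/column positions.
is_all = df["Var"] == "All"
totals = df.loc[is_all, ["Place", "Values"]].rename(columns={"Values": "All"})
others = df.loc[~is_all].rename(columns={"Var": "Language", "Values": "Value"})

# One row per Place: the "All" total next to the language-specific row.
result = totals.merge(others, on="Place")[["Place", "All", "Language", "Value"]]
print(result)
```

Because the rows are matched by the Place key and the columns are selected by name, this version does not care how the input columns or rows are ordered.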

Another approach:

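    import pandas as pd

    df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                       'Values': [250, 30, 120, 12, 200, 112]})

    # One column per unique Var value, indexed by Place.
    df1 = df.set_index('Place').pivot(columns='Var')
    # Drop the outer 'Values' level of the column MultiIndex.
    df1.columns = df1.columns.droplevel()
    # Move 'All' into the index, then stack the remaining language columns into rows.
    df1 = df1.set_index('All', append=True).stack().reset_index()
    print(df1)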

Output:

      Place    All      Var      0
    0     A  250.0   French   30.0
    1     B  120.0   German   12.0
    2     C  200.0  Spanish  112.0
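The output above still carries the column names Var and 0 and float values, whereas the question asks for Language and Value. A minimal sketch of a tidy-up step follows; to stay self-contained it rebuilds a frame with the same shape as the printed output rather than re-running the pipeline, and the name tidy is illustrative.

```python
import pandas as pd

# A frame with the same shape as the printed output above.
df1 = pd.DataFrame({
    "Place": ["A", "B", "C"],
    "All": [250.0, 120.0, 200.0],
    "Var": ["French", "German", "Spanish"],
    0: [30.0, 12.0, 112.0],
})

# Rename to the requested headers and cast the float columns back to integers.
tidy = (
    df1.rename(columns={"Var": "Language", 0: "Value"})
       .astype({"All": int, "Value": int})
)
tidy = tidy[["Place", "All", "Language", "Value"]]
print(tidy)
```

The integer cast assumes no values are missing after the stack step; with missing values the columns would have to stay float or use a nullable integer dtype.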

Source: https://habr.com/ru/post/1246248/

