Pandas dataframe: how to group by values in a column and create new columns from grouped values

Question

I have a dataframe with two columns:

    x  y
    0  1
    1  1
    2  2
    0  5
    1  6
    2  8
    0  1
    1  8
    2  4
    0  1
    1  7
    2  3

I want to get:

    x  val1  val2  val3  val4
    0     1     5     1     1
    1     1     6     8     7
    2     2     8     4     3

I know that the values in column x repeat every N rows.

python pandas dataframe

foebu Jan 01 '15 at 13:10

1 answer

unutbu · Accepted Answer · 2016-01-01T13:48:04+0000

You can use groupby/cumcount to assign column numbers and then call pivot:

    import pandas as pd

    df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                       'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

    df['columns'] = df.groupby('x')['y'].cumcount()
    #     x  y  columns
    # 0   0  1        0
    # 1   1  1        0
    # 2   2  2        0
    # 3   0  5        1
    # 4   1  6        1
    # 5   2  8        1
    # 6   0  1        2
    # 7   1  8        2
    # 8   2  4        2
    # 9   0  1        3
    # 10  1  7        3
    # 11  2  3        3

    result = df.pivot(index='x', columns='columns')
    print(result)

gives

             y
    columns  0  1  2  3
    x
    0        1  5  1  1
    1        1  6  8  7
    2        2  8  4  3
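The pivoted result carries a two-level column header, while the question asks for flat val1 ... val4 columns next to x. A minimal sketch of one way to get that layout follows. The variable name `wide` is made up for illustration, and the `val` prefix is taken from the question rather than from the answer above.

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

# Number each occurrence of an x value: 0 for the first, 1 for the second, ...
df['columns'] = df.groupby('x').cumcount()

# Passing values='y' gives flat column labels 0..3 instead of ('y', n) pairs
wide = df.pivot(index='x', columns='columns', values='y')

# Rename 0..3 to val1..val4 (naming assumed from the question's desired output)
wide.columns = ['val{}'.format(i + 1) for i in wide.columns]

# Turn the x index back into an ordinary column
wide = wide.reset_index()
print(wide)
```

Passing `values='y'` up front avoids having to drop the outer `y` level afterwards.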

Or, if you can really rely on x cycling through the same N values in the same order,

    N = 3
    result = pd.DataFrame(df['y'].values.reshape(-1, N).T)

gives

       0  1  2  3
    0  1  5  1  1
    1  1  6  8  7
    2  2  8  4  3

Using reshape is faster than calling groupby/cumcount and pivot, but it is less reliable: it never looks at x, so it only gives the right answer when the rows are already in the correct order.
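If the ordering is not guaranteed, one option is to check it before taking the reshape shortcut and fall back to pivot otherwise. The following is only a sketch under that assumption. The names `ordered` and `occurrence` are made up for illustration, and N is taken to be the length of one repeating cycle of x values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

N = 3  # assumed: number of distinct x values, i.e. the length of one cycle

x = df['x'].values
# The shortcut is valid only if x is one block of N distinct values
# repeated end to end, e.g. 0, 1, 2, 0, 1, 2, ...
ordered = (len(x) % N == 0
           and len(set(x[:N])) == N
           and np.array_equal(x, np.tile(x[:N], len(x) // N)))

if ordered:
    # Fast path: reshape y directly and label the rows with the x values
    result = pd.DataFrame(df['y'].values.reshape(-1, N).T, index=x[:N])
else:
    # Order-independent path: groupby/cumcount followed by pivot
    occurrence = df.groupby('x').cumcount()
    result = df.assign(occurrence=occurrence).pivot(
        index='x', columns='occurrence', values='y')
print(result)
```

The check costs one pass over x, which is still cheap next to the groupby and pivot it may save.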
