How can I sort in sections defined by a single column, but leave sections where they are?

Question

Consider the DataFrame df

    import pandas as pd

    df = pd.DataFrame(dict(
        A=list('XXYYXXYY'),
        B=range(8, 0, -1)
    ))
    print(df)

       A  B
    0  X  8
    1  X  7
    2  Y  6
    3  Y  5
    4  X  4
    5  X  3
    6  Y  2
    7  Y  1

With the group 'X' defined by column 'A', I want to sort [8, 7, 4, 3] into the expected [3, 4, 7, 8]. However, I want to leave those rows where they are.

       A  B
    5  X  3  <-- Notice all X are in same positions
    4  X  4  <-- However, `[3, 4, 7, 8]` have shifted
    7  Y  1
    6  Y  2
    1  X  7  <--
    0  X  8  <--
    3  Y  5
    2  Y  6

+5

python sorting numpy pandas

piRSquared Apr 4 '17 at 0:46


2 answers

The only way I have found to solve this efficiently is to sort twice and unwind once.

    import numpy as np

    v = df.values

    # argsort just first column with kind='mergesort' to preserve subgroup order
    a1 = v[:, 0].argsort(kind='mergesort')

    # Fill in an un-sort array to unwind the `a1` argsort
    a_ = np.empty_like(a1)
    a_[a1] = np.arange(len(a1))

    # argsort by both columns... not exactly what I want, yet.
    a2 = np.lexsort(v.T[::-1])

    # Sort with `a2` then unwind the first layer with `a_`
    pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)

       A  B
    5  X  3
    4  X  4
    7  Y  1
    6  Y  2
    1  X  7
    0  X  8
    3  Y  5
    2  Y  6
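The "un-sort" array in the answer is the inverse permutation of the stable argsort. A minimal sketch of that single step, using only the group labels from the question (the names `labels` and `restored` are illustrative and not part of the answer):

```python
import numpy as np

# Group labels from column A of the question's DataFrame.
labels = np.array(list('XXYYXXYY'))

# Stable argsort: positions of the X rows first, then the Y rows,
# each in their original relative order.
a1 = labels.argsort(kind='mergesort')

# Inverse permutation: a_[a1[i]] == i for every i.
a_ = np.empty_like(a1)
a_[a1] = np.arange(len(a1))

# Applying a1 and then a_ puts every label back where it started.
restored = labels[a1][a_]
print(a1, a_, restored)
```

Because `a_` undoes the grouping by `A` alone, applying it after the full sort by `A` and then `B` returns each group to the positions it originally occupied while keeping the sorted order inside the group.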

Testing

code

    def np_intra_sort(df):
        v = df.values
        # Stable argsort of the group column, then its inverse permutation
        a1 = v[:, 0].argsort(kind='mergesort')
        a_ = np.empty_like(a1)
        a_[a1] = np.arange(len(a1))
        # Sort by A then B, and unwind the grouping by A
        a2 = np.lexsort(v.T[::-1])
        return pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)

    def pd_intra_sort(df):
        def sub_sort(x):
            return x.sort_values().index
        idx = df.groupby('A').B.transform(sub_sort).values
        return df.reindex(idx)

Small data

Big data

    df = pd.DataFrame(dict(
        A=list('XXYYXXYY') * 10000,
        B=range(8 * 10000, 0, -1)
    ))
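The comparison could be reproduced along these lines. This is only a sketch: it restates the two functions (written with `v.T`), checks that they agree on row order, and times them with `timeit`. The names `small`, `big`, `agree`, and `timings` and the repeat count of 3 are my own assumptions, and no timings are quoted because they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd


def np_intra_sort(df):
    v = df.values
    a1 = v[:, 0].argsort(kind='mergesort')
    a_ = np.empty_like(a1)
    a_[a1] = np.arange(len(a1))
    a2 = np.lexsort(v.T[::-1])
    return pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)


def pd_intra_sort(df):
    def sub_sort(x):
        return x.sort_values().index
    idx = df.groupby('A').B.transform(sub_sort).values
    return df.reindex(idx)


small = pd.DataFrame(dict(A=list('XXYYXXYY'), B=range(8, 0, -1)))
big = pd.DataFrame(dict(A=list('XXYYXXYY') * 10000,
                        B=range(8 * 10000, 0, -1)))

agree = {}
timings = {}
for name, frame in [('small', small), ('big', big)]:
    # Both functions should return the rows in the same order.
    agree[name] = np_intra_sort(frame).index.equals(pd_intra_sort(frame).index)
    # Total seconds for 3 calls of each function (repeat count is arbitrary).
    timings[name] = (
        timeit.timeit(lambda: np_intra_sort(frame), number=3),
        timeit.timeit(lambda: pd_intra_sort(frame), number=3),
    )

print(agree)
print(timings)
```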

+2

piRSquared Apr 4 '17 at 0:46


root · Accepted Answer · 2017-04-04T20:27:59+0000

You can use transform to return the new desired index, then use reindex to reorder your DataFrame:

    # Use transform to return the new ordered index values.
    new_idx = df.groupby('A')['B'].transform(lambda grp: grp.sort_values().index)

    # Reindex.
    df = df.reindex(new_idx.rename(None))

If you wish, you could combine the two lines above into one long line.
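For illustration, the combined form might look like the sketch below. It rebuilds the question's DataFrame so that it runs on its own, and the name `result` is mine rather than the answer's.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('XXYYXXYY'), B=range(8, 0, -1)))

# transform yields, for each row position, the index label of the row that
# should land there; reindex then reorders the frame by those labels.
result = df.reindex(
    df.groupby('A')['B']
      .transform(lambda grp: grp.sort_values().index)
      .rename(None)
)
print(result)
```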

Result:

       A  B
    5  X  3
    4  X  4
    7  Y  1
    6  Y  2
    1  X  7
    0  X  8
    3  Y  5
    2  Y  6

Note that if you don't care about maintaining the old index, you can assign the result of transform directly back to column B:

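    # .values returns a plain array, so each group's sorted values are
    # written back by position rather than by index label.
    df['B'] = df.groupby('A')['B'].transform(lambda grp: grp.sort_values().values)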

Which gives:

       A  B
    0  X  3
    1  X  4
    2  Y  1
    3  Y  2
    4  X  7
    5  X  8
    6  Y  5
    7  Y  6
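One way to sanity-check this variant is sketched below. Two things here are my own assumptions and not part of the answer: the sorted values are returned with `.to_numpy()` so that they are placed by position, and the names `out`, `labels_unchanged`, and `sorted_within` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('XXYYXXYY'), B=range(8, 0, -1)))

out = df.copy()
# Return a plain array so the sorted values fill each group's rows in order.
out['B'] = out.groupby('A')['B'].transform(
    lambda grp: grp.sort_values().to_numpy()
)

# The group labels must not have moved ...
labels_unchanged = out['A'].equals(df['A'])
# ... and B must be ascending inside every group.
sorted_within = bool(
    out.groupby('A')['B'].apply(lambda s: s.is_monotonic_increasing).all()
)
print(labels_unchanged, sorted_within)
```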
