How can I sort within groups defined by a single column, but leave the groups where they are?

Consider the DataFrame df:

    import pandas as pd

    df = pd.DataFrame(dict(
        A=list('XXYYXXYY'),
        B=range(8, 0, -1)
    ))

    print(df)

       A  B
    0  X  8
    1  X  7
    2  Y  6
    3  Y  5
    4  X  4
    5  X  3
    6  Y  2
    7  Y  1

With group 'X' defined by column 'A', I want to sort [8, 7, 4, 3] into the expected [3, 4, 7, 8]. However, I want to leave the rows of each group where they are.

       A  B
    5  X  3  <-- Notice all X are in same positions
    4  X  4  <-- However, `[3, 4, 7, 8]` have shifted
    7  Y  1
    6  Y  2
    1  X  7  <--
    0  X  8  <--
    3  Y  5
    2  Y  6
2 answers

You can use transform to return the new desired index, then use reindex to reorder your DataFrame:

    # Use transform to return the new ordered index values.
    new_idx = df.groupby('A')['B'].transform(lambda grp: grp.sort_values().index)

    # Reindex.
    df = df.reindex(new_idx.rename(None))

If you wish, you could combine the two lines above into one long line.
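For reference, the combined form might look like the sketch below. It rebuilds the example frame from the question so that it runs on its own; the name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('XXYYXXYY'), B=range(8, 0, -1)))

# Same two steps as above, chained into a single expression.
result = df.reindex(
    df.groupby('A')['B']
      .transform(lambda grp: grp.sort_values().index)
      .rename(None)
)
print(result)
```

Whether this reads better than the two-line version is a matter of taste; the intermediate `new_idx` can be handy when debugging.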

Result:

       A  B
    5  X  3
    4  X  4
    7  Y  1
    6  Y  2
    1  X  7
    0  X  8
    3  Y  5
    2  Y  6

Note that if you don't care about keeping the old index, you can assign the sorted values directly from transform:

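    # .values drops the index so the sorted values are assigned by position;
    # recent pandas versions may otherwise realign a returned Series to the
    # original order and undo the sort.
    df['B'] = df.groupby('A')['B'].transform(lambda grp: grp.sort_values().values)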

Which gives:

       A  B
    0  X  3
    1  X  4
    2  Y  1
    3  Y  2
    4  X  7
    5  X  8
    6  Y  5
    7  Y  6
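If you would rather not depend on how transform handles the object returned by the lambda, the same result can be sketched with a different technique: sort by both columns, then write the sorted values back into the positions each group occupies. This is not the approach above, only a cross-check that assumes the default integer index should be kept; `order`, `sorted_b` and `new_b` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('XXYYXXYY'), B=range(8, 0, -1)))

# Row positions grouped by A, keeping the original order inside each group.
order = np.argsort(df['A'].to_numpy(), kind='stable')

# Values of B sorted by group first, then by value.
sorted_b = df.sort_values(['A', 'B'])['B'].to_numpy()

# Write the sorted values back into the slots each group occupies.
new_b = df['B'].to_numpy().copy()
new_b[order] = sorted_b
result = df.assign(B=new_b)
print(result)
```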

The only way I found to solve this efficiently was to sort twice and unwind once.

    import numpy as np

    v = df.values

    # argsort just first column with kind='mergesort' to preserve subgroup order
    a1 = v[:, 0].argsort(kind='mergesort')

    # Fill in an un-sort array to unwind the `a1` argsort
    a_ = np.empty_like(a1)
    a_[a1] = np.arange(len(a1))

    # argsort by both columns... not exactly what I want, yet.
    a2 = np.lexsort(v.T[::-1])

    # Sort with `a2` then unwind the first layer with `a_`
    pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)

       A  B
    5  X  3
    4  X  4
    7  Y  1
    6  Y  2
    1  X  7
    0  X  8
    3  Y  5
    2  Y  6
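The "unwind" step is an inverse permutation. A minimal sketch of that one idea in isolation, using a made-up permutation `perm` and toy data rather than the frame above:

```python
import numpy as np

# An arbitrary permutation, standing in for what an argsort would return.
perm = np.array([2, 0, 3, 1])

# Build its inverse: inv[perm[i]] = i for every i.
inv = np.empty_like(perm)
inv[perm] = np.arange(len(perm))

data = np.array(list('abcd'))
shuffled = data[perm]     # apply the permutation
restored = shuffled[inv]  # undo it
print(shuffled, restored)
```

Building the inverse this way is a single linear pass; `np.argsort(perm)` yields the same array but sorts again to get there.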

Testing

Code

    def np_intra_sort(df):
        v = df.values
        a1 = v[:, 0].argsort(kind='mergesort')
        a_ = np.empty_like(a1)
        a_[a1] = np.arange(len(a1))
        a2 = np.lexsort(v.T[::-1])
        return pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)

    def pd_intra_sort(df):
        def sub_sort(x):
            return x.sort_values().index
        idx = df.groupby('A').B.transform(sub_sort).values
        return df.reindex(idx)

Small data

[Image: results on the small data set]

Big data

    df = pd.DataFrame(dict(
        A=list('XXYYXXYY') * 10000,
        B=range(8 * 10000, 0, -1)
    ))

[Image: results on the big data set]
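To reproduce the comparison on your own machine, something like the following sketch could be used. It assumes the standard-library `timeit` module is an acceptable stand-in for whatever tool produced the original measurements; `time_call` is a hypothetical helper, and the two functions are restated so that the block runs on its own. No timings are quoted here because they depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd


def np_intra_sort(df):
    v = df.values
    a1 = v[:, 0].argsort(kind='mergesort')
    a_ = np.empty_like(a1)
    a_[a1] = np.arange(len(a1))
    a2 = np.lexsort(v.T[::-1])
    return pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)


def pd_intra_sort(df):
    def sub_sort(x):
        return x.sort_values().index

    idx = df.groupby('A').B.transform(sub_sort).values
    return df.reindex(idx)


def time_call(func, df, number=3):
    """Hypothetical helper: best average seconds per call over 3 repeats."""
    runs = timeit.repeat(lambda: func(df), number=number, repeat=3)
    return min(runs) / number


small = pd.DataFrame(dict(A=list('XXYYXXYY'), B=range(8, 0, -1)))
big = pd.DataFrame(dict(A=list('XXYYXXYY') * 10000,
                        B=range(8 * 10000, 0, -1)))

for label, frame in [('small', small), ('big', big)]:
    for func in (np_intra_sort, pd_intra_sort):
        seconds = time_call(func, frame)
        print(f'{label:5s} {func.__name__:14s} {seconds:.6f} s')
```

Taking the minimum over repeats is a common way to reduce noise from other processes; the averages can be reported instead if that suits the comparison better.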


Source: https://habr.com/ru/post/1266280/

