Pandas Duplicate DataFrame Rows

How can I sort a DataFrame so that the rows with repeating values in a column are "recycled", i.e. the values cycle?

For example, my original DataFrame looks like this:

In [3]: df
Out[3]: 
    A  B
0  r1  0
1  r1  1
2  r2  2
3  r2  3
4  r3  4
5  r3  5

I would like it to become:

In [3]: df_sorted
Out[3]: 
    A  B
0  r1  0
2  r2  2
4  r3  4
1  r1  1
3  r2  3
5  r3  5

The rows are sorted so that the values in column A cycle: r1, r2, r3, then r1, r2, r3 again.

I looked for an API in pandas, but there seems to be no suitable method for this. I could write a complex function to do it, but I am wondering whether there is a smarter way, or an existing pandas method that can do this. Thank you very much in advance.

Update: Apologies for the incorrect statement. In my real problem, column B contains string values.


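You can use cumcount to number the duplicates within each group of column A, then sort_values by the new column C and then by A (sorting by A is not necessary for this sample, but may matter for real data). Finally, remove column C with drop: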

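# Number each row within its group of A: 0, 1, 2, ...
df['C'] = df.groupby('A')['A'].cumcount()
# Sort by position within the group first, then by A
df.sort_values(by=['C', 'A'], inplace=True)
print(df)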
    A  B  C
0  r1  0  0
2  r2  2  0
4  r3  4  0
1  r1  1  1
3  r2  3  1
5  r3  5  1

df.drop('C', axis=1, inplace=True)
print(df)
    A  B
0  r1  0
2  r2  2
4  r3  4
1  r1  1
3  r2  3
5  r3  5
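
The same idea can be wrapped in a helper that leaves the input untouched. This is only a minimal sketch: the function name interleave_groups and the temporary column name _pos are made up for illustration (and _pos is assumed not to clash with an existing column). B holds strings here, as in the question's update, to show that cumcount does not depend on B at all.

```python
import pandas as pd


def interleave_groups(df, key):
    """Return a copy of df with the rows of each `key` group interleaved."""
    # Position of each row within its group: 0, 1, 2, ...
    pos = df.groupby(key).cumcount()
    # Sort by within-group position first, then by the key itself.
    return (
        df.assign(_pos=pos)           # hypothetical temporary column name
          .sort_values(["_pos", key])
          .drop(columns="_pos")
    )


df = pd.DataFrame({
    "A": ["r1", "r1", "r2", "r2", "r3", "r3"],
    "B": ["a", "b", "c", "d", "e", "f"],  # strings, as in the question's update
})
df_sorted = interleave_groups(df, "A")
print(df_sorted)
```

Because assign, sort_values and drop each return a new object here, the original df keeps its row order and columns.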

Timings for the small df (len(df)=6):

In [26]: %timeit (jez(df))
1000 loops, best of 3: 2 ms per loop

In [27]: %timeit (boud(df1))
100 loops, best of 3: 2.52 ms per loop

Timings for the large df (len(df)=6000):

In [23]: %timeit (jez(df))
100 loops, best of 3: 3.44 ms per loop

In [28]: %timeit (boud(df1))
100 loops, best of 3: 2.52 ms per loop

Code for the timings:

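import pandas as pd

# Build the large frame (6000 rows) from the 6-row sample
df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()

def jez(df):
    # cumcount approach; note that it modifies df in place
    df['C'] = df.groupby('A')['A'].cumcount()
    df.sort_values(by=['C', 'A'], inplace=True)
    df.drop('C', axis=1, inplace=True)
    return df

def boud(df):
    # rank approach; returns a sorted copy (the caller's df keeps column 'C')
    df['C'] = df.groupby('A')['B'].rank()
    df = df.sort_values(['C', 'A'])
    df.drop('C', axis=1, inplace=True)
    return df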
100 loops, best of 3: 4.29 ms per loop

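So you want the first row of each group, then the second, then the third and so on, and within each of these, the rows sorted by 'A'.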

You can group by 'A' and use rank. For your example, this gives:

df['C'] = df.groupby('A')['B'].rank()

df
Out[8]: 
    A  B    C
0  r1  0  1.0
1  r1  1  2.0
2  r2  2  1.0
3  r2  3  2.0
4  r3  4  1.0
5  r3  5  2.0

df.sort_values(['C', 'A'])
Out[9]: 
    A  B    C
0  r1  0  1.0
2  r2  2  1.0
4  r3  4  1.0
1  r1  1  2.0
3  r2  3  2.0
5  r3  5  2.0

You can then drop column 'C' if you no longer need it.
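
One caveat with rank is worth checking on your own data: by default, equal values inside a group share an averaged rank, so they no longer get distinct positions. The sketch below uses made-up values for B to show the difference between the default, method="first", and cumcount.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["r1", "r1", "r2", "r2"],
    "B": [5, 5, 7, 7],  # duplicated values inside each group (illustrative)
})

avg = df.groupby("A")["B"].rank()                  # default method="average"
first = df.groupby("A")["B"].rank(method="first")  # ties broken by row order
count = df.groupby("A").cumcount()                 # 0-based position in group

print(avg.tolist())    # [1.5, 1.5, 1.5, 1.5]
print(first.tolist())  # [1.0, 2.0, 1.0, 2.0]
print(count.tolist())  # [0, 1, 0, 1]
```

With the default method, every row here gets the same rank, so sorting by it cannot separate the first row of a group from the second; method="first" or cumcount keeps them apart.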


If 'B' is not numeric (for example, it contains strings), you can rank the index instead:

df['C'] = df.reset_index().groupby('A')['index'].rank()
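
Put together, the index-based variant might look like the sketch below. It assumes the frame has a default, unnamed integer index (so that reset_index() creates a column called "index" and the assignment back to df lines up row by row); the string values in B are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["r1", "r1", "r2", "r2", "r3", "r3"],
    "B": ["a", "b", "c", "d", "e", "f"],  # illustrative string values
})

# reset_index() exposes the old index as a column named "index"
# (this name assumes the original index is unnamed).
df["C"] = df.reset_index().groupby("A")["index"].rank()

# Sort by rank within the group, then by A, and drop the helper column.
result = df.sort_values(["C", "A"]).drop(columns="C")
print(result)
```

Since index values are unique, the ranks within each group are distinct, regardless of what B contains.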

Source: https://habr.com/ru/post/1651244/

