Pandas data cleaning: keeping only the first two rows of each group

I am new to pandas and would like to know how to clean data by keeping only some of the rows. Say I have data like this:

column1      date    key
A            2016    SB
A            2017    B
B            2015    SB
C            2014    SB
C            2014    PB
C            2015    B
C            2016    SB

How can I clean the data so that, for each value of column1, I keep only the first two rows and drop the rest (for example, for C I get only 2014 SB and 2014 PB)? The expected result:

column1      date    key
A            2016    SB
A            2017    B
B            2015    SB
C            2014    SB
C            2014    PB

Thanks.

+4
3 answers

You need GroupBy.head; also check the docs:

df = df.groupby('column1').head(2)
print(df)
  column1  date key
0       A  2016  SB
1       A  2017   B
2       B  2015  SB
3       C  2014  SB
4       C  2014  PB
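
To reproduce this end to end, here is a minimal sketch. It assumes the table from the question is typed in by hand as a DataFrame; that construction and the name first_two are illustrative and not part of the original answer.

```python
import pandas as pd

# Assumed reconstruction of the question's table
df = pd.DataFrame({
    "column1": ["A", "A", "B", "C", "C", "C", "C"],
    "date": [2016, 2017, 2015, 2014, 2014, 2015, 2016],
    "key": ["SB", "B", "SB", "SB", "PB", "B", "SB"],
})

# head(2) on a GroupBy keeps at most the first two rows of every group,
# preserving the original row order and index labels
first_two = df.groupby("column1").head(2)
print(first_two)
```

Groups with fewer than two rows (B here) are returned whole, so no special handling is needed for them.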
+7
In [82]: df.loc[df.groupby('column1').cumcount().lt(2)]
Out[82]:
  column1  date key
0       A  2016  SB
1       A  2017   B
2       B  2015  SB
3       C  2014  SB
4       C  2014  PB
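
It may help to see the intermediate steps of the one-liner. The sketch below assumes the question's table is rebuilt by hand; the names pos, mask and result are only for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's table
df = pd.DataFrame({
    "column1": ["A", "A", "B", "C", "C", "C", "C"],
    "date": [2016, 2017, 2015, 2014, 2014, 2015, 2016],
    "key": ["SB", "B", "SB", "SB", "PB", "B", "SB"],
})

# cumcount() numbers the rows inside each group starting from 0
pos = df.groupby("column1").cumcount()

# lt(2) turns that into a boolean mask: True for positions 0 and 1
mask = pos.lt(2)

# Boolean indexing keeps only the rows where the mask is True
result = df.loc[mask]
print(result)
```

Because the position within the group is available as a Series, the same idea extends to other selections, for example pos.eq(1) to keep only the second row of each group.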
+4

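import pandas as pd

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
pd.concat([
    df.drop_duplicates('column1'),                            # first row of each group
    df[df.duplicated('column1')].drop_duplicates('column1'),  # second row, where one exists
])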

  column1  date key
0       A  2016  SB
2       B  2015  SB
3       C  2014  SB
1       A  2017   B
4       C  2014  PB
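
Note that the rows come back as all first rows followed by all second rows, not in the original order. If the original order matters, sorting by index restores it. A minimal sketch, written with pd.concat and assuming the question's table is rebuilt by hand (the names firsts, seconds and combined are illustrative):

```python
import pandas as pd

# Assumed reconstruction of the question's table
df = pd.DataFrame({
    "column1": ["A", "A", "B", "C", "C", "C", "C"],
    "date": [2016, 2017, 2015, 2014, 2014, 2015, 2016],
    "key": ["SB", "B", "SB", "SB", "PB", "B", "SB"],
})

# First occurrence of every column1 value
firsts = df.drop_duplicates("column1")

# Among the remaining rows, the first occurrence is the group's second row
seconds = df[df.duplicated("column1")].drop_duplicates("column1")

# Stack the two pieces, then restore the original row order
combined = pd.concat([firsts, seconds])
restored = combined.sort_index()
print(restored)
```

This trick is specific to keeping two rows per group; for a general n, groupby with head(n) is simpler.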
+4

Source: https://habr.com/ru/post/1682229/

