Pandas: get the first grouping of key entries

If I have the following data file

| id | timestamp           | code | id2 |
|----|---------------------|------|-----|
| 10 | 2017-07-12 13:37:00 | 206  | a1  |
| 10 | 2017-07-12 13:40:00 | 206  | a1  |
| 10 | 2017-07-12 13:55:00 | 206  | a1  |
| 10 | 2017-07-12 19:00:00 | 206  | a2  |
| 11 | 2017-07-12 13:37:00 | 206  | a1  |
...

I need to group by the columns id and id2 and get the first occurrence of timestamp in each group. For example, for id=10, id2=a1 the result should be timestamp=2017-07-12 13:37:00.

I searched for it and found some possible solutions, but I can’t figure out how to implement them correctly. It should probably be something like this:

df.groupby(["id", "id2"])["timestamp"].apply(lambda x: ....)
2 answers

I think you need GroupBy.first:

df.groupby(["id", "id2"])["timestamp"].first()

Or drop_duplicates:

df.drop_duplicates(subset=['id','id2'])

Both give the same output:

df1 = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
print(df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00

df1 = df.drop_duplicates(subset=['id','id2'])[['id','id2','timestamp']]
print(df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00
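
If you want to reproduce this end to end, here is a minimal, self-contained sketch; the frame below is rebuilt by hand from the sample table in the question:

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    'id': [10, 10, 10, 10, 11],
    'timestamp': pd.to_datetime([
        '2017-07-12 13:37:00', '2017-07-12 13:40:00',
        '2017-07-12 13:55:00', '2017-07-12 19:00:00',
        '2017-07-12 13:37:00',
    ]),
    'code': [206] * 5,
    'id2': ['a1', 'a1', 'a1', 'a2', 'a1'],
})

# First timestamp per (id, id2) group
print(df.groupby(['id', 'id2'], as_index=False)['timestamp'].first())

# Same rows via drop_duplicates, keeping only the key and timestamp columns
print(df.drop_duplicates(subset=['id', 'id2'])[['id', 'id2', 'timestamp']])
```

One subtlety: GroupBy.first() returns the first non-null value in each group, while drop_duplicates() keeps the first row as-is. On data without missing timestamps the two agree, but both depend on row order, so sort by timestamp first if the file is not already ordered.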

You can create a new column by concatenating the columns id and id2, and then drop the rows where it is duplicated:

df['newcol'] = df.apply(lambda x: str(x.id) + str(x.id2), axis=1)  # combined key per row
df = df[~df.newcol.duplicated()].iloc[:, :4]   # iloc drops the helper column again
print(df)

Output:

   id            timestamp  code id2
0  10  2017-07-12 13:37:00   206  a1
3  10  2017-07-12 19:00:00   206  a2
4  11  2017-07-12 13:37:00   206  a1
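
As a small aside, the helper column is not strictly needed: DataFrame.duplicated() accepts a subset argument, so the same filter can run directly on the two key columns. A sketch of that variation (not the answerer's code) on the same df:

```python
# Equivalent filtering without a temporary key column
first_rows = df[~df.duplicated(subset=['id', 'id2'])]
print(first_rows)
```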
