Pandas: connecting information between two data frames

I have two data frames. Data frame A contains information about the trips:

    Id  Name    StartTime           EndTime
0   201 Car1    2016-01-01 00:00:00 2016-01-01 00:43:05
1   205 Car2    2016-01-01 00:10:00 2016-01-01 00:45:05
2   345 Car3    2016-01-01 00:01:00 2016-01-01 00:47:05
3   456 Car2    2016-01-02 00:00:00 2016-01-02 02:45:05
4   432 Car1    2016-01-02 00:00:00 2016-01-02 02:47:05

Data frame B contains timestamps recorded during the trips (for example, GPS readings):

    Name    Timestamp
0   Car1    2016-01-01 00:05:00
1   Car1    2016-01-01 00:05:24
2   Car2    2016-01-01 00:10:04
3   Car3    2016-01-01 00:01:04
4   Car2    2016-01-01 00:10:34
5   Car1    2016-01-01 00:05:54

I need to add a column called Id to data frame B that picks the Id from data frame A, matching on Name and on whether the Timestamp falls between StartTime and EndTime in A. Both data frames are really big, so I need an efficient way to do this.

2 answers

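You can use merge_asof with B as the left frame. According to the documentation:

For each row in the left DataFrame, it selects the last row in the right DataFrame whose 'on' key is less than or equal to the left's key. Both DataFrames must be sorted by the key.

First, convert the time columns to datetime and sort both frames: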

dfa['StartTime'] = pd.to_datetime(dfa.StartTime)
dfa['EndTime'] = pd.to_datetime(dfa.EndTime)
dfb['Timestamp'] = pd.to_datetime(dfb.Timestamp)

dfb = dfb.sort_values('Timestamp')
dfa = dfa.sort_values('StartTime')

Perform the asof merge, grouped by 'Name':

pd.merge_asof(dfb, dfa, left_on='Timestamp', right_on='StartTime', by='Name')

   Name           Timestamp   Id           StartTime             EndTime
0  Car3 2016-01-01 00:01:04  345 2016-01-01 00:01:00 2016-01-01 00:47:05
1  Car1 2016-01-01 00:05:00  201 2016-01-01 00:00:00 2016-01-01 00:43:05
2  Car1 2016-01-01 00:05:24  201 2016-01-01 00:00:00 2016-01-01 00:43:05
3  Car1 2016-01-01 00:05:54  201 2016-01-01 00:00:00 2016-01-01 00:43:05
4  Car2 2016-01-01 00:10:04  205 2016-01-01 00:10:00 2016-01-01 00:45:05
5  Car2 2016-01-01 00:10:34  205 2016-01-01 00:10:00 2016-01-01 00:45:05
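
Note that merge_asof matches on StartTime only, so a timestamp recorded after a trip has ended would still receive that trip's Id. The following is a minimal sketch of one way to guard against that, assuming trips of the same vehicle do not overlap. The function name assign_trip_id is hypothetical, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
dfa = pd.DataFrame({
    "Id": [201, 205, 345, 456, 432],
    "Name": ["Car1", "Car2", "Car3", "Car2", "Car1"],
    "StartTime": pd.to_datetime([
        "2016-01-01 00:00:00", "2016-01-01 00:10:00", "2016-01-01 00:01:00",
        "2016-01-02 00:00:00", "2016-01-02 00:00:00"]),
    "EndTime": pd.to_datetime([
        "2016-01-01 00:43:05", "2016-01-01 00:45:05", "2016-01-01 00:47:05",
        "2016-01-02 02:45:05", "2016-01-02 02:47:05"]),
})
dfb = pd.DataFrame({
    "Name": ["Car1", "Car1", "Car2", "Car3", "Car2", "Car1"],
    "Timestamp": pd.to_datetime([
        "2016-01-01 00:05:00", "2016-01-01 00:05:24", "2016-01-01 00:10:04",
        "2016-01-01 00:01:04", "2016-01-01 00:10:34", "2016-01-01 00:05:54"]),
})


def assign_trip_id(dfb, dfa):
    """Hypothetical helper: asof merge on StartTime, then validate EndTime."""
    # merge_asof requires both frames to be sorted by their 'on' keys.
    left = dfb.sort_values("Timestamp")
    right = dfa.sort_values("StartTime")
    merged = pd.merge_asof(left, right, left_on="Timestamp",
                           right_on="StartTime", by="Name")
    # Keep the Id only when the timestamp is not past the end of the trip;
    # rows with no matching trip (NaT EndTime) compare as False as well.
    inside = merged["Timestamp"] <= merged["EndTime"]
    merged["Id"] = merged["Id"].where(inside)
    return merged[["Name", "Timestamp", "Id"]]


print(assign_trip_id(dfb, dfa))
```

Rows that fall outside every trip end up with a missing Id, which can turn the column into floats; whether to drop those rows or keep them is a design choice left to the caller.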

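Alternatively, you can merge on the Name column, filter with boolean indexing, and finally drop the helper columns (here df1 is frame A and df2 is frame B):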

df = pd.merge(df1, df2, on='Name', how='outer')
df = df[(df.StartTime <= df.Timestamp) & (df.EndTime >= df.Timestamp)]
df = df.drop(['StartTime','EndTime'], axis=1)
print(df)
     Id  Name           Timestamp
0   201  Car1 2016-01-01 00:05:00
1   201  Car1 2016-01-01 00:05:24
2   201  Car1 2016-01-01 00:05:54
6   205  Car2 2016-01-01 00:10:04
7   205  Car2 2016-01-01 00:10:34
10  345  Car3 2016-01-01 00:01:04
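
The merged result above has its own row order and silently drops timestamps that fall outside every trip. If the goal is literally a new Id column on B, one possible variation is to carry B's index through the merge and write the matches back. This is only a sketch: the name add_id_by_interval is hypothetical, it assumes B has a default unnamed index and no column called "index", and it assumes trips of the same vehicle do not overlap. The merge still pairs every timestamp with every trip of the same Name, so the intermediate frame can get large.

```python
import pandas as pd

# Sample data copied from the question.
dfa = pd.DataFrame({
    "Id": [201, 205, 345, 456, 432],
    "Name": ["Car1", "Car2", "Car3", "Car2", "Car1"],
    "StartTime": pd.to_datetime([
        "2016-01-01 00:00:00", "2016-01-01 00:10:00", "2016-01-01 00:01:00",
        "2016-01-02 00:00:00", "2016-01-02 00:00:00"]),
    "EndTime": pd.to_datetime([
        "2016-01-01 00:43:05", "2016-01-01 00:45:05", "2016-01-01 00:47:05",
        "2016-01-02 02:45:05", "2016-01-02 02:47:05"]),
})
dfb = pd.DataFrame({
    "Name": ["Car1", "Car1", "Car2", "Car3", "Car2", "Car1"],
    "Timestamp": pd.to_datetime([
        "2016-01-01 00:05:00", "2016-01-01 00:05:24", "2016-01-01 00:10:04",
        "2016-01-01 00:01:04", "2016-01-01 00:10:34", "2016-01-01 00:05:54"]),
})


def add_id_by_interval(dfb, dfa):
    """Hypothetical helper: return a copy of B with an Id column taken from A."""
    # reset_index() keeps B's row labels in a column named "index".
    pairs = dfb.reset_index().merge(dfa, on="Name", how="inner")
    hit = pairs[(pairs["StartTime"] <= pairs["Timestamp"])
                & (pairs["Timestamp"] <= pairs["EndTime"])]
    # With non-overlapping trips each B row matches at most once;
    # drop_duplicates keeps the first match if that assumption is violated.
    ids = hit.drop_duplicates("index").set_index("index")["Id"]
    out = dfb.copy()
    out["Id"] = ids  # aligned on B's index; unmatched rows become NaN
    return out


print(add_id_by_interval(dfb, dfa))
```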

Source: https://habr.com/ru/post/1664454/

