I am new to pandas and don't know how to do this.
I have two files that I loaded into two separate DataFrames:
>> frame1.head()
Out[64]:
             Date and Time  Sample  Unnamed: 2
0  05/18/2017 08:38:37:490   163.7         NaN
1  05/18/2017 08:39:37:490   164.5         NaN
2  05/18/2017 08:40:37:490   148.7         NaN
3  05/18/2017 08:41:37:490   111.2         NaN
4  05/18/2017 08:42:37:490    83.6         NaN
>>frame2.head()
Out[66]:
             Date and Time  Sample  Unnamed: 2
0  05/18/2017 08:38:38:490     7.5         NaN
1  05/18/2017 08:39:38:490     7.5         NaN
2  05/18/2017 08:40:38:490     7.5         NaN
3  05/18/2017 08:41:38:490     7.5         NaN
4  05/18/2017 08:42:38:490     7.5         NaN
I need to "merge" each row of frame 1 with any row of frame 2 whose timestamp is within one second of it.
For example, this line from frame 1:
0 05/18/2017 08:38:37:490 163.7 NaN
is within one second of this line from frame 2:
0 05/18/2017 08:38:38:490 7.5 NaN
So, when they are "merged", the output should look like this:
0 05/18/2017 08:38:37:490 163.7 7.5 NaN NaN
In other words, the merged row keeps the timestamp from one frame, and the remaining columns of both frames are placed side by side.
The closest I have come is something like:
d3 = pd.merge(frame1, frame2, on='Date and Time (MM/DD/YYYY HH:MM:SS:sss)', how='outer')
>>d3.head()
             Date and Time  Sample_x  Unnamed: 2_x  Sample_y  Unnamed: 2_y
0  05/18/2017 08:38:37:490     163.7           NaN       NaN           NaN
1  05/18/2017 08:39:37:490     164.5           NaN       NaN           NaN
2  05/18/2017 08:40:37:490     148.7           NaN       NaN           NaN
3  05/18/2017 08:41:37:490     111.2           NaN       NaN           NaN
4  05/18/2017 08:42:37:490      83.6           NaN       NaN           NaN
But this is not a conditional merge: it only joins rows whose timestamps are identical, and I need to join rows whose timestamps are within one second of each other.
One idea is a comparison function along these lines:
import datetime
def compare_time(current, temp, sec=1):
    # True when the two timestamps are at most `sec` seconds apart
    return abs(current - temp) <= datetime.timedelta(seconds=sec)
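A minimal sketch of how such a helper might be driven through .apply(), assuming the timestamps have already been parsed to datetimes. The frames `left` and `right`, the column names, and the `first_match` helper are hypothetical and only serve the illustration; `current` is passed in as an explicit argument here.

```python
import datetime
import pandas as pd

def compare_time(current, temp, sec=1):
    # True when the two timestamps are at most `sec` seconds apart
    return abs(current - temp) <= datetime.timedelta(seconds=sec)

# Hypothetical sample data loosely based on the frames in the question
left = pd.DataFrame({
    "time": pd.to_datetime(["2017-05-18 08:38:37.490", "2017-05-18 08:39:37.490"]),
    "sample": [163.7, 164.5],
})
right = pd.DataFrame({
    "time": pd.to_datetime(["2017-05-18 08:38:38.490", "2017-05-18 08:45:00.000"]),
    "sample": [7.5, 9.9],
})

def first_match(current):
    # Compare `current` against every timestamp in `right` (quadratic overall)
    mask = right["time"].apply(lambda t: compare_time(current, t))
    matches = right.loc[mask, "sample"]
    # Return the first matching sample, or NaN when nothing is close enough
    return matches.iloc[0] if not matches.empty else float("nan")

left["sample_right"] = left["time"].apply(first_match)
print(left)
```

Every row of one frame is compared with every row of the other, so this is only reasonable for small frames.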
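This could be used with .apply(), although that means comparing rows one by one.
A better fit is probably pd.merge_asof, which joins on the nearest key instead of an exact match, optionally within a tolerance and in a chosen direction (backward/forward).
An example with 2 frames: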
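import numpy as np
import pandas as pd

df1 = pd.DataFrame({'datetime': pd.date_range('1-1-2017', periods=4, freq='s'),
                    'sample': np.arange(4) + 100})
df2 = pd.DataFrame({'datetime': pd.date_range('1-1-2017', periods=4, freq='300ms'),
                    'sample': np.arange(4)})
# Match each df2 row to the latest df1 row at most 1 s earlier, then add df1's own rows.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
merged = pd.merge_asof(df2, df1, on='datetime', tolerance=pd.Timedelta('1s'))
blah = pd.concat([merged, df1.rename(columns={'sample': 'sample_x'})]).drop_duplicates('sample_x')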
blah
Output:
                  datetime  sample_x  sample_y
0  2017-01-01 00:00:00.000         0     100.0
1  2017-01-01 00:00:00.300         1     100.0
2  2017-01-01 00:00:00.600         2     100.0
3  2017-01-01 00:00:00.900         3     100.0
0  2017-01-01 00:00:00.000       100       NaN
1  2017-01-01 00:00:01.000       101       NaN
2  2017-01-01 00:00:02.000       102       NaN
3  2017-01-01 00:00:03.000       103       NaN
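Here the first four rows are the asof matches, and the last four are the rows of df1 added unchanged (so sample_y is NaN for them).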
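Applied to the frames from the question, a sketch might look like the following. It assumes the time column is named "Date and Time" as in the displayed output, that the strings follow the MM/DD/YYYY HH:MM:SS:sss layout, and that the empty "Unnamed: 2" column can be left out; the suffixes are arbitrary. Because the rows of frame2 are one second later than those of frame1, the default backward search would find no match, so direction='nearest' is used here.

```python
import pandas as pd

# Milliseconds follow a colon in the question's data, hence ":%f"
fmt = "%m/%d/%Y %H:%M:%S:%f"

frame1 = pd.DataFrame({
    "Date and Time": ["05/18/2017 08:38:37:490",
                      "05/18/2017 08:39:37:490",
                      "05/18/2017 08:40:37:490"],
    "Sample": [163.7, 164.5, 148.7],
})
frame2 = pd.DataFrame({
    "Date and Time": ["05/18/2017 08:38:38:490",
                      "05/18/2017 08:39:38:490",
                      "05/18/2017 08:40:38:490"],
    "Sample": [7.5, 7.5, 7.5],
})

# merge_asof needs real datetimes, not strings
for frame in (frame1, frame2):
    frame["Date and Time"] = pd.to_datetime(frame["Date and Time"], format=fmt)

# Both inputs must be sorted by the key; the result keeps frame1's timestamps
result = pd.merge_asof(
    frame1.sort_values("Date and Time"),
    frame2.sort_values("Date and Time"),
    on="Date and Time",
    tolerance=pd.Timedelta("1s"),
    direction="nearest",
    suffixes=("_1", "_2"),
)
print(result)
```

Rows of frame1 with no frame2 row inside the tolerance would get NaN in Sample_2 rather than being dropped, since merge_asof behaves like a left join.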