The fastest way to merge pandas data in ranges

I have a dataframe A

       ip_address
    0          13
    1           5
    2          20
    3          11
    ..   ........

and another dataframe B

       lowerbound_ip_address  upperbound_ip_address    country
    0                      0                     10  Australia
    1                     11                     20      China

Based on this, I need to add a column to A so that

    ip_address    country
    13              China
    5           Australia

My idea is to write a function and map it over each row of A. But then how would I search every row of B for each lookup? There must be a better way to do this.

3 answers

Use pd.IntervalIndex

    In [2503]: s = pd.IntervalIndex.from_arrays(dfb.lowerbound_ip_address, dfb.upperbound_ip_address, 'both')

    In [2504]: dfa.assign(country=dfb.set_index(s).loc[dfa.ip_address].country.values)
    Out[2504]:
       ip_address    country
    0          13      China
    1           5  Australia
    2          20      China
    3          11      China

More details

    In [2505]: s
    Out[2505]:
    IntervalIndex([[0, 10], [11, 20]]
                  closed='both',
                  dtype='interval[int64]')

    In [2507]: dfb.set_index(s)
    Out[2507]:
              lowerbound_ip_address  upperbound_ip_address    country
    [0, 10]                       0                     10  Australia
    [11, 20]                     11                     20      China

    In [2506]: dfb.set_index(s).loc[dfa.ip_address]
    Out[2506]:
              lowerbound_ip_address  upperbound_ip_address    country
    [11, 20]                     11                     20      China
    [0, 10]                       0                     10  Australia
    [11, 20]                     11                     20      China
    [11, 20]                     11                     20      China

Setup

    In [2508]: dfa
    Out[2508]:
       ip_address
    0          13
    1           5
    2          20
    3          11

    In [2509]: dfb
    Out[2509]:
       lowerbound_ip_address  upperbound_ip_address    country
    0                      0                     10  Australia
    1                     11                     20      China
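
One thing the `.loc` lookup above does not cover is an address that falls outside every range. On recent pandas versions that lookup is expected to raise a `KeyError`. The following is a minimal sketch of a more forgiving variant, assuming the ranges in `dfb` do not overlap. The extra address 25 and the names `intervals`, `pos` and `result` are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Same data as the question, plus 25, which no range in dfb contains.
dfa = pd.DataFrame({"ip_address": [13, 5, 20, 11, 25]})
dfb = pd.DataFrame({
    "lowerbound_ip_address": [0, 11],
    "upperbound_ip_address": [10, 20],
    "country": ["Australia", "China"],
})

intervals = pd.IntervalIndex.from_arrays(
    dfb["lowerbound_ip_address"], dfb["upperbound_ip_address"], closed="both"
)

# Position of the containing interval for each address, or -1 if none contains it.
# get_indexer requires non-overlapping intervals (assumption stated above).
pos = intervals.get_indexer(dfa["ip_address"])

# Indexing with -1 picks the last country, so mask those rows afterwards.
country = pd.Series(dfb["country"].to_numpy()[pos], index=dfa.index)
result = dfa.assign(country=country.where(pos >= 0))
print(result)
```

Rows without a matching range end up with a missing value in `country` instead of stopping the whole lookup.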

IntervalIndex is new as of pandas 0.20.0, and the solution from @JohnGalt using it is excellent.

For earlier versions, the following solution works. It expands each country's range into one row per IP address.

    df_ip = pd.concat([pd.DataFrame(
        {'ip_address': range(row['lowerbound_ip_address'], row['upperbound_ip_address'] + 1),
         'country': row['country']})
        for _, row in dfb.iterrows()]).set_index('ip_address')

    >>> dfa.set_index('ip_address').join(df_ip)
                  country
    ip_address
    13              China
    5           Australia
    20              China
    11              China
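
Expanding every range creates one row per address, which can become very large for real IP ranges. As a possible alternative that also avoids `IntervalIndex`, here is a sketch using `numpy.searchsorted`. This is a different technique from the one in this answer. It assumes the ranges in `dfb` do not overlap, and the names `ranges`, `pos`, `valid` and `lookup_result`, as well as the extra address 25, are illustrative.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame({"ip_address": [13, 5, 20, 11, 25]})
dfb = pd.DataFrame({
    "lowerbound_ip_address": [0, 11],
    "upperbound_ip_address": [10, 20],
    "country": ["Australia", "China"],
})

# searchsorted needs the lower bounds in ascending order.
ranges = dfb.sort_values("lowerbound_ip_address").reset_index(drop=True)
lower = ranges["lowerbound_ip_address"].values
upper = ranges["upperbound_ip_address"].values
ips = dfa["ip_address"].values

# Index of the last range whose lower bound is <= the address (-1 if there is none).
pos = np.searchsorted(lower, ips, side="right") - 1

# The candidate range only counts if the address is also within its upper bound.
valid = (pos >= 0) & (ips <= upper[pos])

country = ranges["country"].values[pos]
lookup_result = dfa.assign(country=np.where(valid, country, None))
print(lookup_result)
```

The lookup is a binary search per address, so memory use depends on the number of ranges rather than on how wide they are.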

Try pd.merge_asof

    # df is dataframe A and df1 is dataframe B; merge_asof needs both sorted by the key
    df['lowerbound_ip_address'] = df['ip_address']
    pd.merge_asof(df1, df, on='lowerbound_ip_address', direction='forward', allow_exact_matches=False)
    Out[811]:
       lowerbound_ip_address  upperbound_ip_address    country  ip_address
    0                      0                     10  Australia           5
    1                     11                     20      China          13
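
Note that the output above has one row per range in B rather than one row per address in A, so it does not produce the table asked for in the question. A sketch of how `pd.merge_asof` might be used to get a country for every address is shown below, assuming non-overlapping ranges. The frame names `left`, `right`, `merged` and `asof_result` and the extra address 25 are illustrative.

```python
import pandas as pd

dfa = pd.DataFrame({"ip_address": [13, 5, 20, 11, 25]})
dfb = pd.DataFrame({
    "lowerbound_ip_address": [0, 11],
    "upperbound_ip_address": [10, 20],
    "country": ["Australia", "China"],
})

# merge_asof requires both frames sorted by their keys.
# reset_index() keeps the original row labels in an "index" column.
left = dfa.reset_index().sort_values("ip_address")
right = dfb.sort_values("lowerbound_ip_address")

# For each address, take the last range whose lower bound is <= the address.
merged = pd.merge_asof(
    left, right,
    left_on="ip_address", right_on="lowerbound_ip_address",
    direction="backward",
)

# Discard matches where the address lies beyond the range's upper bound.
merged["country"] = merged["country"].where(
    merged["ip_address"] <= merged["upperbound_ip_address"]
)

# Restore the original row order of dfa.
asof_result = merged.set_index("index").sort_index()[["ip_address", "country"]]
print(asof_result)
```

Here A is the left frame, so every address keeps its row, and addresses that fall outside all ranges get a missing country.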

Source: https://habr.com/ru/post/1271735/

