Merging two data frames with interval data in one of them

Question

As input, I have two data frames:

    data1 = [{'code':100}, {'code':120}, {'code':110}]
    data1 = pd.DataFrame(data1)

       code
    0   100
    1   120
    2   110

    data2 = [{'category':1, 'l_bound':99, 'r_bound':105},
             {'category':2, 'l_bound':107, 'r_bound':110},
             {'category':3, 'l_bound':117, 'r_bound':135}]
    data2 = pd.DataFrame(data2)

       category  l_bound  r_bound
    0         1       99      105
    1         2      107      110
    2         3      117      135

I want to end up with the following data frame: the first data frame plus an additional column holding the category number of the interval that contains the code:

       code  category
    0   100         1
    1   120         3
    2   110         2

Intervals are random, and the source data is quite large. Looping with itertuples is too slow. Any pythonic solutions?

python pandas dataframe

Anna Ignashkina Dec 30 '17 at 17:53

1 answer

Anton vBR · Accepted Answer · 2017-12-30T18:28:04+0000

Recreate the data set (the third code is changed to 113 here, which falls in none of the intervals):

    import pandas as pd

    data1 = [{'code':100}, {'code':120}, {'code':113}]
    data2 = [{'category':1, 'l_bound':99, 'r_bound':105},
             {'category':2, 'l_bound':107, 'r_bound':110},
             {'category':3, 'l_bound':117, 'r_bound':135}]

    data1 = pd.DataFrame(data1)
    data2 = pd.DataFrame(data2)

@cᴏʟᴅsᴘᴇᴇᴅ's answer (preferred); see the linked duplicate question:

    idx = pd.IntervalIndex.from_arrays(data2['l_bound'], data2['r_bound'], closed='both')
    category = data2.loc[idx.get_indexer(data1.code), 'category']
    data1['category'] = category.values
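One caveat with the snippet above: `get_indexer` returns -1 for a code that lies in no interval (such as 113 in the recreated data), and recent pandas versions raise a `KeyError` when `.loc` is given a label that is not in the index. Below is a minimal sketch of one way to handle unmatched codes, assuming the intervals do not overlap. The names `pos`, `matched` and `categories` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({'code': [100, 120, 113]})
data2 = pd.DataFrame({'category': [1, 2, 3],
                      'l_bound': [99, 107, 117],
                      'r_bound': [105, 110, 135]})

# Build one closed interval [l_bound, r_bound] per row of data2
idx = pd.IntervalIndex.from_arrays(data2['l_bound'], data2['r_bound'], closed='both')

# Position of the containing interval for each code, or -1 if there is none
pos = idx.get_indexer(data1['code'])
matched = pos >= 0

# Positional lookup; a position of -1 would wrap around to the last row,
# so unmatched rows are overwritten with the sentinel -1 afterwards
categories = data2['category'].to_numpy()[pos]
data1['category'] = np.where(matched, categories, -1)
```

Here -1 is used as the "no match" marker, in line with the convention used further down in this answer.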

Here is a different approach: build a dictionary that maps every integer value inside each interval to its category.

    # Map every integer in each [l_bound, r_bound] range to its category
    d = {i: k
         for k, v in data2.set_index('category').to_dict('index').items()
         for i in range(v['l_bound'], v['r_bound'] + 1)}

    # Use map to add the new column
    data1['category'] = data1.code.map(d)
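For readability, the same kind of dictionary can be built with an explicit loop. This is only a sketch of what the comprehension does: it assumes integer codes and bounds, and the dictionary grows with the total width of all intervals, so very wide intervals cost memory.

```python
import pandas as pd

data1 = pd.DataFrame({'code': [100, 120, 113]})
data2 = pd.DataFrame({'category': [1, 2, 3],
                      'l_bound': [99, 107, 117],
                      'r_bound': [105, 110, 135]})

# One dictionary entry per integer covered by an interval
d = {}
for row in data2.itertuples(index=False):
    for value in range(row.l_bound, row.r_bound + 1):
        d[value] = row.category

# Codes missing from the dictionary become NaN
data1['category'] = data1['code'].map(d)
```

Looping over `data2` with `itertuples` is acceptable here because the interval table is the small one; the large frame is still handled by the vectorised `map`.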

Finally, print the result:

    print(data1)

Returns:

       code  category
    0   100       1.0
    1   120       3.0
    2   113       NaN

If you want an integer column, fill the missing values first and assign the result back:

    data1['category'] = data1.code.map(d).fillna(-1).astype(int)  # -1 means no match

And we get:

       code  category
    0   100         1
    1   120         3
    2   113        -1
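If a sentinel such as -1 is undesirable, pandas' nullable integer dtype is another option: it keeps the column integer-typed while showing unmatched codes as `<NA>`. A small sketch, with a hand-written dictionary `d` standing in for the interval dictionary built earlier:

```python
import pandas as pd

data1 = pd.DataFrame({'code': [100, 120, 113]})

# Stand-in for the interval dictionary: only the entries needed here
d = {100: 1, 120: 3}

# 'Int64' (capital I) is the nullable integer dtype; NaN becomes <NA>
data1['category'] = data1['code'].map(d).astype('Int64')
print(data1)
```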
