Intersection of two dictionaries in Python

I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with ID numbers as keys and their text content as values.

To perform an AND search for two terms, I therefore need to intersect their postings lists (dictionaries). What is the explicit (not necessarily overly clever) way to do this in Python? I started out the long way with iter:

    p1 = index[term1]
    p2 = index[term2]
    i1 = iter(p1)
    i2 = iter(p2)
    while ...  # not sure of the 'iter != end' syntax in this case
    ...
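If you want the explicit iterator style, the classic way to intersect two postings lists is a linear merge over their IDs in sorted order. This is a minimal sketch, assuming the document IDs are mutually comparable (for example integers) so they can be sorted; the function name `intersect_postings` and the sample postings are hypothetical. Python iterators have no `end` to compare against; instead, `next(it, sentinel)` returns the sentinel once the iterator is exhausted.

```python
def intersect_postings(p1, p2):
    """Return the IDs present in both postings dicts, in ascending order.

    Assumes the IDs (the dict keys) are comparable, e.g. ints.
    """
    done = object()  # sentinel returned by next() once an iterator is exhausted
    i1 = iter(sorted(p1))
    i2 = iter(sorted(p2))
    a = next(i1, done)
    b = next(i2, done)
    result = []
    while a is not done and b is not done:
        if a == b:
            result.append(a)  # ID occurs in both postings
            a = next(i1, done)
            b = next(i2, done)
        elif a < b:
            a = next(i1, done)  # advance the iterator holding the smaller ID
        else:
            b = next(i2, done)
    return result


p1 = {1: "cats purr", 4: "dogs bark", 7: "cats and dogs"}
p2 = {2: "birds sing", 4: "dogs bark", 7: "cats and dogs", 9: "dogs dig"}
print(intersect_postings(p1, p2))  # [4, 7]
```

Sorting costs O(n log n); if the postings are already kept sorted, the merge itself is linear. With plain dicts, the set-based approaches in the answers are simpler.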
+50
python dictionary iteration intersection
Sep 01 '13 at 0:13
7 answers

You can easily compute the intersection of sets, so create sets of the keys and intersect them:

    keys_a = set(dict_a.keys())
    keys_b = set(dict_b.keys())
    intersection = keys_a & keys_b  # '&' is the set intersection operator
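Applied to the inverted index from the question, the same idea extends to an AND search over any number of terms. A minimal sketch; `and_search` and the sample index are hypothetical, and it assumes that a given ID maps to the same document in every postings dict, so the documents can be taken from the first one:

```python
def and_search(index, *terms):
    """Return {doc_id: document} for documents that contain every term.

    `index` maps term -> {doc_id: document}; unknown terms match nothing.
    """
    postings = [index.get(term, {}) for term in terms]
    if not postings:
        return {}
    common_ids = set(postings[0])
    for p in postings[1:]:
        common_ids.intersection_update(p)  # keep only IDs present in every posting
    first = postings[0]
    return {doc_id: first[doc_id] for doc_id in common_ids}


index = {
    "cat": {1: "the cat sat", 2: "cat and dog", 5: "a cat nap"},
    "dog": {2: "cat and dog", 3: "dog days", 5: "a cat nap"},
    "nap": {5: "a cat nap"},
}
print(sorted(and_search(index, "cat", "dog")))  # [2, 5]
```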
+55
01 Sep '13 at 0:18
source share

A little-known fact is that you do not need to construct sets for this:

In Python 2:

    In [78]: d1 = {'a': 1, 'b': 2}

    In [79]: d2 = {'b': 2, 'c': 3}

    In [80]: d1.viewkeys() & d2.viewkeys()
    Out[80]: {'b'}

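In Python 3, replace viewkeys with keys; the same renaming applies to viewvalues (values) and viewitems (items). Note that only key and item views are set-like: value views do not support &.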
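A Python 3 version of the session above, as a sketch; it also shows what happens with item and value views:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}

# Key views are set-like, so & works directly and returns a set.
print(d1.keys() & d2.keys())    # {'b'}

# Item views are set-like too (the values must be hashable).
print(d1.items() & d2.items())  # {('b', 2)}

# Value views are NOT set-like; convert them to sets first.
try:
    d1.values() & d2.values()
except TypeError as exc:
    print("values() views do not support &:", exc)
print(set(d1.values()) & set(d2.values()))  # {2}
```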

From the documentation of viewitems:

    In [113]: d1.viewitems??
    Type:        builtin_function_or_method
    String Form: <built-in method viewitems of dict object at 0x64a61b0>
    Docstring:   D.viewitems() -> a set-like object providing a view on D's items

With large dicts this is also a little faster than building sets and then intersecting them:

    In [122]: d1 = {i: rand() for i in range(10000)}

    In [123]: d2 = {i: rand() for i in range(10000)}

    In [124]: timeit d1.viewkeys() & d2.viewkeys()
    1000 loops, best of 3: 714 µs per loop

    In [125]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2

    1000 loops, best of 3: 805 µs per loop

For smaller dicts, set construction is faster:

    In [126]: d1 = {'a': 1, 'b': 2}

    In [127]: d2 = {'b': 2, 'c': 3}

    In [128]: timeit d1.viewkeys() & d2.viewkeys()
    1000000 loops, best of 3: 591 ns per loop

    In [129]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2

    1000000 loops, best of 3: 477 ns per loop

We are comparing nanoseconds here, which may or may not matter to you. In any case you get a set back, so using viewkeys/keys removes a bit of clutter.
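To repeat the comparison on Python 3 outside IPython, here is a sketch using the standard `timeit` module. Absolute numbers depend on the machine and interpreter version, so no results are claimed here.

```python
import random
import timeit

d1 = {i: random.random() for i in range(10000)}
d2 = {i: random.random() for i in range(10000)}

# Intersect the key views directly.
t_views = timeit.timeit(lambda: d1.keys() & d2.keys(), number=100)

# Build two sets first, then intersect them.
t_sets = timeit.timeit(lambda: set(d1) & set(d2), number=100)

print(f"views: {t_views:.4f} s, sets: {t_sets:.4f} s (100 runs each)")
```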

+98
Sep 01 '13 at 0:25
    In [1]: d1 = {'a':1, 'b':4, 'f':3}

    In [2]: d2 = {'a':1, 'b':4, 'd':2}

    In [3]: d = {x:d1[x] for x in d1 if x in d2}

    In [4]: d
    Out[4]: {'a': 1, 'b': 4}
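The comprehension keeps the values from d1. If the two dicts may disagree on a value and you need both, a variation of the same comprehension (a sketch with made-up data) pairs them up:

```python
d1 = {'a': 1, 'b': 4, 'f': 3}
d2 = {'a': 1, 'b': 5, 'd': 2}

# Iterate over the common keys only; keep the value from each dict.
both = {k: (d1[k], d2[k]) for k in d1.keys() & d2.keys()}
print(sorted(both.items()))  # [('a', (1, 1)), ('b', (4, 5))]
```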
+58
Sep 01 '13 at 4:11

In Python 3 you can use

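    # Set operators on items views work on (key, value) pairs; values must be hashable.
    intersection = dict(dict1.items() & dict2.items())
    union = dict(dict1.items() | dict2.items())  # conflicting values: arbitrary winner
    difference = dict(dict1.items() - dict2.items())
    symmetric_difference = dict(dict1.items() ^ dict2.items())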
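Because these operators compare (key, value) pairs, a key only ends up in the intersection when both dicts map it to the same value. A small sketch with made-up data, contrasting this with a plain key intersection:

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 2, 'c': 30, 'd': 4}

# Only pairs whose key AND value match survive the intersection.
intersection = dict(dict1.items() & dict2.items())
print(intersection)  # {'b': 2}

# Pairs of dict1 that are not in dict2 ('c': 3 differs, 'a' is missing from dict2).
difference = dict(dict1.items() - dict2.items())
print(sorted(difference.items()))  # [('a', 1), ('c', 3)]

# To compare keys while ignoring values, intersect the key views instead.
print(sorted(dict1.keys() & dict2.keys()))  # ['b', 'c']
```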
+12
Apr 07 '18 at 17:39

Just wrap the dictionary instances in a simple class that returns both of the values you want:

    class DictionaryIntersection(object):
        def __init__(self, dictA, dictB):
            self.dictA = dictA
            self.dictB = dictB

        def __getitem__(self, attr):
            # Only keys present in both dicts are valid.
            if attr not in self.dictA or attr not in self.dictB:
                raise KeyError('Not in both dictionaries, key: %s' % attr)
            # Return the pair of values, one from each dict.
            return self.dictA[attr], self.dictB[attr]

    x = {'foo': 5, 'bar': 6}
    y = {'bar': 'meow', 'qux': 8}
    z = DictionaryIntersection(x, y)
    print(z['bar'])  # (6, 'meow')
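In Python 3 the same wrapper idea can be extended by subclassing `collections.abc.Mapping`, which then supplies `in`, `.keys()`, `.items()`, `.values()` and `.get()` from the three methods defined below. This is a sketch of that variant, not a drop-in replacement for the class above:

```python
from collections.abc import Mapping


class DictionaryIntersection(Mapping):
    """Read-only mapping over the keys shared by two dicts.

    Looking up a key returns the pair (dictA[key], dictB[key]).
    """

    def __init__(self, dictA, dictB):
        self.dictA = dictA
        self.dictB = dictB

    def __getitem__(self, key):
        if key not in self.dictA or key not in self.dictB:
            raise KeyError(key)
        return self.dictA[key], self.dictB[key]

    def __iter__(self):
        # Yield the shared keys in dictA's order.
        return (k for k in self.dictA if k in self.dictB)

    def __len__(self):
        return sum(1 for _ in self)


x = {'foo': 5, 'bar': 6}
y = {'bar': 'meow', 'qux': 8}
z = DictionaryIntersection(x, y)
print(z['bar'])  # (6, 'meow')
print(dict(z))   # {'bar': (6, 'meow')}
```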
+2
Sep 01 '13 at 0:23

OK, here is a generalized version of the code above for Python 3. It is optimized to use comprehensions and set-like dict views, which are fast enough.

The function intersects an arbitrary number of dicts and returns a dict containing the common keys, each mapped to the set of its values across all the dicts:

    def dict_intersect(*dicts):
        comm_keys = dicts[0].keys()
        for d in dicts[1:]:
            # intersect keys first
            comm_keys &= d.keys()
        # then build a result dict with nested comprehension
        result = {key: {d[key] for d in dicts} for key in comm_keys}
        return result

Usage example:

    a = {1: 'ba', 2: 'boon', 3: 'spam', 4: 'eggs'}
    b = {1: 'ham', 2: 'baboon', 3: 'sausages'}
    c = {1: 'more eggs', 3: 'cabbage'}

    res = dict_intersect(a, b, c)
    # Here is res (the order of values may vary):
    # {1: {'ham', 'more eggs', 'ba'}, 3: {'spam', 'sausages', 'cabbage'}}

Here the dict values must be hashable; if they are not, simply change the set braces {} to list brackets []:

 result = {key:[d[key] for d in dicts] for key in comm_keys} 
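A variant sketch (the name `dict_intersect_lists` is hypothetical) that keeps the values in lists, so unhashable values such as lists work, uses `functools.reduce` for the key intersection, and returns an empty dict when called without arguments instead of raising IndexError:

```python
import operator
from functools import reduce


def dict_intersect_lists(*dicts):
    """Map each key common to all dicts to the list of its values,
    in argument order; values do not need to be hashable."""
    if not dicts:
        return {}
    # reduce applies & pairwise: d0.keys() & d1.keys() & d2.keys() ...
    comm_keys = reduce(operator.and_, (d.keys() for d in dicts))
    return {key: [d[key] for d in dicts] for key in comm_keys}


a = {1: ['x'], 2: ['y']}
b = {1: ['z'], 3: ['w']}
print(dict_intersect_lists(a, b))  # {1: [['x'], ['z']]}
```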
+2
Jan 06 '16 at 1:16

Your question is not precise enough to give a single answer.

1. Key intersection

If you want to intersect the IDs of the postings (credit to James), do:

 common_ids = p1.keys() & p2.keys() 

However, if you want to iterate over the documents, you have to decide which posting takes priority; I assume it is p1. To iterate over the documents for common_ids, collections.ChainMap is the most useful:

    from collections import ChainMap

    # ChainMap looks each ID up in p1 first, then in p2.
    intersection = {id: document for id, document in ChainMap(p1, p2).items()
                    if id in common_ids}
    for id, document in intersection.items():
        ...

Or, if you do not want to create a separate intersection dictionary:

    from collections import ChainMap

    posts = ChainMap(p1, p2)
    for id in common_ids:
        document = posts[id]
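ChainMap searches its maps in order, so p1 wins whenever an ID occurs in both postings. A sketch with made-up postings whose documents differ only to make the priority visible:

```python
from collections import ChainMap

p1 = {1: "doc one (p1)", 2: "doc two (p1)"}
p2 = {2: "doc two (p2)", 3: "doc three (p2)"}
posts = ChainMap(p1, p2)

common_ids = p1.keys() & p2.keys()
for doc_id in sorted(common_ids):
    document = posts[doc_id]  # looked up in p1 first, then p2
    print(doc_id, document)   # 2 doc two (p1)

print(posts[3])  # only in p2: doc three (p2)
```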

2. Intersection of items

If you want to intersect the items of both postings, meaning that both the ID and the document must match, use the code below (credit to DCPY). However, this is only useful if you are looking for duplicates across terms.

    duplicates = dict(p1.items() & p2.items())
    for id, document in duplicates.items():
        ...

3. Iterate over p1 'AND' p2.

If by an 'AND' search using iter you meant iterating over both postings, then again collections.ChainMap is the best way to iterate over (almost) all items in several postings:

    from collections import ChainMap

    for id, document in ChainMap(p1, p2).items():
        ...
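This is what the "(almost)" refers to: iterating a ChainMap visits every ID in the union of the postings exactly once, and for an ID found in both, only p1's document is seen. A sketch with made-up postings:

```python
from collections import ChainMap

p1 = {1: "doc one (p1)", 2: "doc two (p1)"}
p2 = {2: "doc two (p2)", 3: "doc three (p2)"}

# Each ID is visited once; for ID 2, present in both, p1's document is used.
for doc_id, document in sorted(ChainMap(p1, p2).items()):
    print(doc_id, document)
# 1 doc one (p1)
# 2 doc two (p1)
# 3 doc three (p2)
```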
0
Jan 17 '19 at 15:09