Intersection of two dictionaries in Python

Question


I am working on a search program on an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with identification numbers in the form of keys and their textual contents as values.

To perform an AND search for two terms, I therefore need to intersect their posting lists (dictionaries). What is the obvious (not necessarily overly clever) way to do this in Python? I started out the long way with `iter`:

    p1 = index[term1]
    p2 = index[term2]
    i1 = iter(p1)
    i2 = iter(p2)
    while ...  # not sure of the 'iter != end' syntax in this case
    ...

+50

python dictionary iteration intersection

nicole Sep 01 '13 at 0:13


7 answers

A little-known fact is that you do not need to construct `set`s to do this:

In Python 2:

    In [78]: d1 = {'a': 1, 'b': 2}

    In [79]: d2 = {'b': 2, 'c': 3}

    In [80]: d1.viewkeys() & d2.viewkeys()
    Out[80]: {'b'}

In Python 3, replace `viewkeys` with `keys`; the same applies to `viewvalues` and `viewitems`.
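For reference, a minimal Python 3 sketch of the same operation, reusing the `d1` and `d2` from the session above. The names `common_keys` and `common_items` are only illustrative.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}

# In Python 3, keys() already returns a set-like view, so & works directly
common_keys = d1.keys() & d2.keys()
print(common_keys)   # {'b'}

# items() views are set-like too, as long as all values are hashable
common_items = d1.items() & d2.items()
print(common_items)  # {('b', 2)}
```

Both results are plain `set` objects, not views, so they do not change if the dicts are modified afterwards.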

From the documentation of viewitems :

    In [113]: d1.viewitems??
    Type:       builtin_function_or_method
    String Form:<built-in method viewitems of dict object at 0x64a61b0>
    Docstring:  D.viewitems() -> a set-like object providing a view on D's items

With large `dict`s this is also slightly faster than constructing `set`s and then intersecting them:

    In [122]: d1 = {i: rand() for i in range(10000)}

    In [123]: d2 = {i: rand() for i in range(10000)}

    In [124]: timeit d1.viewkeys() & d2.viewkeys()
    1000 loops, best of 3: 714 µs per loop

    In [125]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2

    1000 loops, best of 3: 805 µs per loop

For smaller `dict`s `set` construction is faster:

    In [126]: d1 = {'a': 1, 'b': 2}

    In [127]: d2 = {'b': 2, 'c': 3}

    In [128]: timeit d1.viewkeys() & d2.viewkeys()
    1000000 loops, best of 3: 591 ns per loop

    In [129]: %%timeit
    s1 = set(d1)
    s2 = set(d2)
    res = s1 & s2

    1000000 loops, best of 3: 477 ns per loop

We are comparing nanoseconds here, which may or may not matter to you. In any case, you get back a `set`, so using `viewkeys`/`keys` eliminates a bit of clutter.

+98

Phillip Cloud Sep 01 '13 at 0:25


    In [1]: d1 = {'a':1, 'b':4, 'f':3}

    In [2]: d2 = {'a':1, 'b':4, 'd':2}

    In [3]: d = {x:d1[x] for x in d1 if x in d2}

    In [4]: d
    Out[4]: {'a': 1, 'b': 4}
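Applied to the inverted index from the question, the same comprehension could look roughly like the sketch below. The sample `index` contents and the helper name `and_search` are made up for illustration. Iterating over the shorter posting list and probing the longer one is an optional tweak, not part of the answer above.

```python
# Hypothetical inverted index: term -> {doc_id: text}
index = {
    'python': {1: 'python dict basics', 2: 'python set tricks', 5: 'python views'},
    'dict': {1: 'python dict basics', 3: 'dict in other languages'},
}

def and_search(index, term1, term2):
    """Return {doc_id: text} for documents listed under both terms."""
    p1 = index.get(term1, {})
    p2 = index.get(term2, {})
    # Iterate over the shorter posting list and probe the longer one
    if len(p1) > len(p2):
        p1, p2 = p2, p1
    return {doc_id: p1[doc_id] for doc_id in p1 if doc_id in p2}

print(and_search(index, 'python', 'dict'))  # {1: 'python dict basics'}
```

Taking the text from either posting list is fine here because, in this assumed index, a given document ID maps to the same text under every term.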

+58

emnoor Sep 01 '13 at 4:11


In Python 3 you can use

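    # items() views only support set operations when all values are hashable
    intersection = dict(dict1.items() & dict2.items())
    union = dict(dict1.items() | dict2.items())
    # note: ^ is the symmetric difference (pairs found in exactly one dict)
    difference = dict(dict1.items() ^ dict2.items())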
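A few caveats of the item-based operators are easy to miss, so here is a small sketch with made-up sample dicts. It shows that `&` on `items()` compares whole (key, value) pairs, that `^` is the symmetric difference while `-` is the one-sided difference, and that unhashable values raise a `TypeError`.

```python
dict1 = {'a': 1, 'b': 2}
dict2 = {'a': 1, 'b': 3, 'c': 4}

# Only pairs that match in both key and value survive
same_pairs = dict(dict1.items() & dict2.items())
print(same_pairs)     # {'a': 1}

# A key-only intersection still finds 'b'
same_keys = sorted(dict1.keys() & dict2.keys())
print(same_keys)      # ['a', 'b']

# - keeps the pairs that are only in dict1
only_in_dict1 = dict(dict1.items() - dict2.items())
print(only_in_dict1)  # {'b': 2}

# ^ keeps pairs found in exactly one dict; 'b' appears twice here,
# so passing the result to dict() would silently drop one of them
sym_diff = sorted(dict1.items() ^ dict2.items())
print(sym_diff)       # [('b', 2), ('b', 3), ('c', 4)]

# Unhashable values (e.g. lists) make the set operation fail
raised = False
try:
    {'a': [1]}.items() & {'a': [1]}.items()
except TypeError:
    raised = True
print(raised)         # True
```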

+12

DCPY Apr 07 '18 at 17:39


Just wrap the dictionary instances in a simple class that returns both of the values you want:

    class DictionaryIntersection(object):
        def __init__(self, dictA, dictB):
            self.dictA = dictA
            self.dictB = dictB

        def __getitem__(self, attr):
            if attr not in self.dictA or attr not in self.dictB:
                raise KeyError('Not in both dictionaries, key: %s' % attr)
            # return the values from both dictionaries as a tuple
            return self.dictA[attr], self.dictB[attr]

    x = {'foo': 5, 'bar': 6}
    y = {'bar': 'meow', 'qux': 8}

    z = DictionaryIntersection(x, y)

    print(z['bar'])

+2

Eric Urban Sep 01 '13 at 0:23


OK, here is a generalized version of the code above in Python 3. It is optimized to use comprehensions and set-like dict views, which are fast enough.

The function intersects an arbitrary number of dicts and returns a dict with the common keys and, for each common key, a set of the corresponding values:

    def dict_intersect(*dicts):
        comm_keys = dicts[0].keys()
        for d in dicts[1:]:
            # intersect keys first
            comm_keys &= d.keys()
        # then build a result dict with nested comprehension
        result = {key: {d[key] for d in dicts} for key in comm_keys}
        return result

Usage example:

    a = {1: 'ba', 2: 'boon', 3: 'spam', 4: 'eggs'}
    b = {1: 'ham', 2: 'baboon', 3: 'sausages'}
    c = {1: 'more eggs', 3: 'cabbage'}

    res = dict_intersect(a, b, c)
    # Here is res (the order of values may vary):
    # {1: {'ham', 'more eggs', 'ba'}, 3: {'spam', 'sausages', 'cabbage'}}

Here the dict values must be hashable. If they are not, you can simply change the set braces `{}` to list brackets `[]`:

    result = {key: [d[key] for d in dicts] for key in comm_keys}

+2

thodnev Jan 06 '16 at 1:16


Your question is not precise enough to give a single answer.

1. Key intersection

If you want to intersect the document IDs of the posting lists (credit to James), do:

    common_ids = p1.keys() & p2.keys()

However, if you want to iterate over the documents, you have to consider which posting list takes priority; I assume this is `p1`. To iterate over the documents for `common_ids`, `collections.ChainMap` is most useful:

    from collections import ChainMap

    intersection = {id: document
                    for id, document in ChainMap(p1, p2).items()
                    if id in common_ids}
    for id, document in intersection.items():
        ...

Or, if you do not want to create a separate intersection dictionary:

    from collections import ChainMap

    posts = ChainMap(p1, p2)
    for id in common_ids:
        document = posts[id]
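To make the priority rule concrete, here is a small sketch with made-up posting lists in which the same ID maps to different text in `p1` and `p2`. The sample data and the name `doc_id` are only for illustration.

```python
from collections import ChainMap

p1 = {1: 'doc one (from p1)', 2: 'doc two'}
p2 = {1: 'doc one (from p2)', 3: 'doc three'}

common_ids = p1.keys() & p2.keys()
posts = ChainMap(p1, p2)  # lookups try p1 first, then p2

for doc_id in sorted(common_ids):
    print(doc_id, posts[doc_id])  # 1 doc one (from p1)
```

If both posting lists always store identical text for a given ID, a plain `p1[doc_id]` lookup gives the same result without the `ChainMap`.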

2. Item intersection

If you want to intersect the items of both posting lists, meaning that both the ID and the document must match, use the code below (credit to DCPY). However, this is only useful if you are looking for duplicates across terms.

    duplicates = dict(p1.items() & p2.items())
    for id, document in duplicates.items():
        ...

3. Iterate over `p1` 'AND' `p2`

If by 'AND' search and the use of `iter` you meant searching both posting lists, then again `collections.ChainMap` is the best way to iterate over (almost) all items in multiple posting lists:

    from collections import ChainMap

    for id, document in ChainMap(p1, p2).items():
        ...

0

Jcode Jan 17 '19 at 15:09


James · Accepted Answer · 2013-09-01 00:18

You can easily compute the intersection of sets, so create sets from the keys and intersect those:

    keys_a = set(dict_a.keys())
    keys_b = set(dict_b.keys())
    intersection = keys_a & keys_b  # '&' operator is used for set intersection
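The same idea extends to an AND search over any number of terms. The sketch below assumes a small made-up `index` and a hypothetical helper `common_doc_ids`; it relies on `set.intersection`, which accepts several sets at once.

```python
# Hypothetical inverted index: term -> {doc_id: text}
index = {
    'python': {1: 'python dict basics', 2: 'python set tricks'},
    'dict': {1: 'python dict basics', 3: 'dict internals'},
    'basics': {1: 'python dict basics'},
}

def common_doc_ids(index, *terms):
    """Return the set of document IDs listed under every given term."""
    if not terms:
        return set()
    # set(d) collects the keys of d; unknown terms get an empty posting list
    key_sets = [set(index.get(term, {})) for term in terms]
    return set.intersection(*key_sets)

print(common_doc_ids(index, 'python', 'dict'))            # {1}
print(common_doc_ids(index, 'python', 'dict', 'basics'))  # {1}
print(common_doc_ids(index, 'python', 'missing'))         # set()
```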
