How to get the symmetric difference of two dictionaries

I was looking for a solution to find the symmetric difference between two dictionaries in Python.

For example, if I have two dictionaries A and B, and I want to create a third dictionary C, which contains all the elements from A and B that are not found in the other, or, in other words, unique.

I could not find a canonical answer, so I decided to open this question and give it my own answer. If you think you have the best method, I would love to see it.


Some data:

a = {'a': 1, 'b':2} b = {'b': 2, 'c':3} 

Required Conclusion:

 {'a': 1, 'c': 3} 
+5
source share
6 answers

Here is some code that performs timeit speed tests using various algorithms.

The tests use pairs of dictations of the same size. Keys are short strings of random letters with different proportions of shared keys between dicts. Dikts are built from shuffled lists, so even if they contain many common keys, the basic structure of the hash table of the two dicts should be completely different.

The exact number of shared keys is random, the proportion of shared keys is controlled by the shared make_dicts argument.

The bulk of this code will run on Python 2.6+ and Python 3. I have Python 2.6.6 and Python 3.6.0 installed on this computer (this is a single-core 32-bit machine with 2 GB of RAM running on an old Debian-derived Linux ) Some symmetric dictionary differential functions use dictionary methods that are not available in Python 2.6, so I could not test these functions in Python 2. In addition, elmex_dsd_py2 will not run in Python 3, so I commented on this from. I originally intended to publish the results of Python 2.6, but I had to cut the output to fit the message size limits.

 #!/usr/bin/env python3 # -*- coding: utf-8 -*- ''' Dictionary symmetric difference Speed tests of various implementations See http://stackoverflow.com/q/42650081/4014959 Speed test code by PM 2Ring 2017.03.08 ''' from __future__ import print_function from itertools import product from random import random, seed, shuffle from string import ascii_letters from timeit import Timer seed(163) # The dict symmetric difference functions ------------------------------ def inbar_dsd_long(a, b): # first make sets of the dictionary keys keys_in_a = set(a.keys()) keys_in_b = set(b.keys()) # get the unique keys unique_keys = keys_in_a.symmetric_difference(keys_in_b) # start an empty dictionary c = {} # iterate over the keys for key in unique_keys: if key in a: # if the key is from a dictionary, take the value from there. c[key] = a[key] else: # the key is in b dictionary, take the value from there. c[key] = b[key] return c def pm2r_dsd_py2(a, b): return dict((k, a[k] if k in a else b[k]) for k in set(a.keys()) ^ set(b.keys())) #def elmex_dsd_py2(a, b): #symm_diff = set(a) ^ set(b) #return dict((k, v) for k, v in a.items() + b.items() if k in symm_diff) def raymond_dsd(a, b): c = a.copy() c.update(b) for k in (a.keys() & b.keys()): del c[k] return c def inbar_dsd_short(a, b): return {k: a[k] if k in a else b[k] for k in set(a.keys()).symmetric_difference(b.keys())} def pm2r_dsd_py3(a, b): return {k: a[k] if k in a else b[k] for k in a.keys() ^ b.keys()} def evkounis_dsd(a, b): res = {k:v for k, v in a.items() if k not in b} res.update({k:v for k, v in b.items() if k not in a}) return res def elmex_dsd_py3(a, b): symm_diff = set(a) ^ set(b) return {k: v for k, v in list(a.items()) + list(b.items()) if k in symm_diff} funcs = ( inbar_dsd_long, pm2r_dsd_py2, #elmex_dsd_py2, raymond_dsd, inbar_dsd_short, pm2r_dsd_py3, evkounis_dsd, elmex_dsd_py3, ) # ---------------------------------------------------------------------- # Random key strings all_keys = [''.join(t) for t in product(ascii_letters, repeat=3)] shuffle(all_keys) def make_dicts(size, shared): ''' Make a pair of dicts of length `size`, with random key strings. `shared` is a real number 0 <= shared <= 1 giving the approximate ratio of shared keys. ''' a, b = [], [] keys = iter(all_keys) shared_count = 0 for i in range(size): ka = next(keys) if random() < shared: kb = ka shared_count += 1 else: kb = next(keys) a.append((ka, i)) b.append((kb, i)) shuffle(a) shuffle(b) return dict(a), dict(b), shared_count def verify(a, b): ''' Verify that all functions return the same result ''' results = [func(a, b) for func in funcs] last = results[-1] print(all(last == u for u in results[:-1])) def time_test(loops, reps): ''' Print timing stats for all the functions ''' timings = [] for func in funcs: fname = func.__name__ setup = 'from __main__ import a, b, ' + fname cmd = '{0}(a, b)'.format(fname) t = Timer(cmd, setup) result = t.repeat(reps, loops) result.sort() timings.append((result, fname)) timings.sort() for result, fname in timings: print('{0:16} {1}'.format(fname, result)) # ---------------------------------------------------------------------- print('Verifying') size = 1000 a, b, shared_count = make_dicts(size, 0.1) print('size: {0}, shared count: {1}'.format(size, shared_count)) verify(a, b) # Timeit tests reps = 3 fmt = '\nsize: {0}, shared count: {1}, loops: {2}' for shared in (0.1, 0.25, 0.5, 0.75, 0.9): print('\nSHARED: {0:0.2f}'.format(shared)) #for size in (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000): for size in (10, 100, 1000, 10000): a, b, shared_count = make_dicts(size, shared) loops = 100000 // size print(fmt.format(size, shared_count, loops)) time_test(loops, reps) 

Output

 Verifying size: 1000, shared count: 100 True SHARED: 0.10 size: 10, shared count: 1, loops: 10000 raymond_dsd [0.13777699099955498, 0.13792390800153953, 0.1381044740010111] evkounis_dsd [0.23560065399942687, 0.23752641000100994, 0.2455631840020942] pm2r_dsd_py3 [0.23770248700020602, 0.23880975800057058, 0.24221741200017277] inbar_dsd_long [0.25206301800062647, 0.285963577000075, 0.28780356199786183] inbar_dsd_short [0.2636144610005431, 0.2653795980004361, 0.2666834120027488] elmex_dsd_py3 [0.3290278729982674, 0.33175632400161703, 0.3384615989998565] pm2r_dsd_py2 [0.3978280019982776, 0.43710133700005827, 0.4523775029992976] size: 100, shared count: 14, loops: 1000 raymond_dsd [0.09872918600012781, 0.09888040100122453, 0.10413656799937598] evkounis_dsd [0.1804931380029302, 0.1811683220003033, 0.18133216399655794] pm2r_dsd_py3 [0.20522897000046214, 0.20773609400202986, 0.20979003499815008] inbar_dsd_short [0.21217649699974572, 0.21281453499977943, 0.21295483400172088] inbar_dsd_long [0.22985933599920827, 0.23097444899758557, 0.24446944000010262] elmex_dsd_py3 [0.24242248500013375, 0.24477665499944123, 0.24785449900082313] pm2r_dsd_py2 [0.3103436530000181, 0.31146229099977063, 0.3152951789998042] size: 1000, shared count: 94, loops: 100 raymond_dsd [0.10726087399962125, 0.10726979699757067, 0.10853421000138042] evkounis_dsd [0.19798667299983208, 0.19957152200004202, 0.20145120699817198] pm2r_dsd_py3 [0.24767412599976524, 0.25033419099781895, 0.25519442899894784] inbar_dsd_long [0.25753367499783053, 0.259813735003263, 0.2615334299989627] inbar_dsd_short [0.25835196700063534, 0.2647503340012918, 0.26879757099959534] elmex_dsd_py3 [0.3065065359987784, 0.3129320820007706, 0.3159641370002646] pm2r_dsd_py2 [0.32748841799912043, 0.34595297499981825, 0.3797209490003297] size: 10000, shared count: 987, loops: 10 raymond_dsd [0.2801321059996553, 0.2831085340003483, 0.28407657299976563] evkounis_dsd [0.36119127300116816, 0.36392319399965345, 0.36926983400189783] pm2r_dsd_py3 [0.5073807749977277, 0.5122791090034298, 0.5579565990010451] inbar_dsd_short [0.5086212060014077, 0.5168500030013092, 0.5182715480004845] inbar_dsd_long [0.602521363998676, 0.6031914080012939, 0.6047401769974385] pm2r_dsd_py2 [0.6753699099972437, 0.6772755890015105, 0.6782451350009069] elmex_dsd_py3 [0.7430517110005894, 0.7464511920006771, 0.7468688779990771] SHARED: 0.25 size: 10, shared count: 3, loops: 10000 raymond_dsd [0.1376171269985207, 0.13765478899949812, 0.13801490599871613] pm2r_dsd_py3 [0.20131645299989032, 0.20166713100115885, 0.20322838700303691] inbar_dsd_long [0.20759937799812178, 0.2079929980027373, 0.21979623799779802] evkounis_dsd [0.2186124869986088, 0.2202955180000572, 0.223359776999132] inbar_dsd_short [0.23444793200178538, 0.23780764999901294, 0.23976211099943612] elmex_dsd_py3 [0.3178573650002363, 0.3193927319989598, 0.32410190099835745] pm2r_dsd_py2 [0.3520881920012471, 0.3543025139988458, 0.3581208620016696] size: 100, shared count: 23, loops: 1000 raymond_dsd [0.10508492400185787, 0.10563860000183922, 0.10888238600091427] evkounis_dsd [0.15686738300064462, 0.15824111300025834, 0.15863642399926903] pm2r_dsd_py3 [0.1829918709991034, 0.184900373002165, 0.18732255400027498] inbar_dsd_short [0.18875792199833086, 0.19031438200181583, 0.19102797700179508] inbar_dsd_long [0.21139359699736815, 0.22990316799769062, 0.2418856490003236] elmex_dsd_py3 [0.22641843899691594, 0.2265430750012456, 0.23143781299950206] pm2r_dsd_py2 [0.2681290770015039, 0.2703527909980039, 0.27255326500016963] size: 1000, shared count: 263, loops: 100 raymond_dsd [0.10895683100170572, 0.11233176399764488, 0.11593639900092967] evkounis_dsd [0.17859331599902362, 0.17949835600302322, 0.18466946999978973] pm2r_dsd_py3 [0.2147589500018512, 0.21515577800164465, 0.21701817199937068] inbar_dsd_long [0.21823484400010784, 0.2254721450008219, 0.22556141600216506] inbar_dsd_short [0.22114897099891095, 0.22157548800169025, 0.22668778500155895] pm2r_dsd_py2 [0.2780861230021401, 0.27864550599770155, 0.28336624599978677] elmex_dsd_py3 [0.28186336900034803, 0.2837228559983487, 0.29606761199829634] size: 10000, shared count: 2480, loops: 10 raymond_dsd [0.278912030000356, 0.28916871899855323, 0.2898256120024598] evkounis_dsd [0.33290919899809523, 0.3355702890003158, 0.3366183610014559] pm2r_dsd_py3 [0.4445611189985357, 0.45341551800083835, 0.4544847100005427] inbar_dsd_short [0.4466933030016662, 0.4632708070021181, 0.48025122500257567] inbar_dsd_long [0.5405201060020772, 0.5567013979998592, 0.5911358039993502] pm2r_dsd_py2 [0.586115582002094, 0.600204237998696, 0.6029243630000565] elmex_dsd_py3 [0.7058123890019488, 0.7067292030005774, 0.7115862030004791] SHARED: 0.50 size: 10, shared count: 6, loops: 10000 raymond_dsd [0.15135921700129984, 0.1533788429987908, 0.17841531700105406] pm2r_dsd_py3 [0.15311526600271463, 0.15356177799912984, 0.15895434199774172] inbar_dsd_long [0.16137141400031396, 0.1618921000008413, 0.17238240400183713] inbar_dsd_short [0.1808154470018053, 0.18266997299724608, 0.1863039679992653] evkounis_dsd [0.18221631199776311, 0.18251911100014695, 0.18520446800175705] pm2r_dsd_py2 [0.2700158850020671, 0.2743520539988822, 0.28932957600045484] elmex_dsd_py3 [0.28983224500188953, 0.2912340100010624, 0.2933940119983163] size: 100, shared count: 51, loops: 1000 raymond_dsd [0.10294843999872683, 0.10327848499946413, 0.10685922099946765] evkounis_dsd [0.13586801600104081, 0.13726477299860562, 0.142784658997698] pm2r_dsd_py3 [0.1435330319982313, 0.14396326799760573, 0.14474550500017358] inbar_dsd_short [0.15043617100309348, 0.15080328300246038, 0.1527250040016952] inbar_dsd_long [0.1667091649978829, 0.17330403699816088, 0.17601154400108499] pm2r_dsd_py2 [0.20728979400155367, 0.20776088099955814, 0.2079896369978087] elmex_dsd_py3 [0.21078268400015077, 0.2123827169998549, 0.21517163300086395] size: 1000, shared count: 491, loops: 100 raymond_dsd [0.11212847299975692, 0.11414236799828359, 0.11498476199994911] evkounis_dsd [0.14059560900204815, 0.14112727400060976, 0.150327464001748] pm2r_dsd_py3 [0.14733014900048147, 0.15143406900097034, 0.1542897660001472] inbar_dsd_short [0.15075810700000147, 0.151888833999692, 0.15750856500017107] inbar_dsd_long [0.16265833400029805, 0.16367860500031384, 0.17333104299905244] pm2r_dsd_py2 [0.1993612549995305, 0.19947306600079173, 0.20446195700060343] elmex_dsd_py3 [0.24682135100010782, 0.24862800600021728, 0.25419495800088043] size: 10000, shared count: 4938, loops: 10 evkounis_dsd [0.2519790539990936, 0.2573451700009173, 0.2603536310016352] raymond_dsd [0.2875208960031159, 0.2887761790007062, 0.30461744100102806] pm2r_dsd_py3 [0.3364586130010139, 0.342166794998775, 0.3465069459998631] inbar_dsd_short [0.3490315640010522, 0.6202766900023562, 0.7155317880024086] inbar_dsd_long [0.42809327600116376, 0.4363977649991284, 0.4812496539998392] pm2r_dsd_py2 [0.46369219400003203, 0.46809901899905526, 0.4706174610000744] elmex_dsd_py3 [0.6603999830003886, 0.6629649060014344, 0.6652154759976838] SHARED: 0.75 size: 10, shared count: 7, loops: 10000 pm2r_dsd_py3 [0.14004066000052262, 0.14024711000092793, 0.1411744200013345] inbar_dsd_long [0.1457400300023437, 0.1463650259975111, 0.17371471199658117] raymond_dsd [0.1495657380000921, 0.15151091000007, 0.1532108950013935] inbar_dsd_short [0.16798981899773935, 0.1684792589985591, 0.17371860500134062] evkounis_dsd [0.18283682300170767, 0.18351536599948304, 0.18536045300061232] pm2r_dsd_py2 [0.24651207700298983, 0.24725952299922938, 0.3011513509991346] elmex_dsd_py3 [0.27965197500088834, 0.2817374969999946, 0.28211258000010275] size: 100, shared count: 83, loops: 1000 evkounis_dsd [0.10071835599956103, 0.10109729699979653, 0.1036734150002303] inbar_dsd_long [0.10147314599817037, 0.1017698140021821, 0.11575333300061175] pm2r_dsd_py2 [0.1257392070001515, 0.14690794800117146, 0.2597000979985751] pm2r_dsd_py3 [0.16547765900031663, 0.17877282599874889, 0.1817621379996126] elmex_dsd_py3 [0.18176361400037422, 0.18339519599976484, 0.18422297999859438] inbar_dsd_short [0.18878075899920077, 0.1932126639985654, 0.201184026998817] raymond_dsd [0.23026226100046188, 0.2342098570006783, 0.24134657600006904] size: 1000, shared count: 751, loops: 100 inbar_dsd_short [0.0925550639985886, 0.09375216300031752, 0.09518678500171518] pm2r_dsd_py3 [0.09365715600142721, 0.0952552939997986, 0.0984138530002383] raymond_dsd [0.10659463599949959, 0.10675223399812239, 0.1076178000002983] inbar_dsd_long [0.10787330499806558, 0.10813268299898482, 0.1191909779990965] evkounis_dsd [0.11020168100003502, 0.11101243599841837, 0.11369209199983743] pm2r_dsd_py2 [0.1283391249999113, 0.12977415000204928, 0.13450328500039177] elmex_dsd_py3 [0.20605224600149086, 0.20856778099914663, 0.21231961700323154] size: 10000, shared count: 7525, loops: 10 evkounis_dsd [0.19238157699874137, 0.19369199399807258, 0.20787687100164476] pm2r_dsd_py3 [0.237352975000249, 0.2393961540001328, 0.24592895499881706] inbar_dsd_short [0.24010049900243757, 0.24383026600116864, 0.246290401002625] inbar_dsd_long [0.31666912799846614, 0.3353785740000603, 0.3762496050003392] raymond_dsd [0.3268343650015595, 0.3270019219999085, 0.32956799900057376] pm2r_dsd_py2 [0.3330148269997153, 0.34052117800092674, 0.3426254549995065] elmex_dsd_py3 [0.6130798710000818, 0.6139247349965444, 0.6146237579996523] SHARED: 0.90 size: 10, shared count: 10, loops: 10000 pm2r_dsd_py3 [0.09191049900255166, 0.09203974899719469, 0.09560386399971321] inbar_dsd_long [0.09304381299807574, 0.09397280899793259, 0.10319281500051147] inbar_dsd_short [0.0980829280015314, 0.09835117700276896, 0.0987546550022671] raymond_dsd [0.14094099900103174, 0.14119526200011023, 0.14634641500015277] evkounis_dsd [0.14480078699853038, 0.1466599049999786, 0.14705315900209825] pm2r_dsd_py2 [0.16137886599972262, 0.16186897499937913, 0.1626489610025601] elmex_dsd_py3 [0.24912584599951515, 0.2519607159993029, 0.2550744569998642] size: 100, shared count: 88, loops: 1000 pm2r_dsd_py3 [0.08017906299937749, 0.08175948099960806, 0.08336899599817116] inbar_dsd_short [0.08394136000060826, 0.08467326000027242, 0.08476182100275764] inbar_dsd_long [0.09241838099842425, 0.0929719669984479, 0.10157853300188435] evkounis_dsd [0.09769711500121048, 0.09770239999852492, 0.10219176600003266] pm2r_dsd_py2 [0.11295593600152642, 0.11317849099941668, 0.11382339899864746] raymond_dsd [0.11950065099881613, 0.11954410699763685, 0.16439275900120265] elmex_dsd_py3 [0.17893833099878975, 0.18027151500064065, 0.18072834000122384] size: 1000, shared count: 896, loops: 100 pm2r_dsd_py3 [0.06560493199867778, 0.06627220900190878, 0.06649829500020132] inbar_dsd_short [0.067232484001579, 0.06832705600027111, 0.06892605100074434] inbar_dsd_long [0.07928322799853049, 0.0793153419981536, 0.0874185499997111] pm2r_dsd_py2 [0.08986150900091161, 0.09258468600091874, 0.09545781900305883] evkounis_dsd [0.09216968399778125, 0.09272978199805948, 0.09716289000061806] raymond_dsd [0.11052805100189289, 0.11131704600120429, 0.11136766299750889] elmex_dsd_py3 [0.18965840600139927, 0.1898866600022302, 0.19107911399987643] size: 10000, shared count: 9011, loops: 10 evkounis_dsd [0.1584843410018948, 0.16192917299849796, 0.16836377900108346] pm2r_dsd_py3 [0.1789340169998468, 0.17990425000243704, 0.1874260629992932] inbar_dsd_short [0.18104806900009862, 0.18631987900153035, 0.18891330599944922] inbar_dsd_long [0.2561770180000167, 0.2672927259991411, 0.27309057399907033] pm2r_dsd_py2 [0.26508888299940736, 0.2661178109992761, 0.2812051930013695] raymond_dsd [0.3262405569985276, 0.32729987999846344, 0.3313657439975941] elmex_dsd_py3 [0.5737760600022739, 0.5791283889993792, 0.5847248999998556] 
+5
source

To get the symmetrical difference between the two dictionaries, use the following robust function:

 def dict_symmetric_difference(a, b): return {k: a[k] if k in a else b[k] for k in # break here to fit without scrolling set(a.keys()).symmetric_difference(b.keys())} 

Logic only:

 {k: a[k] if k in a else b[k] for k in set(a.keys()).symmetric_difference(b.keys())} 

Here is a simpler version of the function to explain:

 def dict_symmetric_difference(a, b): # first make sets of the dictionary keys keys_in_a = set(a.keys()) keys_in_b = set(b.keys()) unique_keys = keys_in_a.symmetric_difference(keys_in_b) # get the unique keys c = {} # start an empty dictionary for key in unique_keys: # iterate over the keys if key in a: # if the key is from a dictionary, take the value from there. c[key] = a[key] else: # the key is in b dictionary, take the value from there. c[key] = b[key] return c 

Explanation of the expression a[k] if k in a else b[k] :

This is a ternary operator that allows me to use it like this: a if condition else b

With this trick, I get the value for the key, regardless of which dictionary it is in.


Using any function:

 >>> dict_symmetric_difference({'a': 1, 'b':2}, {'b':2, 'c':3}) {'a': 1, 'c': 3} 
+8
source

The symmetric difference is equal to the union minus the intersection:

 >>> a = {'a': 1, 'b':2} >>> b = {'b': 2, 'c':3} >>> c = a.copy() >>> c.update(b) >>> for k in (a.keys() & b.keys()): del c[k] >>> c {'a': 1, 'c': 3} 
+3
source

dict.keys() A view object is specified, and supports the ^ symbolric_difference operator.

From the docs:

Key representations are defined because their entries are unique and hashed. [...] For representations with a variety of all operations defined for, base classes of the base class .abc.Set are available (for example, ==, <, or ^).

To deal with the false-ish values ​​that occur with the or expression in the original Inbar Rose solution, we can simply use the in ; test if the key is not in a , it must be in b , so we only need 1 in test.

 def dict_symmetric_difference(a, b): return {k: a[k] if k in a else b[k] for k in a.keys() ^ b.keys()} a = {'a': 1, 'b':2, 'd': ''} b = {'b': 2, 'c':3} print(dict_symmetric_difference(a, b)) 

Output

 {'d': '', 'c': 3, 'a': 1} 

Python 2 has no dictionary view objects, so in Python 2 you need to wrap calls to .keys() set() . Versions of Python prior to version 2.7 do not support dictionary understanding, but you can pass a generator expression to the dict() constructor, unless you are using a truly ancient version of Python.

Here is the version that will work correctly on Python 2.4 +:

 def dict_symmetric_difference(a, b): return dict((k, a[k] if k in a else b[k]) for k in set(a.keys()) ^ set(b.keys())) 

We can avoid the two calls that must be set using the symmetric_difference method instead of the ^ operator, since non-operator versions of various dial operations will take any iteration as an argument. So we can do

 set(a.keys()).symmetric_difference(b.keys()) 

instead

 set(a.keys()) ^ set(b.keys()) 

As Martin Peters noted in the comments, dictionary view objects were backported to Python 2.7 . The syntax is slightly different from Python 3 so as not to break the code that uses the .keys , .values and .items . To get a .viewkeys object, use the .viewkeys method. mydict.viewkeys() much more efficient than set(mydict.keys()) . Dictionary view objects also have the advantage of being dynamic, that is, they reflect any changes made to the dictionary, whereas set(mydict.keys()) should be called again if any changes are made to mydict . This is not a problem for this code, but it is a great feature when you need it.

+3
source

Here's how I do it:

 A = {'a': 1, 'b': 2} B = {'b': 2, 'c': 3} def dict_symmetric_difference(dict_A, dict_B): res = {k:v for k, v in dict_A.items() if k not in dict_B} res.update({k:v for k, v in dict_B.items() if k not in dict_A}) return res print(dict_symmetric_difference(A, B)) # {'a': 1, 'c': 3} 
+1
source

Very short

 A = {'a': 1, 'b': 2} B = {'b': 2, 'c': 3} print dict((k, v) for k, v in A.items() + B.items() if k in set(A) ^ set(B)) 

If you're not comfortable talking about Python's speed and suspicion of doing set(A) ^ set(B) at each iteration, you can simply use this code:

 symm_diff = set(A) ^ set(B) print dict((k, v) for k, v in A.items() + B.items() if k in symm_diff) 
0
source

Source: https://habr.com/ru/post/1265133/


All Articles