Python: "except KeyError" faster than "if key in dict"?

Edit 2: It has been suggested that this is a copy of a similar question. I would not agree, as my question focuses on speed, while another question asks which is more “readable” or “better” (better without definition). Although the questions are similar, there is a big difference in the discussion / answers.

EDIT: I understand from the questions that I could be more clear. Sorry for the typos, yes, it must use the correct python operator to add.

As for the input, I just chose a list of random numbers, as this is a common example. In my case, I use dict, where I expect a lot of keys, maybe 95% of the keys will not exist, and those that exist will contain data clusters.

I am interested in the general discussion, despite the fact that regardless of the set of input data, but, of course, examples with working time are interesting.

My standard approach would be like so many other posts to write something like

list = (100 random numbers) d = {} for x in list: if x in d: d[x]+=1 else: d[x]=1 

But I just thought about it faster, since we do not need to check whether the dictionary contains a key. We just assume that it is, and if not, we can handle it. Is there a difference or is Python smarter than me?

 list = (100 random numbers) d = {} for x in list: try: d[x]+=1 except KeyError: d[x] = 1 

The same approach with indices in an array, out of bounds, negative indices, etc.

+10
performance python dictionary exception counter keyerror
Dec 01 '13 at 3:57
source share
5 answers

Your statement is absolutely incorrect depending on the input.

If you have a diverse set of keys and often hits the except block, performance is not very good. If the try block dominates, then the try/except idiom can be executed on smaller lists.

Here is an example showing several ways to do the same thing:

 from __future__ import print_function import timeit import random import collections def f1(): d={} for x in tgt: if x in d: d[x]+=1 else: d[x]=1 return d def f2(): d = {} for x in tgt: try: d[x]+=1 except KeyError: d[x] = 1 return d def f3(): d={}.fromkeys(tgt, 0) for x in tgt: d[x]+=1 return d def f4(): d=collections.defaultdict(int) for x in tgt: d[x]+=1 return d def f5(): return collections.Counter(tgt) def f6(): d={} for x in tgt: d[x]=d.setdefault(x, 0)+1 return d def f7(): d={} for x in tgt: d[x]=d.get(x,0)+1 return d def cmpthese(funcs, c=10000, rate=True, micro=False): """Generate a Perl style function benchmark""" def pprint_table(table): """Perl style table output""" def format_field(field, fmt='{:,.0f}'): if type(field) is str: return field if type(field) is tuple: return field[1].format(field[0]) return fmt.format(field) def get_max_col_w(table, index): return max([len(format_field(row[index])) for row in table]) col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))] for i,row in enumerate(table): # left col row_tab=[row[0].ljust(col_paddings[0])] # rest of the cols row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))] print(' '.join(row_tab)) results={k.__name__:timeit.Timer(k).timeit(c) for k in funcs} fastest=sorted(results,key=results.get, reverse=True) table=[['']] if rate: table[0].append('rate/sec') if micro: table[0].append('usec/pass') table[0].extend(fastest) for e in fastest: tmp=[e] if rate: tmp.append('{:,}'.format(int(round(float(c)/results[e])))) if micro: tmp.append('{:.3f}'.format(1000000*results[e]/float(c))) for x in fastest: if x==e: tmp.append('--') else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e])) table.append(tmp) pprint_table(table) if __name__=='__main__': import sys print(sys.version) for j in [100,1000]: for t in [(0,5), (0,50), (0,500)]: tgt=[random.randint(*t) for i in range(j)] print('{} rand ints between {}:'.format(j,t)) print('=====') cmpthese([f1,f2,f3,f4,f5,f6,f7]) print() 

I have included a small timeit based timeit function that prints functions from Slowest to Fastest with a percentage difference between them.

Here are the results for Python 3:

 3.4.1 (default, May 19 2014, 13:10:29) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] 100 rand ints between (0, 5): ===== rate/sec f6 f7 f1 f2 f3 f4 f5 f6 52,756 -- -1.6% -26.2% -27.9% -30.7% -36.7% -46.8% f7 53,624 1.6% -- -25.0% -26.7% -29.6% -35.7% -46.0% f1 71,491 35.5% 33.3% -- -2.3% -6.1% -14.2% -28.0% f2 73,164 38.7% 36.4% 2.3% -- -3.9% -12.2% -26.3% f3 76,148 44.3% 42.0% 6.5% 4.1% -- -8.7% -23.3% f4 83,368 58.0% 55.5% 16.6% 13.9% 9.5% -- -16.0% f5 99,247 88.1% 85.1% 38.8% 35.6% 30.3% 19.0% -- 100 rand ints between (0, 50): ===== rate/sec f2 f6 f7 f4 f3 f1 f5 f2 39,405 -- -17.9% -18.7% -19.1% -41.8% -47.8% -56.3% f6 47,980 21.8% -- -1.1% -1.6% -29.1% -36.5% -46.8% f7 48,491 23.1% 1.1% -- -0.5% -28.4% -35.8% -46.2% f4 48,737 23.7% 1.6% 0.5% -- -28.0% -35.5% -46.0% f3 67,678 71.7% 41.1% 39.6% 38.9% -- -10.4% -24.9% f1 75,511 91.6% 57.4% 55.7% 54.9% 11.6% -- -16.3% f5 90,175 128.8% 87.9% 86.0% 85.0% 33.2% 19.4% -- 100 rand ints between (0, 500): ===== rate/sec f2 f4 f6 f7 f3 f1 f5 f2 25,748 -- -22.0% -41.4% -42.6% -57.5% -66.2% -67.8% f4 32,996 28.1% -- -24.9% -26.4% -45.6% -56.7% -58.8% f6 43,930 70.6% 33.1% -- -2.0% -27.5% -42.4% -45.1% f7 44,823 74.1% 35.8% 2.0% -- -26.1% -41.2% -44.0% f3 60,624 135.5% 83.7% 38.0% 35.3% -- -20.5% -24.2% f1 76,244 196.1% 131.1% 73.6% 70.1% 25.8% -- -4.7% f5 80,026 210.8% 142.5% 82.2% 78.5% 32.0% 5.0% -- 1000 rand ints between (0, 5): ===== rate/sec f7 f6 f1 f3 f2 f4 f5 f7 4,993 -- -6.7% -34.6% -39.4% -44.4% -50.1% -71.1% f6 5,353 7.2% -- -29.9% -35.0% -40.4% -46.5% -69.0% f1 7,640 53.0% 42.7% -- -7.3% -14.9% -23.6% -55.8% f3 8,242 65.1% 54.0% 7.9% -- -8.2% -17.6% -52.3% f2 8,982 79.9% 67.8% 17.6% 9.0% -- -10.2% -48.1% f4 10,004 100.4% 86.9% 30.9% 21.4% 11.4% -- -42.1% f5 17,293 246.4% 223.0% 126.3% 109.8% 92.5% 72.9% -- 1000 rand ints between (0, 50): ===== rate/sec f7 f6 f1 f2 f3 f4 f5 f7 5,051 -- -7.1% -26.5% -29.0% -34.1% -45.7% -71.2% f6 5,435 7.6% -- -20.9% -23.6% -29.1% -41.5% -69.0% f1 6,873 36.1% 26.5% -- -3.4% -10.3% -26.1% -60.8% f2 7,118 40.9% 31.0% 3.6% -- -7.1% -23.4% -59.4% f3 7,661 51.7% 41.0% 11.5% 7.6% -- -17.6% -56.3% f4 9,297 84.0% 71.1% 35.3% 30.6% 21.3% -- -47.0% f5 17,531 247.1% 222.6% 155.1% 146.3% 128.8% 88.6% -- 1000 rand ints between (0, 500): ===== rate/sec f2 f4 f6 f7 f3 f1 f5 f2 3,985 -- -11.0% -13.6% -14.8% -25.7% -40.4% -66.9% f4 4,479 12.4% -- -2.9% -4.3% -16.5% -33.0% -62.8% f6 4,613 15.8% 3.0% -- -1.4% -14.0% -31.0% -61.6% f7 4,680 17.4% 4.5% 1.4% -- -12.7% -30.0% -61.1% f3 5,361 34.5% 19.7% 16.2% 14.6% -- -19.8% -55.4% f1 6,683 67.7% 49.2% 44.9% 42.8% 24.6% -- -44.4% f5 12,028 201.8% 168.6% 160.7% 157.0% 124.3% 80.0% -- 

And Python 2:

 2.7.6 (default, Dec 1 2013, 13:26:15) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] 100 rand ints between (0, 5): ===== rate/sec f5 f7 f6 f2 f1 f3 f4 f5 24,955 -- -41.8% -42.5% -51.3% -55.7% -61.6% -65.2% f7 42,867 71.8% -- -1.2% -16.4% -23.9% -34.0% -40.2% f6 43,382 73.8% 1.2% -- -15.4% -23.0% -33.2% -39.5% f2 51,293 105.5% 19.7% 18.2% -- -9.0% -21.0% -28.5% f1 56,357 125.8% 31.5% 29.9% 9.9% -- -13.2% -21.4% f3 64,924 160.2% 51.5% 49.7% 26.6% 15.2% -- -9.5% f4 71,709 187.3% 67.3% 65.3% 39.8% 27.2% 10.5% -- 100 rand ints between (0, 50): ===== rate/sec f2 f5 f7 f6 f4 f3 f1 f2 22,439 -- -4.7% -45.1% -45.5% -50.7% -63.3% -64.5% f5 23,553 5.0% -- -42.4% -42.8% -48.3% -61.5% -62.8% f7 40,878 82.2% 73.6% -- -0.7% -10.2% -33.2% -35.4% f6 41,164 83.4% 74.8% 0.7% -- -9.6% -32.7% -34.9% f4 45,525 102.9% 93.3% 11.4% 10.6% -- -25.6% -28.0% f3 61,167 172.6% 159.7% 49.6% 48.6% 34.4% -- -3.3% f1 63,261 181.9% 168.6% 54.8% 53.7% 39.0% 3.4% -- 100 rand ints between (0, 500): ===== rate/sec f2 f5 f4 f6 f7 f3 f1 f2 13,122 -- -39.9% -56.2% -63.2% -63.8% -75.8% -80.0% f5 21,837 66.4% -- -27.1% -38.7% -39.8% -59.6% -66.7% f4 29,945 128.2% 37.1% -- -16.0% -17.4% -44.7% -54.3% f6 35,633 171.6% 63.2% 19.0% -- -1.7% -34.2% -45.7% f7 36,257 176.3% 66.0% 21.1% 1.8% -- -33.0% -44.7% f3 54,113 312.4% 147.8% 80.7% 51.9% 49.2% -- -17.5% f1 65,570 399.7% 200.3% 119.0% 84.0% 80.8% 21.2% -- 1000 rand ints between (0, 5): ===== rate/sec f5 f7 f6 f1 f2 f3 f4 f5 2,787 -- -37.7% -38.4% -53.3% -59.9% -60.4% -67.0% f7 4,477 60.6% -- -1.1% -25.0% -35.6% -36.3% -47.0% f6 4,524 62.3% 1.1% -- -24.2% -34.9% -35.6% -46.5% f1 5,972 114.3% 33.4% 32.0% -- -14.1% -15.0% -29.3% f2 6,953 149.5% 55.3% 53.7% 16.4% -- -1.1% -17.7% f3 7,030 152.2% 57.0% 55.4% 17.7% 1.1% -- -16.8% f4 8,452 203.3% 88.8% 86.8% 41.5% 21.6% 20.2% -- 1000 rand ints between (0, 50): ===== rate/sec f5 f7 f6 f2 f1 f3 f4 f5 2,667 -- -37.8% -38.7% -53.0% -55.9% -61.1% -65.3% f7 4,286 60.7% -- -1.5% -24.5% -29.1% -37.5% -44.2% f6 4,351 63.1% 1.5% -- -23.4% -28.0% -36.6% -43.4% f2 5,677 112.8% 32.4% 30.5% -- -6.1% -17.3% -26.1% f1 6,045 126.6% 41.0% 39.0% 6.5% -- -11.9% -21.4% f3 6,862 157.3% 60.1% 57.7% 20.9% 13.5% -- -10.7% f4 7,687 188.2% 79.3% 76.7% 35.4% 27.2% 12.0% -- 1000 rand ints between (0, 500): ===== rate/sec f2 f5 f7 f6 f4 f3 f1 f2 2,018 -- -16.1% -44.1% -46.2% -53.4% -61.8% -63.0% f5 2,405 19.1% -- -33.4% -35.9% -44.5% -54.4% -55.9% f7 3,609 78.8% 50.1% -- -3.8% -16.7% -31.6% -33.8% f6 3,753 85.9% 56.1% 4.0% -- -13.4% -28.9% -31.2% f4 4,334 114.7% 80.2% 20.1% 15.5% -- -17.9% -20.5% f3 5,277 161.5% 119.5% 46.2% 40.6% 21.8% -- -3.2% f1 5,454 170.2% 126.8% 51.1% 45.3% 25.8% 3.3% -- 

So - it depends.

Findings:

  • The Counter method is almost always among the slowest
  • The Counter method is one of the slowest in Python 2, but by far the fastest in Python 3.4
  • The try/except version is usually among the slowest
  • The if key in dict version is expected to be one of the best / fastest, regardless of the size or number of keys
  • {}.fromkeys(tgt, 0) very predictable
  • The defaultdict version defaultdict the fastest on large lists. In smaller lists, longer tuning times are amortized by too many elements.
+19
Dec 01 '13 at 4:12
source share

There is another point when it comes to coding style. Like the usual python coding style for using EAFP (easier to apologize than for permission), which assumes valid keys and throws exceptions if the assumption is false.

In connection with this general coding style, I always used the try / except approach and was sure that it was faster than LBYL (look before jumping). As I found out from the answers here, it definitely depends. As long as you can expect an existing key, I would go for a try / except option.

+1
Oct. 31 '16 at 12:42 on
source share

NOTE: purely speculative

I think the first one will be slower because it finds the key twice in the dictionary, first in the if statement, then in the C code to access the dictionary. Try-except can be slower when many of the keys are not in the dictionary, since exception handling involves some overhead.

I set the range(100) list and left the dictionary empty. The first use of if takes 8.003 seconds, and the second using try-except takes 30.976 seconds! The overhead is quite significant when nothing more is done.

0
Dec 01 '13 at 4:03
source share

Update: Not sure what I tested more, but still found interesting results.

Python 2:

 0% missing keys, Standard access: 0.419198036194 0% missing keys, try/except access: 0.309811115265 50% missing keys, Standard access: 0.417014837265 50% missing keys, try/except access: 0.309100866318 100% missing keys, Standard access: 0.416236877441 100% missing keys, try/except access: 0.310797929764 

I tested 3 dictionaries with a different number of keys using the regular and try / except method. The try / except method was faster for me every time.

My code is:

 from timeit import timeit size = 2**10 allkeys = "0% missing keys", dict([(i, 0) for i in range(size)]) somekeys= "50% missing keys", dict([(i*2, 0) for i in range(size//2)]) nokeys = "100% missing keys", dict([]) def test_normal(): """Standard access""" for i in xrange(size): if i in d: d[i] += 1 else: d[i] = 1 def test_try(): """try/except access""" for i in xrange(size): try: d[i] += 1 except KeyError: d[i] = 1 for trial in (allkeys, somekeys, nokeys): d = trial[1] for test in (test_normal, test_try): trial_time = timeit("test()", setup="from __main__ import test", number=2**10) print "{0}, {1}: {2}".format(trial[0], test.__doc__, trial_time) 

Now the code uses timeit, which is probably more accurate.

0
Dec 01 '13 at 4:20
source share
 import random from pip._vendor.distlib.compat import raw_input x=random.randint(1,99) guess = int(raw_input("Enter a integer from 1 to 99:")) while x !="guess": print if guess<x: print ("guess is low") guess= int(raw_input("Enter a integer from 1 to 99:")) elif guess >x: print ("guess is high") guess = int(raw_input("Enter a integer from 1 to 99:")) else: print (" you guessed it !") break print 
-2
Aug 20 '17 at 17:34 on
source share



All Articles