Is there a faster way to get multiple keys from a dictionary?

I have a dictionary:

 d = {'a':1, 'b':2, 'c':3, 'd':4}

Then I have a list of keys:

 l = ['a', 'b', 'z'] 

My desired result:

 [1, 2, None] 

What I'm doing so far:

 [d.get(k) for k in l] 

Is there a faster way? Perhaps one without a for loop?

3 answers

You can use:

 >>> list(map(d.get, l))
 [1, 2, None]

It has two advantages:

  • It looks up d.get only once, not on every iteration (see the sketch after this list).
  • CPython only: since dict.get is implemented in C and map is implemented in C, it can avoid the Python layer in the function call (roughly speaking; the details are a bit more complicated).
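
For illustration, here is a minimal sketch of what that one-time lookup buys you; hoisting the bound method into a local variable achieves the same effect by hand:

 d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
 l = ['a', 'b', 'z']

 get = d.get                   # the bound method is looked up once...
 result = [get(k) for k in l]  # ...instead of once per element
 print(result)                 # [1, 2, None]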

As for the timings (measured on Python 3.6 in a Jupyter notebook):

 d = {'a':1, 'b':2, 'c':3, 'd':4}
 l = ['a', 'b', 'z']

 %timeit list(map(d.get, l))
 594 ns ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

 %timeit [d.get(k) for k in l]
 508 ns ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Please note that in this case it is actually slower! This is because the setup overhead of map and list dominates for short iterables. So if you want the fastest option for short lists, stick with your approach.

With a longer l you will see that list(map(...)) eventually comes out ahead:

 import random
 import string

 d = {'a':1, 'b':2, 'c':3, 'd':4}
 l = [random.choice(string.ascii_lowercase) for _ in range(10000)]

 %timeit list(map(d.get, l))
 663 µs ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

 %timeit [d.get(k) for k in l]
 1.13 ms ± 7.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

However, it is still "just" 2 times faster.
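
If you are not working in a Jupyter notebook, the same comparison can be reproduced with the standard-library timeit module (a minimal sketch under the same setup; absolute numbers will differ by machine and Python version):

 import random
 import string
 import timeit

 d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
 l = [random.choice(string.ascii_lowercase) for _ in range(10000)]

 # Time 1000 runs of each variant; lower is better.
 print(timeit.timeit(lambda: list(map(d.get, l)), number=1000))
 print(timeit.timeit(lambda: [d.get(k) for k in l], number=1000))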


It is important to note that the bottleneck here is not the size of the dictionary but the size of the list of keys (and, of course, method-lookup time, the time it takes to hash the keys, and so on).

Consider the following:

 from timeit import Timer

 d = {1: 'a'}
 keys = [1 for _ in range(1000)]

 def _map():
     return list(map(d.get, keys))

 def _map_single_lookup():
     g = d.get
     return list(map(g, keys))

 def _list_comp():
     return [d.get(key) for key in keys]

 def _list_comp_single_lookup():
     g = d.get
     return [g(key) for key in keys]

 print(min(Timer(_map).repeat(100, 100)))
 print(min(Timer(_map_single_lookup).repeat(100, 100)))
 print(min(Timer(_list_comp).repeat(100, 100)))
 print(min(Timer(_list_comp_single_lookup).repeat(100, 100)))

 # 0.009307396466818774
 # 0.009261678214412816
 # 0.018456645101335933
 # 0.011634828724497837
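
The numbers suggest that most of the comprehension's disadvantage comes from repeating the d.get attribute lookup: hoisting it into a local variable (_list_comp_single_lookup) closes most of the gap, while map binds the method once in both variants.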

A much faster alternative is to use itemgetter with a defaultdict (the defaultdict is needed because itemgetter, unlike dict.get, does not support a default value when a key does not exist):

 from collections import defaultdict
 from timeit import Timer
 from operator import itemgetter

 d = defaultdict(lambda: None)
 d[1] = 'a'
 keys = [1 for _ in range(1000)]

 def _map():
     return list(map(d.get, keys))

 def _getter():
     return list(itemgetter(*keys)(d))

 print(min(Timer(_map).repeat(100, 100)))
 print(min(Timer(_getter).repeat(100, 100)))

 # 0.0074976040767260055
 # 0.0021861597102568187

Edit: added timings for nonexistent keys (integers and strings). This does not significantly affect performance.

 from collections import defaultdict
 from timeit import Timer
 from operator import itemgetter

 d = defaultdict(lambda: None)
 d[1] = 'a'

 non_existing_keys_int = [2 for _ in range(1000)]
 non_existing_keys_string = ['a' for _ in range(1000)]

 def get_non_existing_keys_int():
     return list(itemgetter(*non_existing_keys_int)(d))

 def get_non_existing_keys_string():
     return list(itemgetter(*non_existing_keys_string)(d))

 print(min(Timer(get_non_existing_keys_int).repeat(100, 100)))
 print(min(Timer(get_non_existing_keys_string).repeat(100, 100)))

 # 0.002647169132724954
 # 0.002408539023506795
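
One edge case to keep in mind with this approach: when the key list contains exactly one element, itemgetter returns the bare value instead of a 1-tuple, so wrapping the result in list() no longer works as expected. A minimal defensive sketch (the helper name get_many is hypothetical):

 from collections import defaultdict
 from operator import itemgetter

 def get_many(d, keys):
     # itemgetter(k)(d) returns a bare value for one key but a tuple
     # for two or more, so the single-key case needs special handling.
     if len(keys) == 1:
         return [d[keys[0]]]
     return list(itemgetter(*keys)(d))

 d = defaultdict(lambda: None, {'a': 1, 'b': 2})
 print(get_many(d, ['a']))            # [1]
 print(get_many(d, ['a', 'b', 'z']))  # [1, 2, None]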
