Is there a faster way to get multiple keys from a dictionary?

I have a dictionary:

 d = {'a':1, 'b':2, 'c':3, 'd':4}

Then I have a list of keys:

 l = ['a', 'b', 'z'] 

My desired result:

 [1, 2, None] 

What I'm doing so far:

 [d.get(k) for k in l] 

Is there a faster way? Perhaps one without a for loop?

3 answers

You can use:

 >>> list(map(d.get, l))
 [1, 2, None]

It has two advantages:

  • It looks up d.get only once, not on every iteration (see the sketch after this list).
  • CPython only: since dict.get is implemented in C and map is implemented in C, it can avoid the Python layer in the function call (roughly speaking; the details are a bit more complicated).
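
For illustration, here is a minimal sketch of what that one-time lookup buys you; hoisting the bound method into a local variable achieves the same effect by hand:

 d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
 l = ['a', 'b', 'z']

 get = d.get                   # the bound method is looked up once...
 result = [get(k) for k in l]  # ...instead of once per element
 print(result)                 # [1, 2, None]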

As for the timings (measured on Python 3.6 in a Jupyter notebook):

 d = {'a':1, 'b':2, 'c':3, 'd':4}
 l = ['a', 'b', 'z']

 %timeit list(map(d.get, l))
 594 ns ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

 %timeit [d.get(k) for k in l]
 508 ns ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Please note that in this case it is actually slower! This is because the setup overhead of map and list dominates for short iterables. So if you want the fastest option for short lists, stick with your approach.

With a longer l you will see that list(map(...)) eventually comes out ahead:

 import random
 import string

 d = {'a':1, 'b':2, 'c':3, 'd':4}
 l = [random.choice(string.ascii_lowercase) for _ in range(10000)]

 %timeit list(map(d.get, l))
 663 µs ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

 %timeit [d.get(k) for k in l]
 1.13 ms ± 7.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

However, it is still "just" 2 times faster.
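
If you are not working in a Jupyter notebook, the same comparison can be reproduced with the standard-library timeit module (a minimal sketch under the same setup; absolute numbers will differ by machine and Python version):

 import random
 import string
 import timeit

 d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
 l = [random.choice(string.ascii_lowercase) for _ in range(10000)]

 # Time 1000 runs of each variant; lower is better.
 print(timeit.timeit(lambda: list(map(d.get, l)), number=1000))
 print(timeit.timeit(lambda: [d.get(k) for k in l], number=1000))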


It is important to note that the bottleneck here is not the size of the dictionary but the size of the list of keys (and, of course, method-lookup time, the time it takes to hash the keys, and so on).

Consider the following:

 from timeit import Timer

 d = {1: 'a'}
 keys = [1 for _ in range(1000)]

 def _map():
     return list(map(d.get, keys))

 def _map_single_lookup():
     g = d.get
     return list(map(g, keys))

 def _list_comp():
     return [d.get(key) for key in keys]

 def _list_comp_single_lookup():
     g = d.get
     return [g(key) for key in keys]

 print(min(Timer(_map).repeat(100, 100)))
 print(min(Timer(_map_single_lookup).repeat(100, 100)))
 print(min(Timer(_list_comp).repeat(100, 100)))
 print(min(Timer(_list_comp_single_lookup).repeat(100, 100)))

 # 0.009307396466818774
 # 0.009261678214412816
 # 0.018456645101335933
 # 0.011634828724497837
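
The numbers suggest that most of the comprehension's disadvantage comes from repeating the d.get attribute lookup: hoisting it into a local variable (_list_comp_single_lookup) closes most of the gap, while map binds the method once in both variants.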

A much faster alternative is to use itemgetter with a defaultdict (the defaultdict is needed because itemgetter, unlike dict.get, does not support a default value when a key does not exist):

 from collections import defaultdict
 from timeit import Timer
 from operator import itemgetter

 d = defaultdict(lambda: None)
 d[1] = 'a'
 keys = [1 for _ in range(1000)]

 def _map():
     return list(map(d.get, keys))

 def _getter():
     return list(itemgetter(*keys)(d))

 print(min(Timer(_map).repeat(100, 100)))
 print(min(Timer(_getter).repeat(100, 100)))

 # 0.0074976040767260055
 # 0.0021861597102568187

Edit: added timings for nonexistent keys (integers and strings). This does not significantly affect performance.

 from collections import defaultdict
 from timeit import Timer
 from operator import itemgetter

 d = defaultdict(lambda: None)
 d[1] = 'a'

 non_existing_keys_int = [2 for _ in range(1000)]
 non_existing_keys_string = ['a' for _ in range(1000)]

 def get_non_existing_keys_int():
     return list(itemgetter(*non_existing_keys_int)(d))

 def get_non_existing_keys_string():
     return list(itemgetter(*non_existing_keys_string)(d))

 print(min(Timer(get_non_existing_keys_int).repeat(100, 100)))
 print(min(Timer(get_non_existing_keys_string).repeat(100, 100)))

 # 0.002647169132724954
 # 0.002408539023506795
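
One edge case to keep in mind with this approach: when the key list contains exactly one element, itemgetter returns the bare value instead of a 1-tuple, so wrapping the result in list() no longer works as expected. A minimal defensive sketch (the helper name get_many is hypothetical):

 from collections import defaultdict
 from operator import itemgetter

 def get_many(d, keys):
     # itemgetter(k)(d) returns a bare value for one key but a tuple
     # for two or more, so the single-key case needs special handling.
     if len(keys) == 1:
         return [d[keys[0]]]
     return list(itemgetter(*keys)(d))

 d = defaultdict(lambda: None, {'a': 1, 'b': 2})
 print(get_many(d, ['a']))            # [1]
 print(get_many(d, ['a', 'b', 'z']))  # [1, 2, None]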
