I'm relatively new to Python and was wondering whether there is any reason to prefer one of these methods over the other when deleting elements from a dict.
A) Using del
if k in d:
    del d[k]
B) Using pop
d.pop(k, None)
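For reference, here is a minimal sketch of the two approaches wrapped in small helper functions (the function names are hypothetical, and the syntax is Python 3). As far as I can tell, both leave the dict in the same state, and the only visible difference is that pop also hands back the removed value (or the default).

```python
def remove_with_del(d, k):
    """Approach A: check membership first, then delete."""
    if k in d:
        del d[k]


def remove_with_pop(d, k):
    """Approach B: pop with a default, so a missing key is not an error."""
    return d.pop(k, None)


a = {"x": 1, "y": 2}
b = dict(a)

remove_with_del(a, "x")
popped = remove_with_pop(b, "x")

# Removing a key that is not present is a no-op in both cases.
remove_with_del(a, "missing")
missing = remove_with_pop(b, "missing")

print(a == b, popped, missing)
```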
My first thought was that approach (A) needs two lookups, once in the if statement and again in the implementation of del, which would make it slightly slower than pop, which needs only one. A colleague then pointed out that del could still have the edge, because it is a keyword and therefore potentially better optimized, whereas pop is a method that can be overridden by end users (not sure if that is really a factor, but he has much more experience writing Python code).
I wrote some test snippets to compare performance. It seems that del has the edge (I have appended the snippets in case anyone wants to try them or comment on their correctness).
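On the "can be overridden" point, a small sketch suggests that both operations can be customized in a dict subclass: del d[k] dispatches to __delitem__, while d.pop(k) calls the pop method. LoggingDict is a hypothetical class made up for this illustration (Python 3 syntax), and this says nothing about how the built-in dict is optimized internally.

```python
class LoggingDict(dict):
    """Hypothetical dict subclass that records which hook each removal uses."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def __delitem__(self, key):
        # Invoked by the statement: del d[key]
        self.calls.append(("__delitem__", key))
        super().__delitem__(key)

    def pop(self, key, *default):
        # Invoked by the method call: d.pop(key[, default])
        self.calls.append(("pop", key))
        return super().pop(key, *default)


d = LoggingDict(a=1, b=2)
del d["a"]
d.pop("b", None)
print(d.calls)
```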
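So, back to the question: apart from the small performance difference, is there any other reason to prefer one over the other?
Test code: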
import timeit
print 'in: ', timeit.Timer(stmt='42 in d', setup='d = dict.fromkeys(range(100000))').timeit()
print 'pop: ', timeit.Timer(stmt='d.pop(42,None)', setup='d = dict.fromkeys(range(100000))').timeit()
print 'del: ', timeit.Timer(stmt='if 42 in d:\n del d[42]', setup='d = dict.fromkeys(range(100000))').timeit()
in: 0.0521960258484
pop: 0.172810077667
del: 0.0660231113434
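I'm not very familiar with timeit, so this may not be a fair test. In these numbers pop is several times slower than in, while del is only slightly slower than in. However, the setup runs only once, so key 42 is gone after the first timeit iteration and the del under the if never executes again.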
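The snippet above always removes key 42, and timeit runs the setup once but the statement many times. A minimal sketch (Python 3 syntax; remove_42 and counter are hypothetical names added for illustration) that counts how often the del branch is actually reached:

```python
import timeit

d = dict.fromkeys(range(1000))
counter = {"deleted": 0}


def remove_42():
    # Same shape as the timed statement: membership test, then del.
    if 42 in d:
        del d[42]
        counter["deleted"] += 1


# timeit accepts a callable as the statement; run it 1000 times.
timeit.timeit(remove_42, number=1000)
print(counter["deleted"])  # the key exists only on the first call
```

So the "del" timing mostly measures the failed membership test, and the "pop" timing mostly measures pop falling back to its default.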
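So, to make the comparison fairer, here is a revised timeit script in which the key changes on every run, so that the if and del actually execute (results below):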
import timeit
repeat_num=100
number=1000
small_size=10000
large_size=1000000
collect_garbage = False
setup_stmt = """
import random
d = dict.fromkeys(range(%(dict_size)i))
# key, randomly chosen
k = random.randint(0,%(dict_size)i - 1)
%(garbage)s
"""
in_stmt = """
k in d
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}
pop_stmt = """
d.pop(k, None)
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}
del_stmt = """
if k in d:
    del d[k]
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}
print \
"""SETUP:
repeats : %(repeats)s
runs per repeat: %(number)s
garbage collect: %(garbage)s""" \
    % {'repeats' : repeat_num,
       'number' : number,
       'garbage' : 'yes' if collect_garbage else 'no'}
print "SMALL:"
small_setup_stmt = setup_stmt % \
    {'dict_size' : small_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt % {'dict_size' : small_size},
                     setup=small_setup_stmt).repeat(repeat=repeat_num, number=number)
print " in: ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt % {'dict_size' : small_size},
                     setup=small_setup_stmt).repeat(repeat=repeat_num, number=number)
print " pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt % {'dict_size' : small_size},
                     setup=small_setup_stmt).repeat(repeat=repeat_num, number=number)
print " del: ", sum(times)/len(times)
print "LARGE:"
large_setup_stmt = setup_stmt % \
    {'dict_size' : large_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt % {'dict_size' : large_size},
                     setup=large_setup_stmt).repeat(repeat=repeat_num, number=number)
print " in: ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt % {'dict_size' : large_size},
                     setup=large_setup_stmt).repeat(repeat=repeat_num, number=number)
print " pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt % {'dict_size' : large_size},
                     setup=large_setup_stmt).repeat(repeat=repeat_num, number=number)
print " del: ", sum(times)/len(times)
With 100 repeats and 1000 runs per repeat, the results are:
SETUP:
repeats : 100
runs per repeat: 1000
garbage collect: no
SMALL:
in: 0.00020430803299
pop: 0.000313355922699
del: 0.000262062549591
LARGE:
in: 0.000201721191406
pop: 0.000328607559204
del: 0.00027587890625
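I may still be misusing timeit, but in these runs del again comes out ahead of pop.
As a side note, Python dicts are hash tables, unlike C++'s std::map, which is an ordered tree (average O(1) vs O(log(n))-ish lookups). That would fit the small and large dicts timing about the same here.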
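For anyone who wants to repeat the comparison on Python 3, here is a condensed sketch of the same idea. It is not the script that produced the numbers above: SIZE, NUMBER and REPEAT are arbitrary values I picked, the key simply counts up from 0 instead of starting at a random position, and it reports the best repeat rather than the mean. NUMBER is kept below SIZE so that every removed key is really present.

```python
import timeit

SIZE = 10_000    # dict size (assumed value)
NUMBER = 1_000   # statement executions per repeat; must stay <= SIZE
REPEAT = 5       # the setup rebuilds the dict once per repeat

setup = f"d = dict.fromkeys(range({SIZE})); k = 0"

# Each statement advances k so the next run touches a key that still exists.
statements = {
    "in": "k in d\nk += 1",
    "pop": "d.pop(k, None)\nk += 1",
    "del": "if k in d:\n    del d[k]\nk += 1",
}

results = {}
for name, stmt in statements.items():
    times = timeit.repeat(stmt=stmt, setup=setup, repeat=REPEAT, number=NUMBER)
    # Use the fastest repeat as the least noisy estimate.
    results[name] = min(times) / NUMBER
    print(f"{name:>4}: {results[name]:.3e} s per iteration")
```

The absolute numbers depend on the machine and Python version, so only the relative ordering is worth comparing.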