Oh, you just think how MUCH matches with distance from interference <2? This can be done much faster.
total_count = 0 for line in f:
Note that most of your code time will be spent on a line:
if hamming(line, ll) <= 2:
Thus, in any way that you can improve this algorithm, the LAST improvements are your overall script speed. Bude's answer praises the virtues of the jellyfish hamming_distance function, but without any personal experience, I cannot recommend it myself. However, his advice to use a faster implementation of distance for interference is sound!
Peter DeGlopper suggests expanding the l list into six different sets of matches of “Two or less hits.” That is, a group of sets that contains all possible pairs that can have two or less distances from hamming. It might look like this:
# hamming_sets is [ {AB??}, {A?C?}, {A??D}, {?BC?}, {?B?D}, {??CD} ] hamming_sets = [ set(), set(), set(), set(), set(), set() ] for ll in l: # this should take the lion share of time in your program hamming_sets[0].add(l[0] + l[1]) hamming_sets[0].add(l[0] + l[2]) hamming_sets[0].add(l[0] + l[3]) hamming_sets[0].add(l[1] + l[2]) hamming_sets[0].add(l[1] + l[3]) hamming_sets[0].add(l[2] + l[3]) total_count = 0 for line in f: # and this should be fast, even if `f` is large line = line.strip() if line[0]+line[1] in hamming_sets[0] or \ line[0]+line[2] in hamming_sets[1] or \ line[0]+line[3] in hamming_sets[2] or \ line[1]+line[2] in hamming_sets[3] or \ line[1]+line[3] in hamming_sets[4] or \ line[2]+line[3] in hamming_sets[5]: total_count += 1
You could get readability by making a hamming_sets dictionary of transform_function: set_of_results key value pairs.
hamming_sets = {lambda s: s[0]+s[1]: set(), lambda s: s[0]+s[2]: set(), lambda s: s[0]+s[3]: set(), lambda s: s[1]+s[2]: set(), lambda s: s[1]+s[3]: set(), lambda s: s[2]+s[3]: set()} for func, set_ in hamming_sets.items(): for ll in l: set_.add(func(ll)) total_count = 0 for line in f: line = line.strip() if any(func(line) in set_ for func, set_ in hamming_sets.items()): total_count += 1