blosum62 is a dictionary of 276 elements.
I prefer to fill in the missing (mirrored) pairs up front, because that costs a single loop of only 276 iterations, while the sequences being analyzed are likely to contain more than 276 elements. If you instead score each pair with the score_match() function, that function has to run an `if pair not in matrix` test for every position of the sequences, which is of course far more than 276 times.
Another thing that takes a lot of time: each `score += something` creates a new integer and rebinds the name score to this new object. Each rebinding takes a certain amount of time. That cost does not arise when a generator yields a stream of integers that sum() adds directly to its running total.
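As a minimal sketch of that fill-in step, the snippet below mirrors every pair of a small half-matrix once, so later lookups need no membership test. The three-entry `half` dictionary and the helper name `symmetrize` are assumptions made for illustration; they stand in for the real blosum62 dictionary.

```python
# Toy half-matrix standing in for blosum62 (illustrative entries only).
half = {('A', 'A'): 4, ('R', 'A'): -1, ('R', 'R'): 5}

def symmetrize(matrix):
    """Return a copy of matrix that also contains every mirrored pair."""
    full = dict(matrix)
    # Loop over the original dict so that `full` can grow safely.
    for (a, b), val in matrix.items():
        full[(b, a)] = val
    return full

full = symmetrize(half)
print(len(half), len(full))   # 3 4
print(full[('A', 'R')])       # -1, found without any "not in" test
```

The loop runs once per stored pair, regardless of how long the sequences scored afterwards are.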
    from Bio.SubsMat.MatrixInfo import blosum62 as blosum
    from itertools import izip

    blosum.update(((b,a),val) for (a,b),val in blosum.items())

    def score_pairwise(seq1, seq2, matrix, gap_s, gap_e, gap = True):
        for A,B in izip(seq1, seq2):
            diag = ('-'==A) or ('-'==B)
            yield (gap_e if gap else gap_s) if diag else matrix[(A,B)]
            gap = diag

    seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
    seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'

    print sum(score_pairwise(seq1, seq2, blosum, -5, -1))
score_pairwise() is a generator function, because it uses yield instead of return.
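To illustrate what being a generator function means here, the following sketch restates score_pairwise() in Python 3 syntax (zip instead of izip, print as a function) and runs it on a toy matrix. The `toy` dictionary and the short sequences are assumptions chosen so the result can be checked by hand; the scoring logic and the `gap=True` default are kept as in the answer.

```python
import inspect

def score_pairwise(seq1, seq2, matrix, gap_s, gap_e, gap=True):
    for a, b in zip(seq1, seq2):
        is_gap = a == '-' or b == '-'
        # Extension penalty if the previous column was a gap, else opening penalty.
        yield (gap_e if gap else gap_s) if is_gap else matrix[(a, b)]
        gap = is_gap

# Toy symmetric matrix (hypothetical stand-in for blosum62).
toy = {('A', 'A'): 4, ('A', 'R'): -1, ('R', 'A'): -1, ('R', 'R'): 5}

scores = score_pairwise('AR--A', 'AA--R', toy, -5, -1)
print(inspect.isgenerator(scores))  # True: nothing has been scored yet

total = sum(scores)                 # sum() pulls the values one at a time
print(total)                        # 4 - 1 - 5 - 1 - 1 = -4
```

Calling the function only creates the generator object; the per-column scores are computed lazily as sum() consumes them.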