I have a program that analyzes the sequence of alleles. I am trying to write code that determines whether an allele is complete or not. To do this, I need to count the number of gaps in the control sequence. A gap is indicated by the string '-'. If there is more than one gap, I want the program to say "Incomplete allele."
How can I figure out how to count the number of gaps in a sequence?
Here is an example of a broken sequence:
>DQB1*04:02:01
--ATGTCTTGGAAGAAGGCTTTGCGGAT-------CCCTGGAGGCCTTCGGGTAGCAACT
GTGACCTT----GATGCTGGCGATGCTGAGCACCCCGGTGGCTGAGGGCAGAGACTCTCC
CGAGGATTTCGTGTTCCAGTTTAAGGGCATGTGCTACTTCACCAACGGGACCGAGCGCGT
GCGGGGTGTGACCAGATACATCTATAACCGAGAGGAGTACGCGCGCTTCGACAGCGACGT
GGGGGTGTATCGGGCGGTGACGCCGCTGGGGCGGCTTGACGCCGAGTACTGGAATAGCCA
GAAGGACATCCTGGAGGAGGACCGGGCGTCGGTGGACACCGTATGCAGACACAACTACCA
GTTGGAGCTCCGCACGACCTTGCAGCGGCGA-----------------------------
CTGCTGGTCTGCTCAGTGACAGATTTCTATCCAGCCCAGATCAAAGTCCGGTGGTTTCGG
AATGACCAGGAGGAGACAACTGGCGTTGTGTCCACCCCCCTTATTAGGAACGGTGACTGG
ACCTTCCAGATCCTGGTGATGCTGGAAATGACTCCCCAGCGTGGAGACGTCTACACCTGC
CACGTGGAGCACCCCAGCCTCCAGAACCCCATCATCGTGGAGTGGCGGGCTCAGTCTGAA
TCTGCCCAGAGCAAGATGCTGAGTGG----CATTGGAGGCTTCGTGCTGGGGCTGATCTT
CCTCGGGCTGGGCCTTATTATC--------------CATCACAGGAGTCAGAAAGGGCTC
CTGCACTGA---------------------------------------------------
The code I still have is as follows:
idx=[]
for m in range(len(sequence)):
for n in re.finditer('-',sequence[0]):
idx.append(n.start())
counter=0
min_val=[]
for n in range(len(idx)):
if counter==idx[n]:
counter=counter+1
elif counter !=0:
min_val.append(idx[n-1])
counter=0
, "-", , . , , .