Convert codons (base 64) to a base 10 number

The July 2012 issue of Mensa Newsletter has an article called Digital Brain. In it, the author relates the human brain to basic computation. It is an interesting and fun article with a challenge at the end. The challenge asks the reader to convert Cytosine Guanine Adenine Guanine Adenine Guanine to a base 10 number, given that Cytosine Cytosine Guanine Cytosine Adenine Guanine equals 2011 (the first codon sequence is cgagag for short and the second is ccgcag for short). Essentially, you need to convert a base 64 number to base 10 using the table in the article, which lists all possible codons in the correct order with aug = 0, uuu = 1, uuc = 2, ..., gga = 61, ggg = 62, uag = 63.

I decided to try this and wrote a Python program to convert codon numbers to base 10 and base 10 numbers to codons. After writing a quick algorithm for both, I ran it. The program raised no errors and produced codons for my numbers and vice versa. However, the numbers were wrong! I can't see what is going wrong, and I would really appreciate any help.

Without further ado, the code:

    codons = [
        'aug', 'uuu', 'uuc', 'uua', 'uug', 'ucu', 'ucc', 'uca',
        'ucg', 'uau', 'uac', 'uaa', 'ugu', 'ugc', 'uga', 'ugg',
        'cuu', 'cuc', 'cua', 'cug', 'ccu', 'ccc', 'cca', 'ccg',
        'cau', 'cac', 'caa', 'cag', 'cgu', 'cgc', 'cga', 'cgg',
        'auu', 'auc', 'aua', 'acu', 'acc', 'aca', 'acg', 'aau',
        'aac', 'aaa', 'aag', 'agu', 'agc', 'aga', 'agg', 'guu',
        'guc', 'gua', 'gug', 'gcu', 'gcc', 'gca', 'gcg', 'gau',
        'gac', 'gaa', 'gag', 'ggu', 'ggc', 'gga', 'ggg', 'uag',
    ]

    def codonNumToBase10(codonValue):
        numberOfChars = len(codonValue)
        # check to see if it contains sets of threes
        if numberOfChars % 3 != 0:
            return -1
        # check to see if it contains the correct characters
        for ch in codonValue:
            if ch not in 'aucg':
                return -2
        # populate an array with decimal versions of each codon in the input
        codonNumbers = []
        base10Value = 0
        numberOfCodons = numberOfChars // 3
        for i in range(numberOfCodons):
            charVal = codonValue[i * 3:i * 3 + 3]
            codonNumbers.append(codons.index(charVal))
            base10Value += pow(64, numberOfCodons - i - 1) * codonNumbers[i]
        return base10Value

    def base10ToCodonNum(number):
        codonNumber = ''
        while True:
            val = number % 64
            number = number // 64
            # prepend the codon for the current base 64 digit
            codonNumber = codons[val] + codonNumber
            # stop once the quotient reaches zero
            if number == 0:
                break
        return codonNumber

    val_2011 = 'ccgcag'
    val_unknown = 'cgagag'

    print(base10ToCodonNum(codonNumToBase10(val_2011)), '::', codonNumToBase10(val_2011))
    print(base10ToCodonNum(codonNumToBase10(val_unknown)), '::', codonNumToBase10(val_unknown))

EDIT 1: The values I get are 1499 for ccgcag and 1978 for cgagag.

EDIT 2: The base10ToCodonNum function is fixed, thanks to Ashwini Chaudhary.

+6
3 answers

Your code does convert to and from base 64 correctly. I suspect that you did not list the codons in the same order as the table in the problem.

With the order you provided for the codons:

'ccgcag' = codons.index('ccg') * 64 + codons.index('cag') = 23 * 64 + 27 = 1499

This is mathematically correct for the ordering you provided. To get 2011, you would have to enter cggcag, so are you sure you copied the codons in exactly the same order?
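
To see where cggcag comes from, here is a minimal sketch that splits 2011 into its two base 64 digits with divmod and looks each digit up. It assumes the ordering listed in the question, rebuilt here programmatically (all triples over u, c, a, g in that order, with aug moved to the front and uag moved to the end) rather than copied from the article's table.

```python
from itertools import product

# Assumption: this reproduces the ordering typed out in the question,
# i.e. product order over 'ucag' with 'aug' first and 'uag' last.
middle = ["".join(p) for p in product("ucag", repeat=3)]
middle = [c for c in middle if c not in ("aug", "uag")]
codons = ["aug"] + middle + ["uag"]

# Split 2011 into a high and a low base 64 digit.
high, low = divmod(2011, 64)
print(high, low)                    # 31 27
print(codons[high] + codons[low])   # cggcag

# The sequence from the puzzle statement, under the same ordering.
print(codons.index("ccg") * 64 + codons.index("cag"))  # 1499
```

Under this ordering, ccg and cgg sit at positions 23 and 31, so the two sequences differ only in the high digit.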

+1

I could not follow your code, so I made another implementation, but got the same results:

    CODONS = [
        'aug', 'uuu', 'uuc', 'uua', 'uug', 'ucu', 'ucc', 'uca',
        'ucg', 'uau', 'uac', 'uaa', 'ugu', 'ugc', 'uga', 'ugg',
        'cuu', 'cuc', 'cua', 'cug', 'ccu', 'ccc', 'cca', 'ccg',
        'cau', 'cac', 'caa', 'cag', 'cgu', 'cgc', 'cga', 'cgg',
        'auu', 'auc', 'aua', 'acu', 'acc', 'aca', 'acg', 'aau',
        'aac', 'aaa', 'aag', 'agu', 'agc', 'aga', 'agg', 'guu',
        'guc', 'gua', 'gug', 'gcu', 'gcc', 'gca', 'gcg', 'gau',
        'gac', 'gaa', 'gag', 'ggu', 'ggc', 'gga', 'ggg', 'uag',
    ]

    def codon2decimal(s):
        if len(s) % 3 != 0:
            raise ValueError("%s doesn't look like a codon number." % s)
        # split into codons, least significant first
        digits = reversed([s[i*3:i*3+3] for i in range(len(s) // 3)])
        val = 0
        for i, digit in enumerate(digits):
            if digit not in CODONS:
                raise ValueError("invalid sequence: %s." % digit)
            val += CODONS.index(digit) * 64 ** i
        return val

    def main():
        for number in ('cggcag', 'ccgcag', 'cgagag', 'auguuuuuc'):
            print(number, ':', codon2decimal(number))

    if __name__ == '__main__':
        main()

results:

    cggcag : 2011
    ccgcag : 1499
    cgagag : 1978
    auguuuuuc : 66
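
The question also asked for the reverse direction. A minimal sketch of it follows, assuming the same ordering as above (rebuilt programmatically here so the block is self-contained). The name decimal2codon is hypothetical, not something from the article.

```python
from itertools import product

# Assumption: same ordering as in the question, with 'aug' = 0 and 'uag' = 63.
CODONS = (
    ["aug"]
    + [c for c in ("".join(p) for p in product("ucag", repeat=3))
       if c not in ("aug", "uag")]
    + ["uag"]
)

def decimal2codon(n):
    """Convert a non-negative integer to a codon string, most significant codon first."""
    if n < 0:
        raise ValueError("negative numbers are not supported")
    digits = []
    while True:
        n, r = divmod(n, 64)       # peel off the least significant base 64 digit
        digits.append(CODONS[r])
        if n == 0:
            break
    return "".join(reversed(digits))

for n in (2011, 1499, 1978, 66):
    print(n, ":", decimal2codon(n))
```

Note that a leading aug acts like a leading zero, so 66 comes back as uuuuuc rather than auguuuuuc.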
+2
    def codon2dec(x):
        codons = [
            'aug', 'uuu', 'uuc', 'uua', 'uug', 'ucu', 'ucc', 'uca',
            'ucg', 'uau', 'uac', 'uaa', 'ugu', 'ugc', 'uga', 'ugg',
            'cuu', 'cuc', 'cua', 'cug', 'ccu', 'ccc', 'cca', 'ccg',
            'cau', 'cac', 'caa', 'cag', 'cgu', 'cgc', 'cga', 'cgg',
            'auu', 'auc', 'aua', 'acu', 'acc', 'aca', 'acg', 'aau',
            'aac', 'aaa', 'aag', 'agu', 'agc', 'aga', 'agg', 'guu',
            'guc', 'gua', 'gug', 'gcu', 'gcc', 'gca', 'gcg', 'gau',
            'gac', 'gaa', 'gag', 'ggu', 'ggc', 'gga', 'ggg', 'uag',
        ]
        if len(x) % 3 == 0:
            # split the input into three-letter codons
            x = [x[i:i+3] for i in range(0, len(x), 3)]
            try:
                return sum(codons.index(y) * (64 ** (len(x) - 1 - i)) for i, y in enumerate(x))
            except ValueError:
                return 'invalid input'
        else:
            return 'invalid input'

Output:

    >>> codon2dec('cgagag')
    1978
    >>> codon2dec('ccgcag')
    1499
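
Since every version here depends on a hand-typed 64-entry list, a quick sanity check of the table may be worthwhile: list.index returns the first match, so a duplicated entry silently shifts nothing but makes another codon unreachable. The sketch below only checks that a table is a permutation of all 64 codons; it cannot tell whether the order matches the article. The function name table_problems and the example tables are hypothetical.

```python
from collections import Counter
from itertools import product

def table_problems(table):
    """Return (duplicates, missing) for a candidate 64-codon table."""
    expected = {"".join(p) for p in product("ucag", repeat=3)}
    counts = Counter(table)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(expected - set(table))
    return duplicates, missing

# Hypothetical example tables: one complete, one with 'cuu' mistyped as 'uuu'.
good = ["".join(p) for p in product("ucag", repeat=3)]
bad = ["uuu" if c == "cuu" else c for c in good]

print(table_problems(good))  # ([], [])
print(table_problems(bad))   # (['uuu'], ['cuu'])
```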
+1

Source: https://habr.com/ru/post/919824/

