I am using Python 2.7. I am working with a FASTA file containing the DNA sequence of a modern human Y chromosome. It is essentially one long string of 20,000,000 characters, such as ATCGACGATCACACG... I want to convert this very long string into a list of three-character strings (triplets). For example, I want to convert this string:
My_sequence_string = "ATGTACGTCATAG"
to this list:
My_sequence_list = ["ATG", "TAC", "GTC", "ATA"]
This is my code:
str_Reading_Frame1 = open("Ychromosome.fa", "r").read()
list_Reading_Frame1 = []

def str_to_list(list, str):
    if len(str) > 2:
        list.append(str[:3])
        str_to_list(list, str[3:])

str_to_list(list_Reading_Frame1, str_Reading_Frame1)
But I get a memory limit error. I think the problem is the recursive call (the function calling itself), but I don't know how to fix my code. I don't want to import modules like Biopython; I want to do it myself (with your help :-)).
user4374379
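The recursion is a likely culprit here: Python does not optimize tail calls, its default recursion limit is around 1000 frames, and every call also copies the rest of the string with str[3:]. A minimal sketch of an iterative alternative follows. It is not the only way to do this, and it makes a few assumptions that are not in the question: the helper names `split_into_triplets` and `read_fasta_sequence` are made up for illustration, the FASTA file is assumed to have header lines starting with ">" and line breaks inside the sequence (both of which would otherwise end up inside the triplets), and leftover characters at the end that do not form a full triplet are dropped, as in the example above.

```python
def split_into_triplets(sequence):
    """Return consecutive, non-overlapping 3-character chunks of sequence.

    Trailing characters that do not fill a complete triplet are dropped.
    """
    # Index of the end of the last complete triplet.
    usable = len(sequence) - len(sequence) % 3
    # Step through the string 3 characters at a time; no recursion needed.
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def read_fasta_sequence(path):
    """Join the sequence lines of a FASTA file into one string.

    Assumption: header lines start with '>' and should be skipped,
    and line breaks are not part of the sequence.
    """
    parts = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip())
    return "".join(parts)


print(split_into_triplets("ATGTACGTCATAG"))
# prints ['ATG', 'TAC', 'GTC', 'ATA']
```

For the real file, something like `split_into_triplets(read_fasta_sequence("Ychromosome.fa"))` should work with this sketch. On Python 2.7, swapping `range` for `xrange` avoids building a multi-million-element list of indices first.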