My task seems simple, but I have run out of ideas, so any help would be appreciated.
I have a lot of FASTA DNA sequences that need to be cut at certain positions, with the resulting parts then concatenated. For example, if my sequence file looks like this:
~$ cat seq_file
>Sequence1
This is now a sequence that must require a bit of slicing and concatenation to be useful
>Sequence2
I have many more uncleaned strings like this in the form of sequences
I want the result to be like this:
>Sequence1
This is useful
>Sequence2
I have cleaned sequences
The parts to keep are determined by slice indices stored in a separate CSV file. Each row holds a sequence ID followed by alternating start and end positions, with unused trailing fields left empty:
~$ cat test.csv
Sequence1,0,9,66,74,,
Sequence2,0,5,15,22,48,57
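To make the row layout concrete, each row can be read as an ID followed by (start, end) pairs, with empty trailing fields skipped. This is only a minimal sketch: it assumes Python 3 (the script in this post is Python 2), holds the two rows in memory instead of reading test.csv, and `row_to_pairs` is a hypothetical helper name.

```python
import csv
import io

def row_to_pairs(row):
    """Return (seq_id, [(start, end), ...]) for one CSV row, skipping empty pairs."""
    seq_id, fields = row[0], row[1:]
    pairs = []
    # Walk the remaining fields two at a time: (start, end), (start, end), ...
    for start, end in zip(fields[0::2], fields[1::2]):
        if start and end:  # trailing empty columns such as ',,' are skipped
            pairs.append((int(start), int(end)))
    return seq_id, pairs

# The two rows from test.csv, held in memory for the example
sample = io.StringIO("Sequence1,0,9,66,74,,\nSequence2,0,5,15,22,48,57\n")
parsed = dict(row_to_pairs(row) for row in csv.reader(sample))
print(parsed)
```

Checking for empty fields up front avoids relying on a `ValueError` from `int('')` to detect the unused columns.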
My code is:
from Bio import SeqIO
import csv

# Map each FASTA description line to its sequence
seq_dict = {}
for seq_record in SeqIO.parse('seq_file', 'fasta'):
    descr = seq_record.description
    seq_dict[descr] = seq_record.seq

with open('test.csv', 'rb') as file:
    reader = csv.reader(file)
    for row in reader:
        seq_id = row[0]
        # Columns 1-6 hold alternating start/end positions
        for n in range(1,7):
            if n % 2 != 0:
                start = row[n]
            else:
                end = row[n]
                # A start/end pair is complete: look up the sequence and slice it
                for key, value in sorted(seq_dict.iteritems()):
                    if key == seq_id:
                        try:
                            slice_seq = value[int(start):int(end)]
                            print key
                            print slice_seq
                        except ValueError:
                            print 'Ignore empty slice indices.. '
Now this will print:
Sequence1
Thisisnow
Sequence1
useful
Ignore empty slice indices..
Sequence2
Ihave
Sequence2
cleaned
Sequence2
sequences
So far so good; this is what I expected. But how can I join the sliced fragments of each sequence, by concatenation or any other operation in Python, to get my desired output? Thanks.
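One possible direction, sketched under assumptions rather than offered as a definitive solution: collect the slices for each sequence and join them once, instead of printing each fragment as it is cut. The sketch uses Python 3 and plain strings. `join_slices`, `seqs`, and `positions` are hypothetical names, and the sequences and positions are hard-coded here in the form they take after parsing (spaces removed, empty pair dropped). With Biopython `Seq` values, converting each one with `str(...)` first should give the same behaviour.

```python
def join_slices(sequence, pairs):
    """Cut `sequence` at each (start, end) pair and join the pieces in order."""
    return "".join(sequence[start:end] for start, end in pairs)

# Sequences as they appear after parsing (spaces removed), hard-coded here
seqs = {
    "Sequence1": "Thisisnowasequencethatmustrequireabitofslicingandconcatenationtobeuseful",
    "Sequence2": "Ihavemanymoreuncleanedstringslikethisintheformofsequences",
}
# Slice positions taken from test.csv, with the empty pair dropped
positions = {
    "Sequence1": [(0, 9), (66, 74)],
    "Sequence2": [(0, 5), (15, 22), (48, 57)],
}

for seq_id in sorted(seqs):
    print(">" + seq_id)
    print(join_slices(seqs[seq_id], positions[seq_id]))
```

With the positions listed in test.csv, this prints `Thisisnowuseful` for Sequence1 and `Ihavecleanedsequences` for Sequence2, each under its FASTA header line. An end index past the end of the string (74 here) is simply clipped by Python slicing.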