Python regex with codons

I'm struggling with an RE to search for sequences that start with "TAA", continue with triplets of 3 characters, and end with "TAA" again.

I tried the following:

re.findall('TAA...+?TAA',seq), which of course does not restrict the match to triplets, but does return the full sequences

re.findall('TAA([ATGC]{3})+?TAA' , seq), which, however, gives me a list of single triplets as output:

 'AGG', 'TCT', 'GTG', 'TGG', 'TGA', 'TAT', 

Any ideas? I can, of course, check the output of

re.findall('TAA...+?TAA',seq)

to see whether its length % 3 == 0, but how can I do this with the RE itself?

1 answer

You want a non-capturing group.

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

Try the following:

 re.findall('TAA(?:[ATGC]{3})+?TAA' , seq) 
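
To see why this helps, here is a minimal sketch comparing the two patterns. The sequences are made up for illustration and are not from the question. When a pattern contains a capturing group, re.findall returns the group's contents (for a repeated group, only its last repetition) instead of the whole match; with a non-capturing group it returns the whole match.

```python
import re

# Hypothetical test sequence: TAA + two triplets (GGG, CCC) + TAA
seq = "TAAGGGCCCTAA"

# Capturing group: findall returns only the last repetition of the group
captured = re.findall(r'TAA([ATGC]{3})+?TAA', seq)

# Non-capturing group: findall returns the entire matched sequence
whole = re.findall(r'TAA(?:[ATGC]{3})+?TAA', seq)

# Hypothetical out-of-frame sequence: 4 bases between the two TAAs,
# so the triplet pattern should find nothing
shifted = re.findall(r'TAA(?:[ATGC]{3})+?TAA', "TAAGGGGTAA")

print(captured)  # ['CCC']
print(whole)     # ['TAAGGGCCCTAA']
print(shifted)   # []
```

Note that re.findall only reports non-overlapping matches, so a closing TAA consumed by one match is not reused as the start of the next.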

Source: https://habr.com/ru/post/910242/

