As @Tomalak said, the regexp engine has no built-in concept of overlapping matches, so there is no "clever" solution to be found (which turned out to be wrong: see below). But it is easy to do with a loop:
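    import re

    # ATG followed by any number of triplets that are not TAA, TGA, or TAG
    pat = re.compile(r"ATG(?:(?!TAA|TGA|TAG)\w\w\w)*")
    s = 'GATGDTATGDTAAAA'
    i = 0
    while True:
        m = pat.search(s, i)
        if m:
            start, end = m.span()
            print("match at {}:{} {!r}".format(start, end, m.group()))
            # restart one character past the start of this match
            i = start + 1
        else:
            break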
which displays
    match at 1:10 'ATGDTATGD'
    match at 6:15 'ATGDTAAAA'
It works by restarting the search one character after the start of the last match, until no more matches are found.
Clever or a time bomb?
If you want to live dangerously, you can make a 2-character change to your original finditer code:
    print(it.start(1))
    print(it.end(1))
That is, get the start and end of the first (1) capturing group. Without an argument, you get the start and end of the match as a whole, but, of course, a lookahead assertion always matches an empty string (and so the start and end are equal).
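A minimal sketch of that difference. It assumes the original finditer code wrapped the pattern in a lookahead with a single capturing group; the names `inner` and `lookahead` are only for illustration.

```python
import re

# Assumption: the lookahead wraps the pattern in one capturing group.
inner = r"ATG(?:(?!TAA|TGA|TAG)\w\w\w)*"
lookahead = re.compile("(?=(" + inner + "))")

s = "GATGDTATGDTAAAA"
m = lookahead.search(s)

whole_span = m.span()    # span of the zero-width match as a whole
group_span = m.span(1)   # span of what capturing group 1 matched

print(whole_span)        # start and end are equal: the match itself is empty
print(group_span)        # the real extent of the text matched inside the assertion
```

Here the whole match is empty, while group 1 carries the span you actually want.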
I say this is living dangerously because the semantics of a capturing group inside an assertion (whether lookahead or lookbehind, positive or negative, ...) are fuzzy at best. It's hard to say whether you may be stumbling into a bug (or an implementation accident) here! Cute :-)
EDIT: After a night's sleep and a brief discussion on Python-Dev, I believe this behavior is intentional (and therefore reliable). To find all (possibly overlapping!) matches for a regexp R, wrap it like this:
pat = re.compile("(?=(" + R + "))")
and then
    for m in pat.finditer(some_string):
        m.group(1)  # the substring that R matched
works great.
It's best to read (?=(R)) as "match an empty string here, but only if R starts here, and if it succeeds, put the information about what R matched into group 1". Then finditer() proceeds as it always does when matching an empty string: it moves the start of the search to the next character and tries again (the same thing the manual loop at the top of this answer does).
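Put together, this could look like the sketch below. The helper name `overlapping_matches` is made up for this example; it simply wraps R in the lookahead and reports the span and text of group 1.

```python
import re

def overlapping_matches(R, text):
    """Yield (start, end, substring) for every match of R, overlaps included."""
    pat = re.compile("(?=(" + R + "))")
    for m in pat.finditer(text):
        # The match itself is empty; group 1 holds what R matched.
        start, end = m.span(1)
        yield start, end, m.group(1)

results = list(overlapping_matches(r"ATG(?:(?!TAA|TGA|TAG)\w\w\w)*",
                                   "GATGDTATGDTAAAA"))
for start, end, text in results:
    print("match at {}:{} {!r}".format(start, end, text))
```

On the sample string this reports the same two overlapping matches as the manual loop, at 1:10 and 6:15.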
Using this with findall() is trickier, because if R also contains capturing groups, you will get all of them (you cannot pick and choose, as you can with a match object such as the ones finditer() returns).
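To illustrate, here is a small sketch where R is given one capturing group of its own (a variation on the pattern above, chosen only for this example). findall() then hands back tuples containing every group, while finditer() lets you keep just group 1.

```python
import re

# Assumption for illustration: R captures the codons that follow ATG.
R = r"ATG((?:(?!TAA|TGA|TAG)\w\w\w)*)"
pat = re.compile("(?=(" + R + "))")
s = "GATGDTATGDTAAAA"

# findall() returns one tuple per match holding every group:
# group 1 (all of R) and group 2 (the group inside R).
all_groups = pat.findall(s)

# With finditer() you can pick only the group you care about.
whole_only = [m.group(1) for m in pat.finditer(s)]

print(all_groups)
print(whole_only)
```

If you do want findall(), making the groups inside R non-capturing with (?:...) avoids the extra tuple members.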