The third-party regex module (not re) offers partial matching support, which is a partial solution. (Lookbehinds, ^ anchors, zero-width matches, and \b / \B anchors all break in subtle or not-so-subtle ways when you try to discard the beginning of the window and keep searching. Given how many edge cases I've thought of so far, I wouldn't be surprised if there were more.)
If you pass partial=True to regex.match, regex.search, regex.fullmatch, or regex.finditer, then in addition to reporting ordinary, complete matches, they will also report things that don't match yet but could if the string were extended:
    In [8]: import regex

    In [9]: regex.search(r'1234', '', partial=True)
    Out[9]: <regex.Match object; span=(0, 0), match='', partial=True>

    In [10]: regex.search(r'1234', '12', partial=True)
    Out[10]: <regex.Match object; span=(0, 2), match='12', partial=True>

    In [11]: regex.search(r'1234', '12 123', partial=True)
    Out[11]: <regex.Match object; span=(3, 6), match='123', partial=True>

    In [12]: regex.search(r'1234', '1234 123', partial=True)
    Out[12]: <regex.Match object; span=(0, 4), match='1234'>
You can tell whether a match was partial or complete with the match object's partial attribute:
    In [13]: regex.search(r'1234', '12 123', partial=True).partial
    Out[13]: True

    In [14]: regex.search(r'1234', '1234 123', partial=True).partial
    Out[14]: False
It will report a match as partial if more data could change the match result:
    In [21]: regex.search(r'.*', 'asdf', partial=True)
    Out[21]: <regex.Match object; span=(0, 4), match='asdf', partial=True>

    In [22]: regex.search(r'ham(?: and eggs)?', 'ham', partial=True)
    Out[22]: <regex.Match object; span=(0, 3), match='ham', partial=True>
or if more data could turn a match into a non-match:
    In [23]: regex.search(r'1(?!234)', '1', partial=True)
    Out[23]: <regex.Match object; span=(0, 1), match='1', partial=True>

    In [24]: regex.search(r'1(?!234)', '13', partial=True)
    Out[24]: <regex.Match object; span=(0, 1), match='1'>
When you reach the end of the data stream, you should turn off partial so that regex knows this is the end; otherwise partial matches can hide complete matches.
With the partial match information, you can discard everything before the start of the partial match and know that none of the discarded data would have been part of a match ... but lookbehinds might have needed the discarded data, so it would take additional work to support lookbehinds if you do it that way. ^ would also get confused about the start of the string changing, \b / \B wouldn't know whether there was a word character at the end of the discarded data, and it would be difficult to get zero-width match behavior right for whatever definition of "right" you pick. I suspect some other advanced regex features might also interact weirdly if you discard data this way; regex has a lot of features.
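Here is a rough sketch of the discard-and-continue idea, under the assumptions already mentioned: the pattern has no lookbehinds, ^, or \b / \B, and it cannot match the empty string. The function name stream_search and the chunks parameter are made up for this example; they are not part of regex.

```python
import regex

def stream_search(pattern, chunks):
    """Yield matched strings of `pattern` from an iterable of text chunks.

    Sketch only: lookbehinds, ^, \\b / \\B and zero-width matches are
    not handled, because data before a match attempt is thrown away.
    """
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        while True:
            m = regex.search(pattern, buffer, partial=True)
            if m is None:
                # Nothing in the buffer can be part of a match.
                buffer = ''
                break
            if m.partial:
                # Keep only the data from the start of the partial match.
                buffer = buffer[m.start():]
                break
            if m.end() == m.start():
                raise ValueError('zero-width matches are not handled')
            yield m.group()
            buffer = buffer[m.end():]

    # End of stream: partial is off, so leftover partial matches
    # either become complete matches or are dropped.
    while True:
        m = regex.search(pattern, buffer)
        if m is None:
            break
        if m.end() == m.start():
            raise ValueError('zero-width matches are not handled')
        yield m.group()
        buffer = buffer[m.end():]

digits = list(stream_search(r'1234', ['12', '34 1', '23', '4 12']))
ham = list(stream_search(r'ham(?: and eggs)?', ['ha', 'm']))
print(digits, ham)
```

The trailing '12' in the first call stays a partial match until the stream ends and is then dropped, while 'ham' in the second call only turns into a complete match in the final pass with partial off.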