I am trying to generate variations of a string by applying a substitution optionally.
For example, a single substitution pattern removes every run of whitespace characters. Instead of replacing all occurrences at once, as in
>>> re.sub(r'\s+', '', 'a b c')
'abc'
I need two options for each occurrence: one in which the substitution is applied and one in which it is not. For the string 'a b c' I want the options
['a b c', 'a bc', 'ab c', 'abc']
i.e., the cross product of all binary choices (the result obviously includes the original string).
For this simple case, the options can be generated with re.finditer and itertools.product:
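import itertools
import re

def vary(target, pattern, subst):
    # Spans of all matches in the target string.
    occurrences = [m.span() for m in pattern.finditer(target)]
    # One True/False decision per match: substitute it or leave it alone.
    for path in itertools.product((True, False), repeat=len(occurrences)):
        variant = ''
        anchor = 0
        for (start, end), apply_this in zip(occurrences, path):
            if apply_this:
                variant += target[anchor:start] + subst
                anchor = end
        # Append whatever follows the last substituted match.
        variant += target[anchor:]
        yield variant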
For the example above, this produces the desired result:
>>> list(vary('a b c', re.compile(r'\s+'), ''))
['abc', 'ab c', 'a bc', 'a b c']
However, this solution only works for fixed replacement strings. Advanced features of re.sub, such as group references, are not supported. Take the following example, which inserts a space after a sequence of digits inside a word:
re.sub(r'\B(\d+)\B', r'\1 ', 'abc123def')
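To make the limitation concrete, here is a small self-contained check. It repeats the vary generator only so that the snippet runs on its own; the names expected and variants are just for illustration. As far as I can tell, the replacement template is inserted verbatim instead of being expanded:

```python
import itertools
import re


def vary(target, pattern, subst):
    # Same generator as in the question, repeated so this snippet is runnable.
    occurrences = [m.span() for m in pattern.finditer(target)]
    for path in itertools.product((True, False), repeat=len(occurrences)):
        variant = ''
        anchor = 0
        for (start, end), apply_this in zip(occurrences, path):
            if apply_this:
                variant += target[anchor:start] + subst
                anchor = end
        variant += target[anchor:]
        yield variant


pattern = re.compile(r'\B(\d+)\B')

# re.sub expands the group reference \1 to the matched digits.
expected = pattern.sub(r'\1 ', 'abc123def')

# vary() concatenates the template as-is, so the backslash sequence survives.
variants = list(vary('abc123def', pattern, r'\1 '))

print(expected)
print(variants)
```

With re.sub I get 'abc123 def', whereas the substituted variant from vary contains the literal text \1 in place of the digits.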
How can I extend or change the approach to accept any valid re.sub argument (without writing a parser to interpret group references)?