Python RegEx using re.sub with multiple templates

Question


I am trying to use Python's re.sub to remove the colon that sits before the antepenultimate vowel [aeiou] of a word (the third vowel counting from the end), but only if the colon is also immediately preceded by a vowel.

In other words, the colon should sit between the 3rd and 4th vowels, counting from the end of the word.

For example, numbering the vowels from the end of the word, the first example breaks down as w4:32ny1h.

    we:aanyoh    > weaanyoh      # w4:32ny1h
    hiru:atghigu > hiruatghigu
    yo:ubeki     > youbeki
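For reference, the rule as described can also be stated without a regex, which makes it easier to check the patterns in the answers. This is a minimal Python 3 sketch, assuming the vowels are exactly lowercase `aeiou` and that only the last colon in a word can qualify; the helper name `remove_colon` is illustrative only.

```python
VOWELS = set("aeiou")

def remove_colon(word: str) -> str:
    """Drop the colon sitting between the 4th- and 3rd-to-last vowels.

    Assumptions (not from the question): lowercase a/e/i/o/u only,
    and only the last colon in the word is examined.
    """
    i = word.rfind(":")
    if i <= 0 or i == len(word) - 1:
        return word  # no colon, or colon at the very start/end
    vowels_after = sum(ch in VOWELS for ch in word[i + 1:])
    # vowel before the colon, vowel right after it, exactly 3 vowels to the end
    if word[i - 1] in VOWELS and word[i + 1] in VOWELS and vowels_after == 3:
        return word[:i] + word[i + 1:]
    return word

for w in ["we:aanyoh", "hiru:atghigu", "yo:ubeki", "yo:ubek", "wz:ubeki"]:
    print(w, ">", remove_colon(w))
```

The last two words are left unchanged: `yo:ubek` has only two vowels after the colon, and `wz:ubeki` has no vowel before it.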

Below is the RegEx statement I'm trying to use, but I can't get it to work.

 word = re.sub(ur"([aeiou]):([aeiou])(([^aeiou])*([aeiou])*([aeiou])([^aeiou])*([aeiou]))$", ur'\1\2\3\4', word)



user2743 Nov 15 '15 at 19:39


6 answers

Jeff y · Answer 1 · 2015-11-15T20:17:00+0000

You have too many brackets (and other extra material). This is enough:

    word = re.sub(r"([aeiou]):(([aeiou][^aeiou]*){3})$", r'\1\2', word)
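A quick way to try this pattern on the question's examples, written for Python 3 (so the `ur` string prefix becomes plain `r`). The `{3}` repeats the whole group `[aeiou][^aeiou]*`, so everything after the colon must be exactly three vowel-led chunks up to `$`. The function name `jeff_fix` is just for illustration.

```python
import re

# Jeff's pattern as a Python 3 raw string (no `ur` prefix).
PATTERN = re.compile(r"([aeiou]):(([aeiou][^aeiou]*){3})$")

def jeff_fix(word: str) -> str:
    # \1 is the vowel before the colon and \2 is everything after it,
    # so the colon is dropped simply by not writing it back.
    return PATTERN.sub(r"\1\2", word)

for w in ["we:aanyoh", "hiru:atghigu", "yo:ubeki", "yo:ubekiki"]:
    print(w, ">", jeff_fix(w))
```

`yo:ubekiki` is left alone because it has four vowels after the colon, so `{3}` cannot reach `$`.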

Tom zych · Answer 2 · 2015-11-15T20:23:50+0000

I'm not sure whether you want to ignore consonants completely; this regex does. Otherwise it is similar to Jeff's answer.

    import re

    tests = ['we:aanyoh', 'hiru:atghigu', 'yo:ubeki', 'yo:ubekiki', 'yo:ubek']
    for word in tests:
        s = re.sub(r'([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$', r'\1\2', word)
        print('{} > {}'.format(word, s))
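Because this pattern allows consonants on either side of the colon, it accepts words that Jeff's pattern rejects. A small Python 3 comparison; the test words `yob:ubeki` and `yo:bubeki` are made up here to show the difference and are not from the question.

```python
import re

def jeff(word):
    # The colon must be directly preceded and followed by a vowel.
    return re.sub(r"([aeiou]):(([aeiou][^aeiou]*){3})$", r"\1\2", word)

def tom(word):
    # Consonants may sit between the vowels and the colon.
    return re.sub(r"([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$", r"\1\2", word)

for w in ["yo:ubeki", "yob:ubeki", "yo:bubeki"]:
    print(w, "| Jeff:", jeff(w), "| Tom:", tom(w))
```

Both agree on `yo:ubeki`, but only Tom's version removes the colon from the other two words. Which behaviour is wanted depends on whether the vowel-colon-vowel adjacency in the question is a hard requirement.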

dawg · Answer 3 · 2015-11-15T21:19:05+0000

You state that you are targeting a word within a string, so first anchor the pattern to word boundaries:

    \b[regex will go here]\b
    ^                     ^
    assert a word boundary

Next, the colon is preceded by [aeiou] and followed by [aeiou], with two more [aeiou] later in the part after the colon. I assume this should be case-insensitive?

    (?i)(\b\w+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)
                                  ^ match a character that is NOT a vowel, whitespace, or non-word character
                                    (\W = [^a-zA-Z0-9_])


(Note the use of [^aeiou\W], which matches consonant letters, digits and _, but not other symbols.)
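To see what that consonant class actually accepts, `re.findall` can be run over a sample string. A minimal Python 3 sketch using the same class as the pattern, `[^aeiou\s\W]`; the sample string is arbitrary. Since whitespace is already a non-word character, the `\s` is redundant but harmless.

```python
import re

# Same class as in the pattern above; re.I mirrors the pattern's (?i) flag,
# so uppercase vowels are excluded too.
CONSONANT = re.compile(r"[^aeiou\s\W]", re.I)

print(CONSONANT.findall("b1_: a-zA"))
# ['b', '1', '_', 'z']
```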

Python demo:

    import re

    tests = {
        'matches': ['we:aanyoh', 'hiru:atghigu', 'yo:ubeki'],
        'no match': ['wz:ubeki', 'we:a anyoh', 'yo:ubek', 'hiru:atghiguu'],
    }
    for k, v in tests.items():
        print(k)
        for e in v:
            s = re.sub(r'(?i)(\b\w+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)', r'\1\2', e)
            print('\t{} > {}'.format(e, s))

Prints:

    matches
        we:aanyoh > weaanyoh
        hiru:atghigu > hiruatghigu
        yo:ubeki > youbeki
    no match
        wz:ubeki > wz:ubeki
        we:a anyoh > we:a anyoh
        yo:ubek > yo:ubek
        hiru:atghiguu > hiru:atghiguu
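Because the pattern is bounded by `\b` rather than `^`/`$`, a single `re.sub` call can fix every qualifying word in a longer string. A minimal Python 3 sketch; the sentence is made up for illustration.

```python
import re

PATTERN = re.compile(r"(?i)(\b\w+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)")

text = "we:aanyoh and yo:ubeki, but not yo:ubek"
print(PATTERN.sub(r"\1\2", text))
# weaanyoh and youbeki, but not yo:ubek
```

`yo:ubek` keeps its colon because only two vowels follow it before the word boundary.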

This only handles words with a single colon. If you want to handle words that contain multiple colons but follow the same pattern, change the left-hand part to a character class that includes the colon, and anchor it with ^ instead of \b.

Example: (?i)(^[\w:]+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)

Alexander · Answer 4 · 2015-11-15T20:09:39+0000

It should work with this:

    word = re.sub(r"(?<=[aeiou]):(?=[aeiou]([^aeiou]*[aeiou]){2}[^aeiou]*$)", '', word)

See an example here: https://regex101.com/r/kA8xH3/2

Note that I only match the colon and replace it with an empty string, instead of capturing groups and concatenating them.

It checks for a vowel before and after the colon, then looks ahead to check that there are exactly 2 more vowels (possibly with consonants in between). It also allows additional consonants at the end, but the $ ensures that no more vowels follow.
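Since the lookbehind and lookahead are zero-width, the match is just the `:` itself, which is why the replacement can be an empty string. A minimal Python 3 check of this pattern; wrapping it in `re.compile`, the name `strip_colon`, and turning the inner group into a non-capturing `(?:...)` are my own choices, and the last change does not affect what matches.

```python
import re

# (?<=[aeiou])  a vowel must precede the colon (checked, not consumed)
# :             the only character actually matched, and therefore removed
# (?=...)       a vowel follows, then exactly two more vowels before the end
COLON = re.compile(r"(?<=[aeiou]):(?=[aeiou](?:[^aeiou]*[aeiou]){2}[^aeiou]*$)")

def strip_colon(word: str) -> str:
    return COLON.sub("", word)

for w in ["we:aanyoh", "hiru:atghigu", "yo:ubeki", "yo:ubekiki", "wz:ubeki"]:
    print(w, ">", strip_colon(w))
```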

Ionut · Answer 5 · 2015-11-15T20:22:41+0000

This will do it:

    word = re.sub(r"([aeiou]):([aeiou])([^\Waeiou]*[aeiou][^\Waeiou]*[aeiou][^\Waeiou]*)$", r'\1\2\3', word)

http://www.phpliveregex.com/p/dCa

7stud · Answer 6 · 2015-11-15T21:37:49+0000

A roundup of the answers (I used an uppercase vowel to mark where in each word the replacement should happen). Let me know if you want me to add other test strings.

    import re

    strings = [
        'wE:aanyoh',
        'hirU:atghigu',
        'yO:ubeki',
        'xE:aaa',
        'xx:aaa',
        'xa:aaaxA:aaa',
        'xa:aaaxA:aaaxx',
        'xa:aaaxA:aaxax',
        'a:aaaxA:aaxax',
        'e:aeixA:aexix',
    ]

    pattern = r"""
        ( .* [aeiou] )
        :
        ( [aeiou] .*? [aeiou] .*? [aeiou] )
    """

    template = "{:>15}: {}"

    for string in strings:
        print(template.format('original', string))
        print(template.format('Alexander:', re.sub(r"(?<=[aeiou]):(?=[aeiou]([^aeiou]*[aeiou]){2}[^aeiou]*$)", '', string, flags=re.I)))
        print(template.format('Ionut:', re.sub(r"([aeiou]):([aeiou])([^\Waeiou]*[aeiou][^\Waeiou]*[aeiou][^\Waeiou]*)$", r'\1\2\3', string, flags=re.I)))
        print(template.format('Tom Zych:', re.sub(r'([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$', r'\1\2', string, flags=re.I)))
        print(template.format('Jeff Y:', re.sub(r"([aeiou]):(([aeiou][^aeiou]*){3})$", r'\1\2', string, flags=re.I)))
        print(template.format('7stud:', re.sub(pattern, r'\1\2', string, count=1, flags=re.X | re.I)))
        print("\n")

           original: wE:aanyoh
         Alexander:: wEaanyoh
             Ionut:: wEaanyoh
          Tom Zych:: wEaanyoh
            Jeff Y:: wEaanyoh
             7stud:: wEaanyoh

           original: hirU:atghigu
         Alexander:: hirUatghigu
             Ionut:: hirUatghigu
          Tom Zych:: hirUatghigu
            Jeff Y:: hirUatghigu
             7stud:: hirUatghigu

           original: yO:ubeki
         Alexander:: yOubeki
             Ionut:: yOubeki
          Tom Zych:: yOubeki
            Jeff Y:: yOubeki
             7stud:: yOubeki

           original: xE:aaa
         Alexander:: xEaaa
             Ionut:: xEaaa
          Tom Zych:: xEaaa
            Jeff Y:: xEaaa
             7stud:: xEaaa

           original: xx:aaa
         Alexander:: xx:aaa
             Ionut:: xx:aaa
          Tom Zych:: xx:aaa
            Jeff Y:: xx:aaa
             7stud:: xx:aaa

           original: xa:aaaxA:aaa
         Alexander:: xa:aaaxAaaa
             Ionut:: xa:aaaxAaaa
          Tom Zych:: xa:aaaxAaaa
            Jeff Y:: xa:aaaxAaaa
             7stud:: xa:aaaxAaaa

           original: xa:aaaxA:aaaxx
         Alexander:: xa:aaaxAaaaxx
             Ionut:: xa:aaaxAaaaxx
          Tom Zych:: xa:aaaxAaaaxx
            Jeff Y:: xa:aaaxAaaaxx
             7stud:: xa:aaaxAaaaxx

           original: xa:aaaxA:aaxax
         Alexander:: xa:aaaxAaaxax
             Ionut:: xa:aaaxAaaxax
          Tom Zych:: xa:aaaxAaaxax
            Jeff Y:: xa:aaaxAaaxax
             7stud:: xa:aaaxAaaxax

           original: a:aaaxA:aaxax
         Alexander:: a:aaaxAaaxax
             Ionut:: a:aaaxAaaxax
          Tom Zych:: a:aaaxAaaxax
            Jeff Y:: a:aaaxAaaxax
             7stud:: a:aaaxAaaxax

           original: e:aeixA:aexix
         Alexander:: e:aeixAaexix
             Ionut:: e:aeixAaexix
          Tom Zych:: e:aeixAaexix
            Jeff Y:: e:aeixAaexix
             7stud:: e:aeixAaexix
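The roundup above only lists outputs side by side. A possible extension is to compare every answer against a non-regex reading of the rule and report only the disagreements. This is a Python 3 sketch under stated assumptions: `reference()` is a hypothetical helper that only looks at the last colon and ignores case, 7stud's `count=1` is dropped (each test word has a single colon, so at most one match is possible anyway), and the extra test words `yo:ubekiki` and `yob:ubeki` are made up here.

```python
import re

VOWELS = set("aeiou")

def reference(word: str) -> str:
    # Assumption: only the last colon in a word can qualify; case is ignored.
    i = word.rfind(":")
    if 0 < i < len(word) - 1:
        low = word.lower()
        vowels_after = sum(c in VOWELS for c in low[i + 1:])
        if low[i - 1] in VOWELS and low[i + 1] in VOWELS and vowels_after == 3:
            return word[:i] + word[i + 1:]
    return word

# (pattern, replacement, extra flags) for each answer, as Python 3 raw strings.
ANSWERS = {
    "Jeff Y": (r"([aeiou]):(([aeiou][^aeiou]*){3})$", r"\1\2", 0),
    "Tom Zych": (r"([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$", r"\1\2", 0),
    "Alexander": (r"(?<=[aeiou]):(?=[aeiou]([^aeiou]*[aeiou]){2}[^aeiou]*$)", "", 0),
    "Ionut": (r"([aeiou]):([aeiou])([^\Waeiou]*[aeiou][^\Waeiou]*[aeiou][^\Waeiou]*)$", r"\1\2\3", 0),
    "7stud": (r"( .* [aeiou] ) : ( [aeiou] .*? [aeiou] .*? [aeiou] )", r"\1\2", re.X),
}

def disagreements(words):
    """List (answer, word, got, expected) wherever an answer differs from reference()."""
    rows = []
    for name, (pattern, repl, flags) in ANSWERS.items():
        for w in words:
            got = re.sub(pattern, repl, w, flags=flags | re.I)
            expected = reference(w)
            if got != expected:
                rows.append((name, w, got, expected))
    return rows

words = ["wE:aanyoh", "hirU:atghigu", "yO:ubeki", "xx:aaa", "yo:ubekiki", "yob:ubeki"]
for row in disagreements(words):
    print(row)
```

On these words only two rows are reported: Tom Zych's pattern removes the colon from `yob:ubeki` (it allows consonants before the colon), and 7stud's pattern removes it from `yo:ubekiki` (it has no end anchor, so a fourth vowel is not rejected). Whether either counts as a bug depends on how strictly the question's rule is read.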

