Uppercase Regex Decomposition

I would like to turn strings like 'HDMWhoSomeThing' into 'HDM Who Some Thing' using a regex.

To do that, I want to extract words that either start with an uppercase letter or consist only of capital letters. Note that in the string 'HDMWho', the last letter of the uppercase run is the first letter of the word Who, so it should not be included in the word HDM.

What is the correct regular expression to achieve this? I have tried many regular expressions similar to [A-Z][a-z]+, but to no avail. [A-Z][a-z]+ gives me 'Who Some Thing', without 'HDM' of course.

Any ideas? Thanks, Rukki

+4
5 answers
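    #!/usr/bin/env python
    import re
    from collections import deque

    # Split on all-caps runs and on single capitals that begin a word;
    # the capturing group keeps the matched delimiters in the result.
    pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z](?=[a-z]|$))'
    chunks = deque(re.split(pattern, 'HDMWhoSomeMONKEYThingXYZ'))

    result = []
    while chunks:
        buf = chunks.popleft()
        if not buf:
            continue
        # A lone capital is the first letter of a word: glue its tail back on.
        if re.match(r'^[A-Z]$', buf) and chunks:
            buf += chunks.popleft()
        result.append(buf)

    print(' '.join(result))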

Output:

  HDM Who Some MONKEY Thing XYZ 

Judging by the number of lines of code, this task is a much more natural fit for re.findall:

    import re

    pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z][a-z]*)'
    print(' '.join(re.findall(pattern, 'HDMWhoSomeMONKEYThingX')))

Output:

  HDM Who Some MONKEY Thing X 
+2

Try splitting on this regex:

 /(?=[A-Z][a-z])/

If your regex engine does not support splitting on empty matches, try this regex to put spaces between the words instead:

 /([A-Z])(?![A-Z])/

Replace each match with " $1" (a space plus the first group match). Then you can split on the spaces.
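Both suggestions can be tried in Python. The following is only a rough sketch: it assumes Python 3.7 or later, where re.split accepts patterns that can match the empty string, and the helper names split_on_lookahead and space_then_split are made up for illustration. Note that Python's re.sub writes the group reference as \1 rather than $1.

```python
import re

def split_on_lookahead(text):
    # Split at every position followed by an uppercase letter and then a
    # lowercase letter. A match at position 0 yields an empty first piece,
    # so empty pieces are filtered out.
    parts = re.split(r'(?=[A-Z][a-z])', text)
    return [p for p in parts if p]

def space_then_split(text):
    # Put a space before every uppercase letter that is not followed by
    # another uppercase letter, then split on whitespace.
    spaced = re.sub(r'([A-Z])(?![A-Z])', r' \1', text)
    return spaced.split()

print(split_on_lookahead('HDMWhoSomeThing'))  # ['HDM', 'Who', 'Some', 'Thing']
print(space_then_split('HDMWhoSomeThing'))    # ['HDM', 'Who', 'Some', 'Thing']
```

Both patterns have edge cases worth knowing about: the lookahead split leaves 'SomeHDMWho' as 'SomeHDM' and 'Who', and the replacement turns 'ThingXYZ' into 'ThingXY' and 'Z'.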

+2

A one-liner:

 ' '.join(a or b for a, b in re.findall('([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))', c))

using the regexp

 ([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))
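To see why the "a or b" part is needed, it may help to look at what re.findall returns here. With two capturing groups it yields one tuple per match, and only the group belonging to the alternative that matched is non-empty. A small sketch, with the variable name pairs chosen just for this example:

```python
import re

pattern = r'([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))'
c = 'HDMWhoSomeThing'

# One (group 1, group 2) tuple per match; the unused group is ''.
pairs = re.findall(pattern, c)
print(pairs)  # [('', 'HDM'), ('Who', ''), ('Some', ''), ('Thing', '')]

# Pick whichever group actually matched and join with spaces.
print(' '.join(a or b for a, b in pairs))  # HDM Who Some Thing
```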

+2

Maybe "[A-Z]*?[A-Z][a-z]+"?

Edit: This works: [A-Z]{2,}(?![a-z])|[A-Z][a-z]+

    import re

    def find_stuff(text):
        # An all-caps run not followed by a lowercase letter, or a capitalized word.
        p = re.compile(r'[A-Z]{2,}(?![a-z])|[A-Z][a-z]+')
        print(' '.join(p.findall(text)))

    find_stuff('HDMWhoSomeThing')
    find_stuff('SomeHDMWhoThing')

Prints out:

HDM Who Some Thing

Some HDM Who Thing

+1

So the "words" in this case are:

  • Any number of uppercase letters - unless the last uppercase letter is followed by a lowercase letter.
  • One uppercase letter followed by any number of lowercase letters.

Try:

 ([A-Z]+(?![a-z])|[A-Z][a-z]*)

The first alternative includes a negative lookahead, (?![a-z]), which handles the boundary between an all-caps word and an initial-caps word.
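A quick way to check this rule against the inputs mentioned in this thread is to run it through re.findall. This is only a sketch: the names WORD and split_caps are invented here, and the outer capturing group is dropped because findall already returns the whole match when the pattern has no groups.

```python
import re

# An uppercase run not followed by a lowercase letter,
# or one uppercase letter followed by any number of lowercase letters.
WORD = re.compile(r'[A-Z]+(?![a-z])|[A-Z][a-z]*')

def split_caps(text):
    # Join every matched word with a single space.
    return ' '.join(WORD.findall(text))

print(split_caps('HDMWhoSomeThing'))           # HDM Who Some Thing
print(split_caps('SomeHDMWhoThing'))           # Some HDM Who Thing
print(split_caps('HDMWhoSomeMONKEYThingXYZ'))  # HDM Who Some MONKEY Thing XYZ
```

Anything the pattern does not match is silently dropped by findall, so an input such as 'fooBar' comes back as just 'Bar'.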

+1

Source: https://habr.com/ru/post/1301441/

