Limit one word to case-sensitive and the other to case-insensitive in a Python regex with | (pipe)

I understand what | (the pipe special character) means in a Python regex: it matches either the first or the second alternative.

For example, a|b matches either a or b.

My question is: what if I want a to be matched case-sensitively and b case-insensitively in the example above?

Example:

    import re

    s = "Welcome to PuNe, Maharashtra"
    result1 = re.search("punnee|MaHaRaShTrA", s)
    result2 = re.search("pune|maharashtra", s)
    result3 = re.search("PuNe|MaHaRaShTrA", s)
    result4 = re.search("P|MaHaRaShTrA", s)

I want to search for Pune exactly as written in the statement above, PuNe, but match Maharashtra regardless of case. How can I search for one word case-sensitively and the other case-insensitively, so that result1, result2, result3, and result4 all return a non-None value?

I tried :

    result1 = re.search("pune|MaHaRaShTrA", s, re.IGNORECASE)

But that ignores case for both words.

How can I make one word case-sensitive and the other case-insensitive?

+5
2 answers

If you can use the PyPI regex module, you can use an inline modifier group, (?i:...).

    import regex

    s = "Welcome to PuNe, Maharashtra"
    print(regex.findall(r"PuNe|(?i:MaHaRaShTrA)", s))

See the online Python demo.

Unfortunately, the built-in re module only supports these groups from Python 3.6 onward; earlier versions do not, and re does not let you switch inline modifiers on and off partway through a pattern.

Here are some more details about these constructs:

If the modifier appears at the start of the pattern, it changes the matching mode for the entire pattern, unless it is turned off later. In most flavors (but not Python), the modifier can also be placed in the middle of the pattern, in which case it affects only the part of the pattern that follows it.

In all flavors that support inline modifiers, with the exception of Python, you can combine mode modifiers with the non-capturing group syntax. For example, (?i:bob) is a non-capturing group with the case-insensitive flag turned on. It matches strings like "bob" and "boB".
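On Python 3.6 or later, the same scoped-modifier syntax should also work with the built-in re module, so the third-party package may not be needed. A minimal sketch, reusing the string from the question (the variable name pattern is just for illustration):

```python
import re

s = "Welcome to PuNe, Maharashtra"

# (?i:...) makes only the enclosed alternative case-insensitive (Python 3.6+).
# "PuNe" sits outside the group, so it is still matched case-sensitively.
pattern = r"PuNe|(?i:MaHaRaShTrA)"

print(re.findall(pattern, s))                    # ['PuNe', 'Maharashtra']
print(re.findall(pattern, "pune, MAHARASHTRA"))  # ['MAHARASHTRA']
```

On older interpreters this pattern raises an error at compile time, in which case the regex module remains the option.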

+3

You can build an upper/lowercase character class for each letter of the second word and leave case-sensitive matching on:

 my_regex = "PuNe|"+"".join("[{}{}]".format(x.upper(),x.lower()) for x in "MaHaRaShTrA") 

which generates: PuNe|[Mm][Aa][Hh][Aa][Rr][Aa][Ss][Hh][Tt][Rr][Aa]

and re.search(my_regex, s) does what you want without any flags.
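If the case-insensitive word may contain digits, spaces, or punctuation, the one-liner above would wrap those in character classes too, which can break on characters such as ] or ^. A possible variant, wrapped in a hypothetical helper named either_case (not part of re), escapes non-letters instead:

```python
import re

def either_case(word):
    """Hypothetical helper: turn each letter into an [Xx] class so the word
    matches in any case without re.IGNORECASE."""
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append("[{}{}]".format(ch.upper(), ch.lower()))
        else:
            # Keep digits, spaces, and punctuation as escaped literals.
            parts.append(re.escape(ch))
    return "".join(parts)

s = "Welcome to PuNe, Maharashtra"
my_regex = "PuNe|" + either_case("MaHaRaShTrA")

print(re.search(my_regex, s).group())                    # PuNe
print(re.search(my_regex, "pune, MAHARASHTRA").group())  # MAHARASHTRA
```

This keeps the approach free of flags, at the cost of a longer and less readable pattern.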

+2

Source: https://habr.com/ru/post/1269471/

