Python re.search

I have a string variable containing

string = "123hello456world789" 
The string contains no spaces. I want to write a regular expression that prints only the words made of lowercase letters (a-z). I tried a simple regular expression:

    import re
    pat = "([a-z]+){1,}"
    match = re.search(pat, string, re.DEBUG)

The match object contains only the word hello; the word world is not matched.

With re.findall(), I could get both hello and world.

My question is: why can't we do this with re.search()?

How can this be done with re.search()?

2 answers

re.search() finds only the first match in the string. From the documentation:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

To match every occurrence, you need re.findall(). From the documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
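The part of the findall() description about groups is easy to miss, so here is a small sketch of it. The digit patterns are made up for illustration and are not part of the question:

```python
import re

text = "123hello456world789"

# No group: findall returns the full matches as strings.
no_group = re.findall(r"[a-z]+", text)

# One group: findall returns only what the group captured.
one_group = re.findall(r"\d+([a-z]+)", text)

# Two groups: findall returns a list of tuples, one tuple per match.
two_groups = re.findall(r"(\d+)([a-z]+)", text)

print(no_group)    # ['hello', 'world']
print(one_group)   # ['hello', 'world']
print(two_groups)  # [('123', 'hello'), ('456', 'world')]
```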

Example:

    >>> import re
    >>> regex = re.compile(r'([a-z]+)', re.I)
    >>> # using search we only get the first item.
    >>> regex.search("123hello456world789").groups()
    ('hello',)
    >>> # using findall we get every item.
    >>> regex.findall("123hello456world789")
    ['hello', 'world']
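If you really want to stay with search(), one possible approach is to call it in a loop and restart the scan where the previous match ended. A compiled pattern's search() method accepts a start position as its second argument. The helper name search_all below is hypothetical, and this is only a sketch of the idea, not a replacement for findall():

```python
import re

def search_all(pattern, text):
    """Collect every match by calling search() repeatedly (hypothetical helper)."""
    regex = re.compile(pattern)
    matches = []
    pos = 0
    while pos <= len(text):
        match = regex.search(text, pos)  # resume scanning at pos
        if match is None:
            break
        matches.append(match.group())
        # Continue after this match; step one extra character on an
        # empty match so the loop cannot get stuck.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return matches

found = search_all(r"[a-z]+", "123hello456world789")
print(found)  # ['hello', 'world']
```

This is essentially what findall() and finditer() already do for you, so the loop is mostly useful for seeing why a single search() call stops at hello.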

UPDATE:

Because of your duplicate question (as discussed at this link), I have added my other answer here as well:

    >>> import re
    >>> regex = re.compile(r'([a-z][a-z-\']+[a-z])')
    >>> regex.findall("HELLO WORLD")  # this is uppercase
    []  # no results, because the string is uppercase
    >>> regex.findall("HELLO WORLD".lower())  # lowercase it first
    ['hello', 'world']  # now we have results
    >>> regex.findall("123hello456world789")
    ['hello', 'world']

As you can see, the first example you gave failed because the string is in capital letters. You can simply add the re.IGNORECASE flag, although you mentioned that matches should be lowercase only.
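To illustrate the effect of the flag, here is a minimal sketch. The mixed-case input string is made up for the example:

```python
import re

text = "123HELLO456world789"  # made-up input mixing upper and lower case

# Without the flag, [a-z] matches lowercase letters only.
lowercase_only = re.findall(r"[a-z]+", text)

# With re.IGNORECASE, the same class also matches uppercase letters.
any_case = re.findall(r"[a-z]+", text, re.IGNORECASE)

print(lowercase_only)  # ['world']
print(any_case)        # ['HELLO', 'world']
```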


@InbarRose's answer shows why re.search works this way, but if you want match objects rather than just the strings that re.findall returns, use re.finditer:

    >>> for match in re.finditer(pat, string):
    ...     print(match.groups())
    ...
    ('hello',)
    ('world',)
    >>>

Or, if you want a list:

    >>> list(re.finditer(pat, string))
    [<_sre.SRE_Match object at 0x022DB320>, <_sre.SRE_Match object at 0x022DB660>]
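The reason to prefer match objects is that they carry more than the matched text, for example the position of each match. A short sketch, using text as the variable name instead of string:

```python
import re

pat = "([a-z]+)"
text = "123hello456world789"

# Each match object exposes the captured group and its (start, end) span.
spans = [(m.group(1), m.span()) for m in re.finditer(pat, text)]

for word, span in spans:
    print(word, span)
# hello (3, 8)
# world (11, 16)
```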

It is usually a bad idea to use string as a variable name, given that it is also the name of a standard library module.
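A minimal sketch of what goes wrong, assuming the code also imports the string module somewhere:

```python
import string

letters = string.ascii_lowercase  # works: string is still the module

string = "123hello456world789"    # rebinding the name hides the module

try:
    string.ascii_lowercase        # now looked up on a str, not the module
except AttributeError as exc:
    shadow_error = str(exc)

print(shadow_error)
```

A name such as text or word avoids the clash.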


Source: https://habr.com/ru/post/958916/

