What is the difference between findall() and using a for loop with an iterator to find pattern matches?

I use this to calculate the number of sentences in the text:

    import codecs
    import re

    fileObj = codecs.open("someText.txt", "r", "utf-8")
    shortText = fileObj.read()
    pat = '[.]'
    nSentences = 0  # the counter must be initialized before the loop
    for match in re.finditer(pat, shortText, re.UNICODE):
        nSentences = nSentences + 1

Someone told me this is better:

    result = re.findall(pat, shortText)
    nSentences = len(result)

Is there any difference? Don't they do the same thing?

5 answers

The second is likely to be a little faster, since the iteration is done entirely in C. How much faster? About 15% in my tests (matching 'a' against 'a' * 16), although this percentage will shrink as the regular expression becomes more complex and takes up a larger share of the execution time. But it will use more memory, since it actually builds a list for you. Assuming you don't have tons of matches, that won't be much memory.

As for which I would prefer, I rather like the conciseness of the second, especially when it is combined into one statement:

 nSentences = len(re.findall(pat, shortText)) 
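The timing claim above can be checked with a rough benchmark. This is only a sketch: the function names, the precompiled pattern, and the repetition count are assumptions made for illustration, and the measured difference will vary with the Python version and machine.

```python
import re
import timeit

# Same tiny case the answer mentions: matching 'a' against 'a' * 16.
pattern = re.compile("a")
text = "a" * 16

def count_with_finditer():
    # Counts matches one at a time in a Python-level loop.
    n = 0
    for _ in pattern.finditer(text):
        n += 1
    return n

def count_with_findall():
    # Builds the full list of matches, then takes its length.
    return len(pattern.findall(text))

# Both approaches must agree on the count before timing them.
assert count_with_finditer() == count_with_findall() == 16

t_iter = timeit.timeit(count_with_finditer, number=100_000)
t_all = timeit.timeit(count_with_findall, number=100_000)
print(f"finditer loop: {t_iter:.3f}s, findall: {t_all:.3f}s")
```

With a more expensive pattern, the matching itself should dominate both timings, so the relative gap is expected to narrow.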

The finditer function returns an iterator of match objects.

The findall function returns a list of matching strings.

The advantage of iterators over lists is that they are memory-friendly (producing values only when necessary).

The advantage of match objects over strings is that they are more versatile (they give you group, groupdict, start, end, span, etc.).

The best choice depends on your needs. If you need a list of matching strings, then findall is fine. If you need the match object methods or if you need to save memory, then finditer is the way to go.
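A small sketch of the two return types described above. The sample text and variable names are made up for illustration; it shows that findall hands back plain strings all at once, while finditer yields match objects lazily and lets you count without storing anything.

```python
import re

text = "One. Two. Three."  # hypothetical sample text
pat = r"[.]"

# findall builds the whole list of matched strings up front.
strings = re.findall(pat, text)

# finditer yields match objects lazily, one per match.
matches = re.finditer(pat, text)
first = next(matches)
first_info = (first.group(), first.start(), first.end())

# Count matches without storing them; the first one was consumed above.
n_sentences = 1 + sum(1 for _ in matches)

print(strings)      # ['.', '.', '.']
print(first_info)   # ('.', 3, 4)
print(n_sentences)  # 3
```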

Hope this helps. Good luck with your project :-)


They do the same thing. Your choice should be dictated by whether an iterator or a list better suits the rest of your code.


One difference between finditer and findall is that the former returns regex match objects, while the latter returns a list of groups if one or more groups are present in the pattern; this will be a list of tuples if the pattern has more than one group.

Beyond that, it all depends on your use case.
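The group behavior described above is easy to miss, so here is a minimal sketch. The sample string and patterns are hypothetical; the point is how the shape of the findall result changes with the number of capturing groups, while finditer always keeps the full match available.

```python
import re

s = "x=1, y=22"  # hypothetical sample string

# No groups: findall returns the whole matches.
no_groups = re.findall(r"\w=\d+", s)

# One group: findall returns only that group's text.
one_group = re.findall(r"\w=(\d+)", s)

# Two groups: findall returns a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w)=(\d+)", s)

# finditer always gives match objects, so group(0) is still the full match.
full_matches = [m.group(0) for m in re.finditer(r"(\w)=(\d+)", s)]

print(no_groups)     # ['x=1', 'y=22']
print(one_group)     # ['1', '22']
print(two_groups)    # [('x', '1'), ('y', '22')]
print(full_matches)  # ['x=1', 'y=22']
```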


There are two main differences:

1) findall() returns a list, and finditer() returns an iterator. This can make a huge difference if you intend to process large strings (like files).

2) findall() returns str objects, and finditer() returns Match objects. I think this is the main difference. Depending on what information you need from the matches, you can choose one or the other. Here is a small example:

We want to get all numbers from a string:

    >>> import re
    >>> s = 'I have 921 apples, 53 oranges, 3 bananas and 1 lemon.'
    >>> # if you just need to find them, better use findall():
    >>> re.findall(r'\d+', s)
    ['921', '53', '3', '1']
    >>> # but, if you need more than just that, use finditer():
    >>> [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', s)]
    [('921', 7, 10), ('53', 19, 21), ('3', 31, 32), ('1', 45, 46)]

Source: https://habr.com/ru/post/1394258/

