Generator Function Performance

I am trying to understand the performance of a generator function. I used cProfile and the pstats module to collect and inspect the profiling data. The function in question is as follows:

    def __iter__(self):
        delimiter = None
        inData = self.inData
        lenData = len(inData)
        cursor = 0
        while cursor < lenData:
            if delimiter:
                mo = self.stringEnd[delimiter].search(inData[cursor:])
            else:
                mo = self.patt.match(inData[cursor:])
            if mo:
                mo_lastgroup = mo.lastgroup
                mstart = cursor
                mend = mo.end()
                cursor += mend
                delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend))
            else:
                raise SyntaxError("Unable to tokenize text starting with: \"%s\"" % inData[cursor:cursor+200])

self.inData is a Unicode text string, self.stringEnd is a dict of 4 simple regular expressions, and self.patt is one large regular expression. The point is to split a large string into smaller tokens, one by one.
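For context, here is a minimal sketch of what such a scanner class might look like. The pattern names and regexes below are purely illustrative assumptions, not the original ones:

```python
import re

class Scanner:
    """Minimal sketch of a regex-based tokenizer (hypothetical patterns)."""

    def __init__(self, inData):
        self.inData = inData
        # One large alternation of named groups; names are illustrative.
        self.patt = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<quote>['\"])")
        # Small per-delimiter patterns for locating the matching string end.
        self.stringEnd = {
            "'": re.compile(r"[^']*'"),
            '"': re.compile(r'[^"]*"'),
        }
```
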

Profiling a program that uses it, I found that most of the execution time is spent in this function:

    In [800]: st.print_stats("Scanner.py:124")
             463263 function calls (448688 primitive calls) in 13.091 CPU seconds

       Ordered by: cumulative time
       List reduced from 231 to 1 due to restriction <'Scanner.py:124'>

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        10835   11.465    0.001   11.534    0.001 Scanner.py:124(__iter__)

However, looking at the profile of the function itself, very little time is spent in its sub-calls:

    In [799]: st.print_callees("Scanner.py:124")
       Ordered by: cumulative time
       List reduced from 231 to 1 due to restriction <'Scanner.py:124'>

    Function                    called...
                                    ncalls  tottime  cumtime
    Scanner.py:124(__iter__)  ->     10834    0.006    0.006  {built-in method end}
                                     10834    0.009    0.009  {built-in method group}
                                      8028    0.030    0.030  {built-in method match}
                                      2806    0.025    0.025  {built-in method search}
                                         1    0.000    0.000  {len}

There is not much else in the function, apart from assignments and the if/else. Even the generator's send method, which I use to drive it, is fast:

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  13643/10835    0.007    0.000   11.552    0.001 {method 'send' of 'generator' objects}

Is it possible that yield, passing the value back to the consumer, takes up most of the time?! Is there anything else I don't know about?
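For what it's worth, yield and send themselves are cheap. A quick sanity check (a hypothetical micro-benchmark, not the original code) with a trivial echo generator shows the per-send overhead is tiny compared to the ~1 ms per call the profile reports:

```python
import timeit

# A trivial generator that just echoes sent values back, to isolate
# the cost of yield/send from any real work (hypothetical benchmark).
def echo():
    value = None
    while True:
        value = yield value

g = echo()
next(g)  # prime the generator so it is paused at the yield

n = 100_000
per_send = timeit.timeit(lambda: g.send(1), number=n) / n
print("seconds per send:", per_send)
```

On typical hardware this comes out well under a microsecond per send, which suggests the time is being spent elsewhere in the loop body.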

EDIT

Perhaps I should have mentioned that the __iter__ generator is a method of a small class, so self refers to an instance of that class.

2 answers

This is really Dunes's answer; unfortunately, he only gave it as a comment and does not seem inclined to post it as a proper answer.

The main culprit was the string slicing. Some timing measurements showed that slicing performance degrades noticeably with large slices (i.e., taking a big chunk out of an already large string). To get around this, I now use the pos parameter of the regex object's methods:

    if delimiter:
        mo = self.stringEnd[delimiter].search(inData, pos=cursor)
    else:
        mo = self.patt.match(inData, pos=cursor)
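The difference is easy to reproduce with a rough benchmark. The pattern and data sizes below are illustrative assumptions, not the original scanner's:

```python
import re
import timeit

# Illustrative pattern and data, not the original scanner's regexes.
patt = re.compile(r"\w+\s*")
data = "word " * 20_000  # a ~100 KB string of 20,000 tokens

def tokenize_slice(s):
    # Slicing copies the remaining tail of the string on every step,
    # making the whole loop O(n^2) in the input length.
    cursor = 0
    while cursor < len(s):
        mo = patt.match(s[cursor:])
        cursor += mo.end()

def tokenize_pos(s):
    # pos= matches in place with no copy; note that mo.end() is now an
    # absolute index into s, so we assign instead of adding.
    cursor = 0
    while cursor < len(s):
        mo = patt.match(s, pos=cursor)
        cursor = mo.end()

t_slice = timeit.timeit(lambda: tokenize_slice(data), number=1)
t_pos = timeit.timeit(lambda: tokenize_pos(data), number=1)
print("slice: %.4fs  pos: %.4fs" % (t_slice, t_pos))
```

Note that with pos=, mo.end() becomes an absolute offset into inData, so the bookkeeping in the generator (mstart, mend, cursor) has to be adjusted to match.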

Thanks to everyone who helped.


If I read your sample correctly, you are taking the value sent back into the generator, putting it in delimiter, and using it to index into a dict. This may not be your speed problem, but I'm sure it's a bug.


Source: https://habr.com/ru/post/890081/
