I have several functions that I call more than a million times on different texts, which means that small improvements in these functions translate into big wins overall. I have noticed that all of my functions that involve word counting are significantly slower than everything else, so I want to try counting words differently.
Basically, my function grabs a number of objects that have text associated with them, checks that the text does not match certain patterns, and then counts the number of words in that text. A basic version of the function:
    my $num_words = 0;
    for (my $i=$begin_pos; $i<=$end_pos; $i++) {
        my $text = $self->_getTextFromNode($i);
I do a lot of text comparisons similar to the ones here elsewhere in my code without any trouble, so I assume the problem must be the word counting. Is there a faster way to do this than splitting on \s+? If so, what is it and why is it faster (so that I can understand what I'm doing wrong and apply that knowledge to similar problems later)?
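To make the comparison concrete, here is a minimal sketch of three ways to count words, plus a harness for timing them with the core Benchmark module. The sub names (count_split_regex, count_split_awk, count_match, run_benchmark) and the sample text are hypothetical and not part of my real code. The first sub assumes my current approach amounts to splitting on \s+ and counting the fields.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# All sub names below are hypothetical, chosen for illustration only.

# Baseline: split on runs of whitespace and count the fields.
# Note: leading whitespace produces an empty first field, which is counted.
sub count_split_regex {
    my ($text) = @_;
    my @words = split /\s+/, $text;
    return scalar @words;
}

# split ' ' is the special "awk mode": it skips leading whitespace
# and splits on runs of whitespace.
sub count_split_awk {
    my ($text) = @_;
    my @words = split ' ', $text;
    return scalar @words;
}

# Count runs of non-whitespace one match at a time,
# without building a list of the words.
sub count_match {
    my ($text) = @_;
    my $n = 0;
    $n++ while $text =~ /\S+/g;
    return $n;
}

# Time the three variants on a made-up sample text.
sub run_benchmark {
    my $sample = join ' ', ('lorem ipsum dolor sit amet') x 200;
    cmpthese(-1, {
        split_regex => sub { count_split_regex($sample) },
        split_awk   => sub { count_split_awk($sample) },
        match_count => sub { count_match($sample) },
    });
}

# Only run the timing when explicitly requested.
run_benchmark() if $ENV{RUN_BENCH};
```

Running the script with RUN_BENCH=1 set in the environment prints a comparison table. The sample text is made up, so the numbers would only be meaningful if I swap in text that resembles my real data.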