Regex vs. string::find() for a simple word boundary

Say I only need to find out whether a line read from a file contains a word from a finite set of words.

One way to do this is to use regex as follows:

.*\y(good|better|best)\y.*

Another way to achieve this is to use string::find, as in this pseudo code:

 if ( (readLine.find("good")   != string::npos) ||
      (readLine.find("better") != string::npos) ||
      (readLine.find("best")   != string::npos) )
 {
   // line contains a word from a finite set of words.
 }
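One caveat about the find calls above: they match substrings, so "goodness" or "bestow" would also count as hits, which the \y pattern would reject. A minimal sketch of a boundary-aware variant follows, assuming "word character" means an ASCII letter, digit or underscore. The helper names is_word_char, contains_whole_word and contains_any_word are hypothetical and not part of the original question.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: true if c is an ASCII letter, digit or underscore.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper: true if `word` occurs in `line` as a whole word.
bool contains_whole_word(const std::string& line, const std::string& word) {
    if (word.empty()) return false;
    std::string::size_type pos = line.find(word);
    while (pos != std::string::npos) {
        const std::string::size_type end = pos + word.size();
        const bool left_ok  = (pos == 0) || !is_word_char(line[pos - 1]);
        const bool right_ok = (end == line.size()) || !is_word_char(line[end]);
        if (left_ok && right_ok) return true;
        pos = line.find(word, pos + 1);  // substring hit only, keep looking
    }
    return false;
}

// True if the line contains good, better or best as a whole word.
bool contains_any_word(const std::string& line) {
    static const std::vector<std::string> words = {"good", "better", "best"};
    for (const std::string& w : words) {
        if (contains_whole_word(line, w)) return true;
    }
    return false;
}
```

The extra boundary checks cost little per hit, but they should be part of any speed comparison with the regex, since otherwise the two approaches do not answer the same question.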

Which way will have the best performance? (i.e. speed and CPU usage)

3 answers

A regular expression will perform better, but get rid of the ".*" parts. They complicate the pattern and serve no purpose. The regular expression then looks like this:

\y(good|better|best)\y

The matcher built from this regexp first looks for \y, then character 1 (g|b), then character 2 (g => go, b => be), then character 3 (go => goo, be => bes|bet), then character 4 (go => good, bes => best, bet => bett), and so on.
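As an illustration, here is a minimal sketch of the trimmed pattern using std::regex. It assumes the default ECMAScript grammar, where the word-boundary escape is \b rather than \y (\y is the spelling used by Tcl-style regex syntax). The function name contains_word_regex is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the line contains good, better or best as a whole word.
bool contains_word_regex(const std::string& line) {
    // Compile the pattern once; constructing a std::regex is comparatively expensive.
    static const std::regex pattern(R"(\b(good|better|best)\b)");
    // regex_search looks for a match anywhere in the line, so no ".*" is needed.
    return std::regex_search(line, pattern);
}
```

With regex_search the leading and trailing ".*" are redundant; they would only be required with regex_match, which must consume the whole line.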


[Text of this answer lost in extraction; only the fragments "find" and "3" survive.]


That depends on several things:

  • The regex engine you use (for example RE2 or a POSIX implementation).
  • The implementation of string::find.
  • The length of the string you are searching.
  • How many strings you are looking for.

My bet is on the regular expression, but again: you have to measure to be sure.
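One way to check is to time both approaches on your own data. The following is a rough sketch only, assuming std::regex as the engine and std::chrono::steady_clock for timing. The names count_with_find, count_with_regex and time_us are hypothetical, and the pattern deliberately omits word boundaries so that both counters answer the same question as the find snippet.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Count lines matched by three plain substring searches.
std::size_t count_with_find(const std::vector<std::string>& lines) {
    std::size_t hits = 0;
    for (const std::string& line : lines) {
        if (line.find("good") != std::string::npos ||
            line.find("better") != std::string::npos ||
            line.find("best") != std::string::npos) {
            ++hits;
        }
    }
    return hits;
}

// Count lines matched by one alternation, compiled once.
std::size_t count_with_regex(const std::vector<std::string>& lines) {
    static const std::regex pattern("good|better|best");
    std::size_t hits = 0;
    for (const std::string& line : lines) {
        if (std::regex_search(line, pattern)) ++hits;
    }
    return hits;
}

// Hypothetical timing helper: runs counter(lines) once, stores the hit count,
// and returns the elapsed time in microseconds.
template <typename Counter>
long long time_us(Counter counter, const std::vector<std::string>& lines,
                  std::size_t& hits_out) {
    const auto start = std::chrono::steady_clock::now();
    hits_out = counter(lines);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Call time_us(count_with_find, lines, hits) and time_us(count_with_regex, lines, hits) on lines read from a representative file, with optimizations enabled and several repetitions. A single run on a handful of lines says very little, and the result can differ between regex engines and standard library implementations.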


Source: https://habr.com/ru/post/1785761/

