Regular expression composition

Requirement: two expressions, exp1and exp2, we need to match one or more of them. So I came up with

(exp1 | exp2)*

However, in some places I see what is used below,

(exp1 * (exp2 exp1*)*)

What is the difference between the two? When will you use one over the other?

We hope that fiddle will make this more clear,

var regex1 = /^"([\x00-!#-[\]-\x7f]|\\")*"$/;
var regex2 = /^"([\x00-!#-[\]-\x7f]*(\\"[\x00-!#-[\]-\x7f]*)*)"$/;

var str = '"foo \\"bar\\" baz"';
var r1 = regex1.exec(str);
var r2 = regex2.exec(str);

EDIT: It appears that there is a difference in behavior between the two apporaches when we assemble the groups. The second approach captures the entire string, while the first approach captures only the last matching group. See Updated fiddle .

+4
source share
2 answers

- .

(exp1 | exp2)* , . , .

(exp1 * (exp2 exp1*)*) . unroll-the-loop:

(expr1|expr2|...)*. , . - (a*)*.

, , . , , . :

normal* ( special normal* )*

, exp1 , , exp2 , . , , normal* .

"([^"\\]|\\.)*" regex "some text here": 35 :

enter image description here

"[^"\\]*(\\.[^"\\]*)*" 6 , .

enter image description here

, regex101.com , , , , , - .

JS benchmark.js:

var suite = new Benchmark.Suite();
Benchmark = window.Benchmark;
suite
  .add('Regular RegExp test', function() {
      '"some text here"'.match(/"([^"\\]|\\.)*"/);
    })
  .add('Unrolled RegExp test', function() {
      '"some text here"'.match(/"[^"\\]*(\\.[^"\\]*)*"/);
    })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run({ 'async': true });
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.13.1/lodash.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/platform/1.3.1/platform.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/benchmark/2.1.0/benchmark.js"></script>
Hide result

:

Regular RegExp test x 9,295,393 ops/sec ±0.69% (64 runs sampled)
Unrolled RegExp test x 12,176,227 ops/sec ±1.17% (64 runs sampled)
Fastest is Unrolled RegExp test

, , - PHP ( ~ 0.45 ~ 0.22).

. , .

+4

?

, . , , , (), . (exp1 | exp2)* (exp1 * (exp2 exp1*)*) . , , ().

?

Edit

(exp1 * (exp2 exp1*)*) - . . @Wiktor Stribiżew.


Proof

- , DFA. , DFA .

(: a = exp1 b = exp2)

(a*(ba*)*)

enter image description here

(a|b)*

enter image description here

, DFA ? , . , DFA:

enter image description here

+2

Source: https://habr.com/ru/post/1652409/


All Articles