Why is only the first letter returned by the match function?

If the metacharacter ? matches the preceding item zero or one time, then why does

 "ab".match(/a?/)

return ["a"], but

 "ab".match(/b?/)

return [""]?


I wonder if I can better explain this with the following analogy.

Imagine a friendly fast food counter: a cashier, a line of hungry customers and a conveyor belt with hamburgers. The cashier offers hamburgers one by one to each customer until they are all satisfied (if some hamburgers are left over, that doesn't matter). She then writes down what each customer received and gives this list to her manager. The cashier is the regular expression engine, the hamburgers are the characters of the input string, and the customers are the subexpressions of the regex. The manager is the match function.

For example, when matching "abbc" against /[a]b+./, the scene looks like this:

 ("[a]", "b+" and "." are standing in line)

 Cashier: Hi, "[a]", would you like "a"?
 [a]: Sure, thanks!
 C: Would you also like "b"?
 [a]: No, thanks, I'm fine (leaves).
 C: Hi, "b+", would you like "b"?
 b+: Sure.
 C: Would you like another "b"?
 b+: Yes, I'm hungry.
 C: Can I also offer you "c"?
 b+: Not my kind of thing (leaves).
 C: Hi, ".", I only have one "c" left.
 .: I don't care what it is, just gimme it (leaves).
 C: All served! Looks like I'll get the job!
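The first scene can be checked in any JavaScript console. In this small sketch the capture groups in the second regex are added purely for illustration (they do not change what is matched); they show which "burgers" each customer took:

```javascript
// The whole scene: [a], b+ and . are served in order from "abbc".
const scene = "abbc".match(/[a]b+./);
console.log(scene[0], scene.index); // abbc 0

// Same regex with each customer wrapped in a capture group (illustration only).
const [, a, bs, dot] = "abbc".match(/([a])(b+)(.)/);
console.log(a, bs, dot); // a bb c
```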

If the cashier cannot satisfy a customer, she is allowed to call back the previous one and ask them to return some of what they took. This is called backtracking. Consider:

  "abx" against /.+[xyz]/

  C: Hi, ".+", would you like "a"?
  .+: Yum-yum!
  C: How about "b"?
  .+: Yum-yum!
  C: And "x"?
  .+: Yum-yum!
  C: My belt is empty! (.+ leaves)
  C: Hi, "[xyz]", I'm afraid I'm sold out.
  [xyz]: That's out of the question. Can I see the manager?
  C: Wait, I think we can sort this out! (calls ".+" back)
  C (to ".+"): Sorry pal, you know, this nasty guy over there ...
               I wonder if you could give me back your last one?
  .+: No prob ... (gives "x" back)
  C (to "[xyz]"): I've got something for you. Do you eat "x"?
  [xyz]: If you want to get anything done in this country
         you've got to complain until you are blue in the face
         (takes his "x" and leaves in a rage)
  C: Gosh, what a day ...
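The backtracking in this scene becomes visible if each customer is wrapped in a capture group. The groups are an addition for illustration only; they do not change what /.+[xyz]/ matches. A minimal sketch:

```javascript
// ".+" first eats "abx", then gives the "x" back so that "[xyz]" can match.
const [whole, dotPlus, cls] = "abx".match(/(.+)([xyz])/);

console.log(whole);   // abx
console.log(dotPlus); // ab  (what ".+" kept after giving back the "x")
console.log(cls);     // x   (what "[xyz]" finally received)
```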

Now back to your examples:

 Scene I. "ab" against /a?/
 Burgers: a and b; customer: a?

 C: Hi, "a?", would you like "a"?
 a?: Sure, thanks.
 C: Can I also offer you "b"?
 a?: No, thanks, I'm fine (leaves).
 Manager: I need the inventory report, now!
 C: Here you go: "a?" got "a", and we have "b" left.

 Scene II. "ab" against /b?/
 Burgers: a and b; customer: b?

 C: Hi, "b?", would you like "a"?
 b?: No thanks, but that's no problem (leaves).
 M: Status?
 C: "b?" got nothing and left. "a" and "b" are still there.

So basically b? is a very nice (and not particularly hungry) guy: he is happy even if the cashier has nothing for him. If he is the only one in line, it's her lucky day!
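Both scenes can be reproduced directly. In this sketch JSON.stringify is used only so that the empty string shows up visibly as "" in the output; note that both matches start at position 0:

```javascript
// Scene I: "a?" accepts the "a" offered at position 0.
const scene1 = "ab".match(/a?/);
// Scene II: "b?" declines the "a" and leaves happy with nothing, also at position 0.
const scene2 = "ab".match(/b?/);

console.log(JSON.stringify(scene1[0]), scene1.index); // "a" 0
console.log(JSON.stringify(scene2[0]), scene2.index); // "" 0
```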


Because it is the first match. The engine first tries to match at position 0, where regex #1 (/a?/) matches "a" and regex #2 (/b?/) matches the empty string. It then tries to match at position 1, where regex #1 matches the empty string and regex #2 matches the letter "b". Finally, it tries to match at position 2 (the end of the string), where both regexes match the empty string. Without the g flag, match returns only the first of these, the one found at position 0.

Compare the matches returned with and without the global flag:

 > "ab".match(/a?/)
 ["a"]
 > "ab".match(/a?/g)
 ["a", "", ""]
 > "ab".match(/b?/)
 [""]
 > "ab".match(/b?/g)
 ["", "b", ""]
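To see at which position each of those global matches starts, String.prototype.matchAll can be used; it exposes an index for every match. The helper name matchesWithPositions is just a hypothetical name for this sketch:

```javascript
// List every match the engine finds, together with the position where it starts.
// matchAll requires a regex with the global (g) flag.
function matchesWithPositions(str, re) {
  return [...str.matchAll(re)].map((m) => [m.index, m[0]]);
}

console.log(JSON.stringify(matchesWithPositions("ab", /a?/g))); // [[0,"a"],[1,""],[2,""]]
console.log(JSON.stringify(matchesWithPositions("ab", /b?/g))); // [[0,""],[1,"b"],[2,""]]
```

The first entry of each list is exactly what match returns without the g flag.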

Why doesn't the first case return [""]?

Because of backtracking. When trying to match at a given position, the engine greedily (note 1) tries to match each token of the regex against the characters of the string. If it reaches the end of the regex this way, the match succeeds. When a character does not fit, the engine backtracks through the regex to see whether something can be skipped (with quantifiers such as * or ?) or whether an alternative (|) should be tried, and then continues from there.
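To make "greedy first, then give back" concrete, here is a toy backtracking matcher. It is a hypothetical sketch for illustration only: it supports just literal characters, "." and the greedy postfix quantifiers "?", "*" and "+" (no groups, classes, alternation or escapes), and real engines are far more sophisticated. The names tokenize, matchHere and toyMatch are made up for this example.

```javascript
// Split a pattern such as "ab?c*" into tokens like { ch: "b", min: 0, max: 1 }.
function tokenize(pattern) {
  const tokens = [];
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    const q = pattern[i + 1];
    if (q === "?") { tokens.push({ ch, min: 0, max: 1 }); i++; }
    else if (q === "*") { tokens.push({ ch, min: 0, max: Infinity }); i++; }
    else if (q === "+") { tokens.push({ ch, min: 1, max: Infinity }); i++; }
    else tokens.push({ ch, min: 1, max: 1 });
  }
  return tokens;
}

// Does the token character accept the string character c? ("." accepts anything.)
function accepts(tokenCh, c) {
  return c !== undefined && (tokenCh === "." || tokenCh === c);
}

// Try to match tokens[t..] starting at str[pos]; return the end position or -1.
function matchHere(tokens, t, str, pos) {
  if (t === tokens.length) return pos; // reached the end of the regex: success
  const { ch, min, max } = tokens[t];
  // Greedy: first take as many characters as this token will accept...
  let count = 0;
  while (count < max && accepts(ch, str[pos + count])) count++;
  // ...then give them back one at a time (backtracking) until the rest matches.
  for (; count >= min; count--) {
    const end = matchHere(tokens, t + 1, str, pos + count);
    if (end !== -1) return end;
  }
  return -1;
}

// Like match without the g flag: try each start position, return the first success.
function toyMatch(str, pattern) {
  const tokens = tokenize(pattern);
  for (let start = 0; start <= str.length; start++) {
    const end = matchHere(tokens, 0, str, start);
    if (end !== -1) return { index: start, text: str.slice(start, end) };
  }
  return null;
}

console.log(JSON.stringify(toyMatch("ab", "a?")));   // {"index":0,"text":"a"}
console.log(JSON.stringify(toyMatch("ab", "b?")));   // {"index":0,"text":""}
console.log(JSON.stringify(toyMatch("abx", ".+x"))); // {"index":0,"text":"abx"}
```

For "b?" at position 0, the greedy loop takes zero characters (the "a" does not fit), and zero is allowed by "?", so the end of the regex is reached immediately with an empty match.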

Example: matching /b?/ at position 0 of "ab":

 //    - "":  ✓
 /b/   - "a": ×
 /b?/  - "":  ✓ - succeed (end of regex)
   ^ the "?" here means the "b" token was omitted

Example: matching /a?/ at position 0 of "ab":

 //    - "":  ✓
 /a/   - "a": ✓ - succeed (end of regex)

Example: matching /ab?(bc)?/ at position 0 of "abc":

 //          - "":    ✓
 /a/         - "a":   ✓
 /ab/        - "ab":  ✓
 /ab(b)/     - "abc": ×
 /ab(bc)?/   - "ab":  ✓ - succeed (end of regex)
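The last trace can be confirmed in JavaScript. In this small sketch, the optional group (bc)? does not take part in the match, so JavaScript reports it as undefined:

```javascript
// Greedy b? takes the "b", so (bc)? only sees "c", fails, and is skipped.
const optional = "abc".match(/ab?(bc)?/);

console.log(optional[0]); // ab
console.log(optional[1]); // undefined
```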

1: Usually, at least. Many regex flavors also provide lazy or possessive quantifiers if you want finer control over how much each part matches. For example, /ab??(bc)?/ matches "abc" in "abc".
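JavaScript supports lazy quantifiers (though not possessive ones), so the footnote's example can be checked directly. A small sketch:

```javascript
// Lazy b?? first tries to take nothing, leaving "bc" for the greedy (bc)?.
const lazy = "abc".match(/ab??(bc)?/);

console.log(lazy[0]); // abc
console.log(lazy[1]); // bc
```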


Source: https://habr.com/ru/post/1202069/

