Update
If your strings are valid chemical formulas, why bother with subscripts, digits, and letters at all? Formulas are chunks of non-whitespace characters. Since a letter or ( is required, put them in the [a-z(] character class, and then add \S* (zero or more non-whitespace characters):
/(?:\d+ )?[a-z(]\S*/gi
See the regex demo. The (?:...)? construct is an optional non-capturing group, i.e. a group used only for grouping, not for capturing (storing the matched subpattern in a memory buffer).
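A minimal sketch of how this pattern behaves. The input lines are made up for illustration (the question's actual data is not reproduced here), and findFormulas is a hypothetical helper name:

```javascript
// Optional "digits + space" prefix, then a letter or "(", then any run of non-whitespace.
const simpleFormulaRe = /(?:\d+ )?[a-z(]\S*/gi;

// Hypothetical helper: returns every match in a line, or an empty array.
function findFormulas(line) {
  // With the g flag, match() returns all matches, or null when there are none.
  return line.match(simpleFormulaRe) || [];
}

console.log(findFormulas("2 H₂ + O₂ = 2 H₂O"));   // → ["2 H₂", "O₂", "2 H₂O"]
console.log(findFormulas("Ca(OH)₂ + (NH₄)₂SO₄")); // → ["Ca(OH)₂", "(NH₄)₂SO₄"]
```

Note that this only works when the text really consists of formulas: any ordinary word also starts with a letter and would be matched too.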
Original answer explaining the root cause
You have the digits and the space at the beginning as separate optional subpatterns. Instead, you need to make them obligatory, but put them inside an optional group:
(?:[0-9]+ )?\(*([a-z]+[₀-₉]*)+\)*[₀-₉]*
See the regex demo.
Your [0-9]* ?? has become (?:[0-9]+ )? . Note that you do not need the lazy version of the ? quantifier here; it would work the same as the greedy one. I also removed 2 redundant outer groupings (...) .
Since the (?:[0-9]+ )? group is optional, the space is matched only if digits precede it. If there are no digits, the next thing that can match is zero or more ( . Then a letter [a-z] must be present (if there is no ( , the letter will be the first character of the match).
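To illustrate the difference, here is a small sketch comparing the two prefixes. The question's full pattern is not shown, so the "before" regex simply puts the [0-9]* ?? prefix in front of the same tail; that combination, the sample string, and the [₀-₉] subscript-digit class are assumptions:

```javascript
// Shared tail of the pattern; [₀-₉] (subscript digits) is an assumed character class.
const tail = "\\(*([a-z]+[₀-₉]*)+\\)*[₀-₉]*";

// Hypothetical "before" pattern: digits and space are each optional on their own.
const looseRe = new RegExp("[0-9]* ??" + tail, "gi");
// "After" pattern: digits plus space form a single optional group.
const groupedRe = new RegExp("(?:[0-9]+ )?" + tail, "gi");

const sample = "2 H₂O + O₂"; // made-up input

console.log(sample.match(looseRe));   // → ["2 H₂O", " O₂"]  (stray leading space)
console.log(sample.match(groupedRe)); // → ["2 H₂O", "O₂"]
```

With the separate optional space, the engine can consume a space that is not preceded by any digit, which is how the leading space ends up in the second match.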
Here is the breakdown:
(?:[0-9]+ )? - optional: one or more digits followed by a space
\(* - zero or more ( (perhaps you meant ? )
([a-z]+[₀-₉]*)+ - one or more sequences of one or more letters followed by zero or more subscript digits
\)* - zero or more ) (perhaps you meant ? )
[₀-₉]* - zero or more subscript digits
If you also want to make sure that you do not match an unbalanced (Ca or H) , you must also split \(*...\)* into two alternatives, as follows:
(?:[0-9]+ )?(?:(?:[a-z]+[₀-₉]*)+|\((?:[a-z]+[₀-₉]*)+\))[₀-₉]*
See another demo.
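A short sketch contrasting the permissive pattern with the stricter alternation. The sample string is invented, the function names are hypothetical, and [₀-₉] (subscript digits) is an assumed character class:

```javascript
// Permissive version: any number of "(" before and ")" after the letters.
const permissiveRe = /(?:[0-9]+ )?\(*([a-z]+[₀-₉]*)+\)*[₀-₉]*/gi;
// Strict version: either no parentheses at all, or one balanced pair.
const strictRe = /(?:[0-9]+ )?(?:(?:[a-z]+[₀-₉]*)+|\((?:[a-z]+[₀-₉]*)+\))[₀-₉]*/gi;

// Hypothetical helpers returning all matches (empty array if none).
function matchPermissive(text) {
  return text.match(permissiveRe) || [];
}
function matchStrict(text) {
  return text.match(strictRe) || [];
}

const unbalanced = "(Ca H) (OH)₂"; // made-up input with two unbalanced pieces

console.log(matchPermissive(unbalanced)); // → ["(Ca", "H)", "(OH)₂"]
console.log(matchStrict(unbalanced));     // → ["Ca", "H", "(OH)₂"]
```

The strict version still finds the letters inside an unbalanced piece; it just leaves the lone parenthesis out of the match.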