Additional optional Regex groups

I am a complete regex noob and have spent hours trying to solve this riddle. I think I need to use some kind of optional non-capturing groups or stripping.

I want to match the following lines:

  • Neuer Film a von 1000

  • Neuer Film a von 1000 mit b

  • Neuer Film a von 1000 mit b und c

  • Neuer Film a von 1000 mit b und c und d

  • Neuer Film a mit b

  • Neuer Film a mit b und c

  • Neuer Film a mit b und c und d

My regex looks like this:

var regex = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/g; 

The problem is that it only matches lines 3 and 4. It also does not handle the last two "und" properly: it packs them into group 3 instead of group 4.
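Both symptoms can be reproduced with a minimal JavaScript check. The variable names are illustrative, and the /g flag is dropped from the original pattern so that test() and exec() do not carry lastIndex over between different strings. Every part of the original pattern is mandatory, so only lines containing "von", a 4-digit year, "mit" and "und" can match. The greedy (.*) before [uU]nd backtracks only as far as the last "und", so the earlier "und" ends up in group 3.

```javascript
// The original pattern, minus /g (a /g regex keeps lastIndex between calls).
const original = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/;

const lines = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d',
];

// Nothing in the pattern is optional, so only lines 3 and 4 survive.
const matched = lines.filter((s) => original.test(s));
console.log(matched);

// Greedy (.*) in group 3 runs up to the LAST "und".
const m = original.exec('Neuer Film a von 1000 mit b und c und d');
console.log(JSON.stringify(m[3]), JSON.stringify(m[4])); // " b und c " " d"
```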

Can anyone help me with my regex (which is not very user-friendly at all)?

1 answer

You do need optional groups (e.g. (?:...)?), but you also need anchors (^ to match the start of the line and $ to match the end of the line) and lazy quantifiers (.*? to match as few characters as possible).

You can use

 /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/ 

See the regex demo. The demo requires the /gm modifiers, since the input there is a multiline string.
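For the multiline case the demo refers to, here is a minimal sketch that applies the same pattern with the g and m flags to all seven sample lines at once. It assumes a runtime with String.prototype.matchAll (ES2020, e.g. Node.js 12+); the names input and rows are illustrative. With m, ^ and $ match at every line boundary, and because (.*?) and (.*) never consume a line break, each match stays on its own line.

```javascript
// Same pattern as in the answer, plus the g and m flags used by the demo.
const rx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/gm;

const input = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d',
].join('\n');

// One entry per line: [Group 1, Group 2, Group 3, Group 4];
// groups that did not participate are undefined.
const rows = [...input.matchAll(rx)].map((m) => m.slice(1, 5));
for (const row of rows) console.log(row);
```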

Pattern details:

  • ^ - start of string anchor
  • [nN]euer [Ff]ilm - Neuer Film / neuer Film / Neuer film / neuer film
  • \s* - zero or more whitespace characters
  • (.*?) - Group 1: any 0+ characters other than line breaks, as few as possible (that is, up to the leftmost occurrence of the subsequent subpatterns)
  • (?:\s*[vV]on\s+(\d{4}))? - an optional non-capturing group matching 1 or 0 occurrences of:
    • \s* - 0+ whitespace characters
    • [vV]on - von or Von
    • \s+ - 1+ whitespace characters
    • (\d{4}) - Group 2: 4 digits
  • (?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)? - an optional non-capturing group matching 1 or 0 occurrences of:
    • \s+ - 1+ whitespace characters
    • [Mm]it - Mit or mit
    • \s* - 0+ whitespace characters
    • (.*?) - Group 3: any 0+ characters other than line break characters, as few as possible
    • (?:\s*[uU]nd\s*(.*))? - an optional non-capturing group:
      • \s*[uU]nd\s* - und or Und, enclosed in 0+ whitespace characters
      • (.*) - Group 4: any 0+ characters other than line break characters, as many as possible
  • $ - end of string anchor
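The $ anchor is what forces the lazy groups to expand. In the minimal sketch below (variable names are illustrative), the same pattern with ^ and $ removed still reports a match, but the lazy (.*?) in Group 1 is satisfied by an empty string and every optional group is simply skipped.

```javascript
// The answer's pattern, with its anchors.
const anchored = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;
// The same pattern with ^ and $ stripped, for comparison only.
const unanchored = /[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?/;

const line = 'Neuer Film a von 1000 mit b und c';
const withAnchors = anchored.exec(line).slice(1);
const withoutAnchors = unanchored.exec(line).slice(1);

console.log(withAnchors);    // Groups 1-4: 'a', '1000', 'b', 'c'
console.log(withoutAnchors); // Group 1 is '', Groups 2-4 are undefined
```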

    // Run the pattern against each sample line separately (no /g needed here)
    var strs = [
      'Neuer Film a von 1000',
      'Neuer Film a von 1000 mit b',
      'Neuer Film a von 1000 mit b und c',
      'Neuer Film a von 1000 mit b und c und d',
      'Neuer Film a mit b',
      'Neuer Film a mit b und c',
      'Neuer Film a mit b und c und d'
    ];
    var rx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;
    for (var s of strs) {
      var m = rx.exec(s);
      if (m) {
        console.log('-- ' + s + ' ---');
        console.log('Group 1: ' + m[1]);
        // Groups 2-4 are optional: they are undefined when they did not match
        if (m[2]) console.log('Group 2: ' + m[2]);
        if (m[3]) console.log('Group 3: ' + m[3]);
        if (m[4]) console.log('Group 4: ' + m[4]);
      }
    }
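Group 4 still holds everything after the first "und" (for example c und d), because a JavaScript regex keeps only one capture per group. If every name after "mit" is needed separately, one option is to split Group 4 on the same [uU]nd separator after matching. This is sketched below as a hypothetical parseFilm helper; it is not part of the original answer.

```javascript
const rx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;

// Hypothetical helper: returns the title, the optional year and
// an array with every name listed after "mit" (empty if none).
function parseFilm(line) {
  const m = rx.exec(line);
  if (!m) return null;
  const names = [];
  if (m[3] !== undefined) names.push(m[3]);
  // Group 4 may contain further "und" separators: split on them.
  if (m[4] !== undefined) names.push(...m[4].split(/\s*[uU]nd\s*/));
  return { title: m[1], year: m[2], names };
}

console.log(parseFilm('Neuer Film a von 1000 mit b und c und d'));
console.log(parseFilm('Neuer Film a mit b'));
```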

Source: https://habr.com/ru/post/1266604/

