Regex problem with named capture groups

I have the following value:

start=2011-03-10T13:00:00Z;end=2011-03-30T13:00:00Z;scheme=W3C-DTF 

I use the following regular expression to extract the start and end dates and assign them to named capture groups:

 '#^start=(?P<publishDate>.+);end=(?P<expirationDate>.+);#ix'

This is probably not the best possible regex, but it works well when both the "start" and "end" values are present.

Now I also need to match publishDate when expirationDate is missing, and vice versa.

How can I do this with a single expression? I'm not the best at regular expressions, and I'm only starting to venture into the more advanced features, so any help with this would be greatly appreciated.

Thanks!

UPDATE:

Thanks to Mr. Chung, I solved this problem with the following expression:

  #^(start=(?P<publishDate>.*?);)?(end=(?P<expirationDate>.*?);)?#xi 
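For anyone reading later, here is a minimal sketch of how that expression behaves with both, one, or the other value present. It assumes PHP 7+ (for the `??` operator); `extractDates` and the shortened input strings are hypothetical and only there for illustration. The `??` fallback is needed because preg_match does not report trailing groups that did not take part in the match.

```php
<?php
// Pattern from the update above: each "key=value;" pair sits in its own optional group.
$pattern = '#^(start=(?P<publishDate>.*?);)?(end=(?P<expirationDate>.*?);)?#xi';

// Hypothetical helper (not from the original post): returns both dates,
// using '' for a date that is not present in the input.
function extractDates(string $pattern, string $value): array
{
    preg_match($pattern, $value, $m);
    return [
        // Trailing unmatched groups are absent from $m, hence the ?? fallback.
        'publishDate'    => $m['publishDate'] ?? '',
        'expirationDate' => $m['expirationDate'] ?? '',
    ];
}

$both      = extractDates($pattern, 'start=2011-03-10T13:00:00Z;end=2011-03-30T13:00:00Z;scheme=W3C-DTF');
$startOnly = extractDates($pattern, 'start=2011-03-10T13:00:00Z;scheme=W3C-DTF');
$endOnly   = extractDates($pattern, 'end=2011-03-30T13:00:00Z;scheme=W3C-DTF');

print_r($both);
print_r($startOnly);
print_r($endOnly);
```

Note that this pattern still expects "start" to come before "end" when both are present.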

As always, thank you very much for your help, everyone. :)

+4
3 answers

Use (...)? to make a section optional:

 ^(start=(?P<publishDate>.+?);)?(end=(?P<expirationDate>.+?);)?
+4

Both of the following set the named group to a value (an empty string rather than null or undefined). I recommend the first one.

1. To find either or both, in any order:
/^(?=.*\bstart=(?P<publishDate>.*?);|(?P<publishDate>))(?=.*\bend=(?P<expirationDate>.*?);|(?P<expirationDate>))/ix

 /^(?=                               # from beginning, look ahead for start
       .*\b                          # any character 0 or more times (backtrack to match 'start')
       start=(?P<publishDate>.*?);   # put start date in publish
     |
       (?P<publishDate>)             # OR, put empty string publish
   )
   (?=                               # from beginning, look ahead for end
       .*\b                          # same criteria as above ...
       end=(?P<expirationDate>.*?);
     |
       (?P<expirationDate>)
   )
 /ix

2. To find either or both, in start-then-end order:
/^(?:.*\bstart=(?P<publishDate>.*?);|(?P<publishDate>))(?:.*\bend=(?P<expirationDate>.*?);|(?P<expirationDate>))/ix

Edit -

@Josh Davis - I had to look this up on PCRE.org; there is some great material there.

Perl has no problem with duplicate names.
The docs say: "If multiple groups have the same name, then it refers to the leftmost defined group in the current match." This is never a problem when the duplicates are used in an alternation.

With PCRE...
Duplicate names work correctly in PHP if they are used together with a branch reset.
The branch reset ensures that the duplicate names occupy the same capture group.
With that in place, and with the duplicate-names option set, $match['name'] will contain either the value
or an empty string, but it will always exist.

(?J) = PCRE_DUPNAMES
(?|...|...) = Branch reset

It works:
/(?Ji)^
(?= (?| .* end = (?P<expirationDate> .*? ); | (?P<expirationDate>)) )
(?= (?| .* start = (?P<publishDate> .*? ); | (?P<publishDate>)) )
/x

Try it here: http://www.ideone.com/zYd24

 <?php
 $string = "start=2011-03-(start)10T13:00:00Z;end=2011-03-(end)30T13:00:00Z;scheme=W3C-DTF";

 preg_match('/(?Ji)^
     (?= (?| .* end = (?P<expirationDate> .*? ); | (?P<expirationDate>)) )
     (?= (?| .* start = (?P<publishDate> .*? ); | (?P<publishDate>)) )
     /x', $string, $matches);

 echo "Published = ",$matches['publishDate'],"\n";
 echo "Expires = ",$matches['expirationDate'],"\n";
 print_r($matches);
 ?>

Output

 Published = 2011-03-(start)10T13:00:00Z
 Expires = 2011-03-(end)30T13:00:00Z
 Array
 (
     [0] => 
     [expirationDate] => 2011-03-(end)30T13:00:00Z
     [1] => 2011-03-(end)30T13:00:00Z
     [publishDate] => 2011-03-(start)10T13:00:00Z
     [2] => 2011-03-(start)10T13:00:00Z
 )
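The run above only covers the case where both values are present. As a rough sketch of the cases the question actually asks about, the same pattern can be tried on inputs where one value is missing or the order is reversed. The three input strings and the `$results` array are hypothetical and not part of the original answer.

```php
<?php
// Pattern copied from the answer above:
// (?J) allows duplicate names, (?|...) is a branch reset.
$pattern = '/(?Ji)^
    (?= (?| .* end = (?P<expirationDate> .*? ); | (?P<expirationDate>)) )
    (?= (?| .* start = (?P<publishDate> .*? ); | (?P<publishDate>)) )
    /x';

// Hypothetical inputs (not from the original post).
$inputs = [
    'startOnly' => 'start=2011-03-10T13:00:00Z;scheme=W3C-DTF',
    'endOnly'   => 'end=2011-03-30T13:00:00Z;scheme=W3C-DTF',
    'reversed'  => 'end=2011-03-30T13:00:00Z;start=2011-03-10T13:00:00Z;scheme=W3C-DTF',
];

$results = [];
foreach ($inputs as $label => $input) {
    preg_match($pattern, $input, $matches);
    // Both names are read without a fallback: the empty alternative in each
    // lookahead makes the group take part in the match with '' as its value.
    $results[$label] = [
        'publishDate'    => $matches['publishDate'],
        'expirationDate' => $matches['expirationDate'],
    ];
}

var_export($results);
```

Because each lookahead falls back to an empty named group, no `isset()` check should be needed before reading either key.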
+2

If 'start=;' is absent altogether whenever the corresponding date is missing, Stephen Chung's code is fine.

Otherwise (the key is present but its value is empty), I think replacing "+" with "*" is enough:

 '#^start=(?P<publishDate>.*?);end=(?P<expirationDate>.*?);#ix'
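To illustrate the difference, here is a small sketch comparing the pattern from the question with the one above. The input, where "start=" is present but empty, is a hypothetical example.

```php
<?php
// Hypothetical input: both keys are present, but the start value is empty.
$value = 'start=;end=2011-03-30T13:00:00Z;scheme=W3C-DTF';

// Pattern from the question: ".+" requires at least one character per value.
$plus = preg_match('#^start=(?P<publishDate>.+);end=(?P<expirationDate>.+);#ix', $value, $m1);

// Pattern from this answer: ".*?" also accepts an empty value.
$star = preg_match('#^start=(?P<publishDate>.*?);end=(?P<expirationDate>.*?);#ix', $value, $m2);

var_dump($plus);                  // int(0): no match at all
var_dump($star);                  // int(1)
var_dump($m2['publishDate']);     // string(0) ""
var_dump($m2['expirationDate']);  // string(20) "2011-03-30T13:00:00Z"
```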

By the way, the '?' is needed in each of these patterns to make the dot non-greedy.
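A quick sketch of why the quantifier matters, using the value from the question. The two single-group patterns are simplified for illustration and are not taken from any of the answers.

```php
<?php
$value = 'start=2011-03-10T13:00:00Z;end=2011-03-30T13:00:00Z;scheme=W3C-DTF';

// Greedy: ".+" grabs as much as it can, so it stops at the LAST ';'.
preg_match('#^start=(?P<publishDate>.+);#i', $value, $greedy);

// Lazy: ".+?" grabs as little as it can, so it stops at the FIRST ';'.
preg_match('#^start=(?P<publishDate>.+?);#i', $value, $lazy);

echo $greedy['publishDate'], "\n"; // 2011-03-10T13:00:00Z;end=2011-03-30T13:00:00Z
echo $lazy['publishDate'], "\n";   // 2011-03-10T13:00:00Z
```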

0

Source: https://habr.com/ru/post/1343532/