Can a regular expression intersect between two regular expressions?

Given a few regular expressions, can we write regular expressions equal to their intersection?

For example, given the two regular expressions c[az][az] and [az][aeiou]t , their intersection contains cat and cut and possibly more. How can we write a regular expression to intersect them?

Thanks.

+6
source share
5 answers

The sample examples are easy to use, but technically they are no longer ordinary languages. However, you can take the intersection of two regular languages, and this complement is regular.

First of all, note that regular expressions can be converted to and from NFA; they are both ways of expressing regular languages.

Secondly, by DeMorgan law,

Morgan law used at the intersection of two regular languages

So these are the steps to calculate the intersection of two registers:

  • Convert both RegExs to NFA.
  • Calculate the complement of both NFAs.
  • Calculate the union of two additions.
  • Calculate the complement of this union.
  • Convert the resulting NFA to RegEx.

Some sources:

+6
source

Logical AND in regular expression is represented

 (?=...)(?=...) 

So,

 (?=[az][aeiou]t)(?=c[az][az]) 

Regular expression visualization

+5
source

Mathematically, the intersection of two regular languages ​​is regular, so there must be a regular expression that accepts it.

Building it through the appropriate NFAs is probably the easiest. Consider two NFAs that correspond to two regular expressions. The new states of Q are pairs (Q1, Q2) of two NFAs. If the first NFA has a transition (P1, x, Q1) and (P2, x, Q2) in the second NFA, then and only then does there exist a transition ((P1, P2), x, (Q1, Q2)) in the new NFA. A new state (Q1, Q2) is initial / final if both Q1 and Q2 are initial / final.

If you use NFA with & epsilon; -moves, then also for each transition (P1, epsilon ;, Q1) there will be a transition ((P1, P2), epsilon; (Q1, P2)) for all P2 states. Similarly for - moves to the second NFA.

Now convert the new NFA into a regular expression with any known algorithm and what it is.

As for PCRE, they are not, strictly speaking, regular expressions. In general, there is no way to do this. Sometimes you can use lookaheads, for example ^(?=regex1$)(?=regex2$) , but this is only useful for matching the entire string and is not suitable for searching or embedding in other regular expressions. Without binding, two views can end with the corresponding lines of different lengths. This is not an intersection.

+5
source

First, let's agree on the terms. My syntactic assumption will be that

The intersection of several regular expressions is one regular expression that matches the lines that each of the component expressions also matches.

General parameter

To check the intersection of two patterns, a common method (pseudo-code):

 if match(regex1) && match(regex2) { champagne for everyone! } 

Regex Option

In some cases, you can do the same with lookaheads, but little can be done for complex regular expressions, besides making your regular expression more obscure for your enemies. Why is there little use? Because in any case, the engine will parse the entire string several times.

Logical and

A general pattern for checking AND that a string exactly matches regex1 and regex2 would be:

 ^(?=regex1$)(?=regex2$) 

$ in each representation ensures that each line matches the pattern and nothing more.

Matching when AND

Of course, if you do not want to just check the logical value of AND, as well as perform the actual comparison, then after viewing you can add a star point to consume the string:

 ^(?=regex1$)(?=regex2$).* 

Or ... After checking the first condition, simply match the second:

 ^(?=regex1$)regex2$ 

This is a method used, for example, when checking a password. For more on this, see Mastering Lookahead and Lookbehind .

Bonus Section: Regular Expression Union

Instead of working at the intersection, let's say you are interested in combining the following regular expressions, i.e. regular expression that matches any of these regular expressions:

  • to catch
  • cat1
  • cat2
  • cat3
  • cat5

This is done using the interleave operator | :

 catch|cat1|cat2|cat3|cat5 

In addition, such a regular expression can often be compressed, as in:

 cat(?:ch|[1-35]) 
+3
source

For operation And we have something similar in RegEx

(REGEX) (REGEX)

Taking your example

 'Cat'.match(/^([A-Za-z]+)([aeiouAEIOU]+)([A-Za-z]+)$/) ["Cat", "C", "a", "t"] 'Ca'.match(/^([A-Za-z]+)([aeiouAEIOU]+)([A-Za-z]+)$/) //null 'Cat123'.match(/^([A-Za-z]+)([aeiouAEIOU]+)([A-Za-z]+)$/) //null 

where

 ([A-Za-z]+) //Match All characters 

and

 ([aeiouAEIOU]+) //Match all vowels 

Combine them as will be

([A-Za-Z] +) ([aeiouAEIOU] +) ([A-Za-Z] +)

, eg:

 'Hmmmmmm'.match(/^([A-Za-z]+)([aeiouAEIOU]+)([A-Za-z]+)$/) //null 'Stckvrflw'.match(/^([A-Za-z]+)([aeiouAEIOU]+)([A-Za-z]+)$/) null 'StackOverflow'.match(/^([A-Za-z]+)([aeiouAEIOU]+)([A-Za-z]+)$/) ["StackOverflow", "StackOverfl", "o", "w"] 
0
source

Source: https://habr.com/ru/post/970519/


All Articles