Why does this regex return true?

Why does this regex return true?

Regex.IsMatch("العسكرية", "العسكري") 

I googled and found nothing.

+4
3 answers

I suspect that what you posted is actually reversed: the shorter text is the pattern and the longer text is the input. In that case it returns true, since the pattern matches all but the last letter of the word.

To clarify, العسكري is the pattern and العسكرية is the input. Since I know Arabic, I can tell you that the former is indeed a partial match of the latter, so the result would be true if the values were actually reversed. If you refer to a table of the Arabic alphabet, you can see that the letter yā (at the bottom of the table) is the same letter in both words. Its appearance depends on where it occurs in a word: in the first word it appears at the end, and in the second it is the second-to-last letter.

When I copy and paste from your post, the values are swapped, which leads to a true result. To make this easier to work with, we can put the words in separate variables and see the expected results in both scenarios:

 string first = "العسكري";
 string second = "العسكرية";
 Console.WriteLine(Regex.IsMatch(first, second)); // false
 Console.WriteLine(Regex.IsMatch(second, first)); // true
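The point about yā sitting in a different position in each word can be checked without relying on how the text is rendered. The following is a minimal sketch, not part of the original answer: the class and helper names (`YaPosition`, `EndsWithYa`, `YaIsSecondToLast`) are hypothetical, and the literals are written as `\u` escapes under the assumption that yā is U+064A, as it is in the words quoted in the question.

```csharp
using System;

public static class YaPosition
{
    // Assumption: the words from the question, spelled with \u escapes
    // so the source reads left to right in any editor.
    public const string First = "\u0627\u0644\u0639\u0633\u0643\u0631\u064A";
    public const string Second = "\u0627\u0644\u0639\u0633\u0643\u0631\u064A\u0629";

    // ARABIC LETTER YEH (U+064A): one code point, whatever shape it is drawn with.
    public const char Ya = '\u064A';

    // Hypothetical helper: true when the last character of s is yā.
    public static bool EndsWithYa(string s) =>
        s.Length > 0 && s[s.Length - 1] == Ya;

    // Hypothetical helper: true when the second-to-last character of s is yā.
    public static bool YaIsSecondToLast(string s) =>
        s.Length > 1 && s[s.Length - 2] == Ya;

    public static void Main()
    {
        Console.WriteLine(EndsWithYa(First));          // True
        Console.WriteLine(YaIsSecondToLast(Second));   // True
        // The shorter word is a plain prefix of the longer one.
        Console.WriteLine(Second.StartsWith(First, StringComparison.Ordinal)); // True
    }
}
```

The same code point is stored in both strings; only the glyph chosen by the renderer differs.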
+7

This is an interesting consequence of bidirectional text rendering rules that were designed for prose, not code.

The first argument in the method call as written is "العسكرية", the argument that is displayed (*) on the right-hand side. This longer argument is the input, and the shorter string displayed on the left is actually the pattern, hence the match.

(*: assuming your browser knows how to render right-to-left text. If you paste the snippet into an editor or console that lacks complex text layout support, you will see that it is actually ... although the Arabic will be rendered broken.)

The trick is that punctuation marks, such as quotation marks and commas, are directionless, so they can be displayed left to right or right to left depending on their surroundings. The direction of each run in the snippet:

 >>>>>>>>>>>>>>> <<<<<<<<<<<<<<<<<<< >>
 Regex.IsMatch("العسكرية", "العسكري")

(This has another confusing property: the quotation marks that appear to surround each individual argument are not really the ones that delimit it.)

This makes a debatable kind of sense for runs of readable mixed-language prose, but it makes code very confusing! You can stop it by breaking the run of directionless characters with something that has left-to-right directionality:

 Regex.IsMatch("العسكرية", /* foo */ "العسكري") 

This is functionally the same code as the original, but it looks very different. You can watch the arguments swap positions as you type the first Latin letter.
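Another way to keep the source unambiguous, offered here only as a sketch and not something the answer itself proposes, is to spell the literals with `\u` escapes so that no right-to-left characters appear in the code at all. The class and constant names (`EscapedLiterals`, `Longer`, `Shorter`) are hypothetical; the code points are those of the two words in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedLiterals
{
    // Assumption: the eight-letter word from the question.
    public const string Longer = "\u0627\u0644\u0639\u0633\u0643\u0631\u064A\u0629";

    // Assumption: the seven-letter word, i.e. the same letters without the final U+0629.
    public const string Shorter = "\u0627\u0644\u0639\u0633\u0643\u0631\u064A";

    public static void Main()
    {
        // Regex.IsMatch(input, pattern): the shorter pattern occurs inside the longer input.
        Console.WriteLine(Regex.IsMatch(Longer, Shorter));  // True

        // Swapped: the longer pattern cannot occur inside the shorter input.
        Console.WriteLine(Regex.IsMatch(Shorter, Longer));  // False
    }
}
```

With escapes the argument order on screen always equals the argument order in the file, at the cost of the words no longer being readable as Arabic.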

+2

It seems that Regex.IsMatch() indicates whether the regular expression occurs anywhere in the string, not whether the entire string matches the regular expression (according to the docs, it "Indicates whether the specified regular expression finds a match in the specified input string."). According to the docs, the first argument is the input and the second is the pattern, but here it looks the other way around. The last (leftmost) character looks different in the two strings, but that is probably because of how the ligatures are rendered. Dumped as UTF-8 bytes, the strings are:

 d8 a7 d9 84 d8 b9 d8 b3 d9 83 d8 b1 d9 8a 

and

 d8 a7 d9 84 d8 b9 d8 b3 d9 83 d8 b1 d9 8a d8 a9 

so the first is actually a substring of the second, which explains the match (provided the order of the arguments is really the one the documentation describes, that is, the reverse of how it appears on screen).
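The byte dumps above can be reproduced with a few lines. This is a minimal sketch under the assumption that the strings are the two words from the question, written here as `\u` escapes; the class and helper names (`Utf8Dump`, `Hex`) are hypothetical.

```csharp
using System;
using System.Text;

public static class Utf8Dump
{
    // Assumption: the two words from the question, as \u escapes.
    public const string Shorter = "\u0627\u0644\u0639\u0633\u0643\u0631\u064A";
    public const string Longer = "\u0627\u0644\u0639\u0633\u0643\u0631\u064A\u0629";

    // Hypothetical helper: UTF-8 encoding of s as lowercase, space-separated hex.
    // BitConverter.ToString yields "D8-A7-...", so the separator and case are adjusted.
    public static string Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s))
            .Replace('-', ' ')
            .ToLowerInvariant();

    public static void Main()
    {
        Console.WriteLine(Hex(Shorter));
        Console.WriteLine(Hex(Longer));

        // The shorter dump is a prefix of the longer one.
        Console.WriteLine(Hex(Longer).StartsWith(Hex(Shorter), StringComparison.Ordinal));
    }
}
```

Comparing bytes or code points sidesteps rendering entirely, which makes it a convenient check whenever right-to-left text appears inside code.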

+1

Source: https://habr.com/ru/post/1402538/
