Regex pattern using \w.* does not match text starting with non-ASCII characters, such as Ä

I have the following regex that I have successfully used:

preg_match_all('/(\d+)\n(\w.*)\n(\d{3}\.\d{3}\.\d{2})\n(\d.*)\n(\d.*)/', $text, $matches) 

However, I just discovered that if the text matched by the (\w.*) part starts with a non-ASCII character such as Ä, then nothing matches at all.

Can someone tell me what pattern I should use instead of (\w.*) to match a line starting with any letter, including accented ones?

Many thanks

+6
4 answers

If you want to match umlauts, add the /u modifier to the regex, or use \pL instead of \w (ideally together with /u, so that the text is treated as UTF-8). This will allow the regular expression to match letters outside the ASCII range.

Reference: http://www.regular-expressions.info/unicode.html
and http://php.net/manual/en/regexp.reference.unicode.php
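To illustrate the difference, here is a minimal sketch comparing the three variants on a single line. The sample string and the variable names are made up for this example, and it assumes PHP 7 or later with a PCRE build that has Unicode support.

```php
<?php
// Hypothetical sample line; "\u{00C4}" is the UTF-8 encoding of "Ä".
$line = "\u{00C4}rger 42";

// Without /u the subject is read byte by byte, so \w is tested
// against the first byte of the two-byte UTF-8 sequence for "Ä".
$plain = preg_match('/^\w.*$/', $line);

// With /u the pattern and the subject are treated as UTF-8,
// and PHP also lets \w match letters beyond the ASCII range.
$unicodeW = preg_match('/^\w.*$/u', $line);

// \pL matches any Unicode letter; /u makes sure "Ä" is read as one code point.
$unicodeL = preg_match('/^\pL.*$/u', $line);

var_dump($plain, $unicodeW, $unicodeL);
```

Note that \pL alone matches only letters, while \w also covers digits and the underscore, so the two are not drop-in equivalents.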

+9

Ä is a German umlaut, if I am not mistaken. \w matches (in most regex flavors) only [a-zA-Z0-9_].

You will need to match the unicode character range you want.

\x{00C4} (PHP syntax) is the code point of the character you want. You probably need to create a character class that covers the Unicode characters you want to support.
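A character class along these lines might look like the sketch below. The chosen set of characters (the German umlauts plus ß) and the variable names are assumptions for illustration only; the /u modifier is added so that \x{...} refers to a Unicode code point and not to a single byte.

```php
<?php
// Hypothetical class: ASCII word characters plus Ä Ö Ü ä ö ü ß.
$firstChar = '[a-zA-Z0-9_\x{00C4}\x{00D6}\x{00DC}\x{00E4}\x{00F6}\x{00FC}\x{00DF}]';
$pattern = '/^' . $firstChar . '.*$/u';

// "Äpfel": the first letter is listed in the class.
$matchesUmlaut = preg_match($pattern, "\u{00C4}pfel");

// "École": É (U+00C9) is not listed, so the line is rejected.
$matchesOther = preg_match($pattern, "\u{00C9}cole");

var_dump($matchesUmlaut, $matchesOther);
```

The second call shows the drawback of this approach: every additional letter has to be listed by hand.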

+3

You may need to switch to matching Unicode character ranges ...

For ASCII you would use [\u0021-\u007e]. In this case ... maybe [\u0021-\u007e\u0192-\u687]

I'm not quite sure which character range you need, but I think \w only matches the normal ASCII range.

0

Consider using:

 /(\d+)\n((\p{L}|\p{N}|_).*)\n(\d{3}\.\d{3}\.\d{2})\n(\d.*)\n(\d.*)/ 
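As a rough sketch of how this could be applied, the variant below adds the /u modifier and writes the alternation as a character class, [\p{L}\p{N}_], so that no extra capturing group is introduced and the group numbers stay the same as in the question. The sample text is invented for the example.

```php
<?php
// Hypothetical input: five lines in the layout the pattern expects.
$text = "12\n\u{00C4}pfel und Birnen\n123.456.78\n5 St\u{00FC}ck\n10.50";

// Same structure as the original pattern; only the second group changes.
$pattern = '/(\d+)\n([\p{L}\p{N}_].*)\n(\d{3}\.\d{3}\.\d{2})\n(\d.*)\n(\d.*)/u';

$count = preg_match_all($pattern, $text, $matches);

// $matches[2][0] holds the line that starts with "Ä".
echo $count, "\n", $matches[2][0], "\n";
```

With the pattern exactly as written in the answer, the inner parentheses add a capturing group, so the groups after the second one shift by one position.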
0

Source: https://habr.com/ru/post/901513/

