Unicode character match in zsh regex

I want to make sure that the variable does not contain a specific character (in this case, "α"), but the following code does not work (returns 1):

FOO="test" && [[ $FOO =~ '^[^α]*$' ]] 

Edit: Based on the feedback from stema below, the pattern has been changed to require that only non-"α" characters match from beginning to end.

Replacing "α" with, for example, "x" works as expected. Why does this fail with "α", and how can I do this?

System Information:

 $ zsh --version
 zsh 4.3.11 (i386-apple-darwin11.0)
 $ locale
 LANG="en_GB.UTF-8"
 LC_COLLATE="en_GB.UTF-8"
 LC_CTYPE="en_GB.UTF-8"
 LC_MESSAGES="en_GB.UTF-8"
 LC_MONETARY="en_GB.UTF-8"
 LC_NUMERIC="en_GB.UTF-8"
 LC_TIME="en_GB.UTF-8"
 LC_ALL="en_GB.UTF-8"

Edit 2: I have now tested on a Linux machine running Ubuntu 11.10 with zsh 4.3.11 and the same locale settings, and it works there, i.e. FOO="test" && [[ $FOO =~ '^[^α]*$' ]] returns success. The machine where it fails is running Mac OS X 10.7.2.

+4
3 answers

With the regular expression .*[^α].* you cannot verify that α is not in the string. What it actually checks is whether there is at least ONE character in the string that is not α.

If you want to check that this character is not in the string, do this:

 FOO="test" && [[ $FOO =~ '^[^α]*$' ]] 

This checks that the complete string, from beginning to end, consists only of characters other than "α".
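If the regex engine itself is what mishandles "α", a plain pattern match avoids it altogether. The following is only a sketch of that idea, not something taken from the thread: lacks_char is a hypothetical helper name, and it relies on nothing but the shell's case statement, so it should behave the same in zsh, bash and sh.

```shell
# Hypothetical helper (not from the thread): succeeds (status 0) when
# the string in $1 does NOT contain the character(s) given in $2.
lacks_char() {
  case $1 in
    *"$2"*) return 1 ;;  # $2 occurs somewhere in $1
    *)      return 0 ;;  # $2 does not occur
  esac
}

FOO="test"
if lacks_char "$FOO" "α"; then
  echo "no α in FOO"
else
  echo "FOO contains α"
fi
```

Quoting "$2" inside the pattern makes it match literally, so the check works for any character, not just "α".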

+1

The easiest way to express this is with a negative lookahead anchored at the start:

 ^(?!.*α) 

This says: "looking ahead from the very start, I must not be able to see α anywhere."

The advantage of lookaheads is that they are non-capturing, so you can combine them with other, capturing regular expressions. For example, to find groups of digits in quotes in input that does not contain α, use this: ^(?!.*α)"(\d+)". Note that lookahead is PCRE syntax; zsh's =~ operator uses the system's POSIX extended regular expressions unless the RE_MATCH_PCRE option is set.

0

I ran into a similar problem with my build setup: my laptop has ZSH 5.0.2 (where Unicode works as expected) and my build system has ZSH 4.3.17. It seems that ZSH 5 has no problem with Unicode characters in regex patterns.

In particular, parsing a key / value pair:

 [[ "revision/author=Ľudovít Lučenič" =~ '^([^=]+)=(.*)$' ]]
 echo "$match[1]:$match[2]"

gives

 :                                  # ZSH 4.3.17
 revision/author:Ľudovít Lučenič    # ZSH 5.0.2

From this I suspect a more general flaw in ZSH 4's Unicode support.

Update: after some research, I found that the dot in the regular expression does not match the letter "č" in ZSH 4. Once I changed the pattern to:

 [[ "revision/author=Ľudovít Lučenič" =~ '^([^=]+)=((.|č)*)$' ]]
 echo "$match[1]:$match[2]"

I get the same result in both versions of ZSH. I do not know why this particular letter is the problem, but the workaround may help someone else who hits this shortcoming.
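For this particular key/value case, the regex can also be sidestepped entirely with parameter expansion. This is a sketch under the assumption that the input always contains at least one "="; the variable names pair, key and value are made up for illustration, and I have not verified it on ZSH 4.3.x.

```shell
# Split "key=value" at the first "=" without using any regex.
pair="revision/author=Ľudovít Lučenič"

key=${pair%%=*}    # strip the longest suffix that starts with "="
value=${pair#*=}   # strip the shortest prefix that ends with "="

echo "$key:$value"
```

Because the split is anchored on the first "=", any further "=" characters stay in the value, which mirrors what the ^([^=]+)=(.*)$ pattern is meant to capture.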

0

Source: https://habr.com/ru/post/1386838/

