Hex characters in MySQL regular expressions

I discovered some very strange MySQL behavior. The following query returns 0:

SELECT CONVERT('a' USING BINARY) REGEXP '[\x61]' 

However, the following query, which should be semantically identical, returns 1:

 SELECT CONVERT('a' USING BINARY) REGEXP '[\x61-\x61]' 

Do you know what is going on here? I tested this in MySQL 5.0.0.3031 and 4.1.22.

I need hexadecimal characters to build a regular expression that matches when a binary string is valid UTF-8. A Perl version of such a regexp can be found on the W3C site. It looks like this:

 $field =~
   m/\A(
      [\x09\x0A\x0D\x20-\x7E]            # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
    |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
    |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
    |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
    |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
   )*\z/x;
3 answers

This also matches:

 SELECT CONVERT('a' USING BINARY) REGEXP '[1-\x]' 

The reason is that \x is interpreted as a plain x, and a lies between 1 and x. The rest of your regular expression consists of ordinary characters (x, 6 and 1) that are not relevant here because they already fall inside the [1-x] range.

 SELECT CONVERT('0' USING BINARY) REGEXP '[\x61-\x61]'; -- Fails, because 0 < 1.
 SELECT CONVERT('1' USING BINARY) REGEXP '[\x61-\x61]'; -- Succeeds: inside [1-x].
 SELECT CONVERT('2' USING BINARY) REGEXP '[\x61-\x61]'; -- Succeeds: inside [1-x].
 ...
 SELECT CONVERT('w' USING BINARY) REGEXP '[\x61-\x61]'; -- Succeeds: inside [1-x].
 SELECT CONVERT('x' USING BINARY) REGEXP '[\x61-\x61]'; -- Succeeds: inside [1-x].
 SELECT CONVERT('y' USING BINARY) REGEXP '[\x61-\x61]'; -- Fails, because y > x.
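To see what the regex engine actually receives, it can help to select the pattern literals themselves. This is a minimal sketch, assuming the default SQL mode (NO_BACKSLASH_ESCAPES not enabled); the column aliases are only illustrative:

```sql
-- MySQL's string-literal parser drops a backslash that precedes a character
-- it does not treat as an escape, so \x61 arrives as the characters x, 6, 1.
SELECT '[\x61]'      AS single_class,  -- [x61]
       '[\x61-\x61]' AS range_class;   -- [x61-x61]

-- Byte-level view of the same literal: 78 = x, 36 = 6, 31 = 1.
SELECT HEX('\x61');                    -- 783631
```

So the first pattern matches only x, 6 or 1, while the second also contains the range 1-x, which includes a.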

I'm not sure what you are trying to achieve, but if you need the hexadecimal value of a character, you can use the HEX() function:

 SELECT HEX('a'); -- returns 61

To write a regular expression like [\x61-\x65] in MySQL, you can use hexadecimal literals inside CONCAT:

 SELECT CONVERT('a' USING BINARY) REGEXP CONCAT('[', 0x61, '-', 0x65, ']') 
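To check what pattern CONCAT actually builds, the pieces can be selected on their own. This is only a sketch, assuming an older server such as the 4.1/5.x versions discussed in this thread, where REGEXP accepts binary strings. As far as I know, MySQL 8.0.22 and later reject binary arguments to REGEXP, and some clients display binary results as hex rather than text.

```sql
-- Hex literals act as binary strings in string context,
-- so 0x61 is 'a' and 0x65 is 'e'.
SELECT CONCAT('[', 0x61, '-', 0x65, ']');                                   -- [a-e]

-- The built pattern behaves like an ordinary [a-e] character class.
SELECT CONVERT('c' USING BINARY) REGEXP CONCAT('[', 0x61, '-', 0x65, ']');  -- 1
SELECT CONVERT('f' USING BINARY) REGEXP CONCAT('[', 0x61, '-', 0x65, ']');  -- 0
```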

Based on the above, you can simply use the printable characters themselves. It worked for me. I wanted to match characters that are not on a US keyboard, and the following expression works on MySQL 5.1:

 [^ -~] 

This is equivalent to the following pattern in Perl-style syntax:

 [^\x20-\x7E] 
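A quick way to try the [^ -~] pattern is to run it against a few literals. This is just a sketch; note that tab and newline also fall outside the space-to-tilde range, so they count as matches too:

```sql
-- No character outside the range from space (0x20) to tilde (0x7E).
SELECT 'plain ASCII text' REGEXP '[^ -~]';  -- 0

-- The accented letter lies outside the printable ASCII range.
SELECT 'naïve' REGEXP '[^ -~]';             -- 1

-- \t is a real MySQL escape and yields a tab (0x09), which is below space.
SELECT 'tab\there' REGEXP '[^ -~]';         -- 1
```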

Source: https://habr.com/ru/post/1300285/