I'm curious about the algorithm for determining which characters to include in a regular expression when using - ...
Example: [a-zA-Z0-9]
This matches any letter from a to z in either case, or a digit from 0 to 9.
I initially thought that these ranges were used as macros, for example, that a-z translates to a, b, c, d, e, etc., but after I saw the following in an open source project,
text.tr('A-Za-z1-90', 'Ⓐ-Ⓩⓐ-ⓩ①-⑨⓪')
my regular expression paradigm completely changed, because these are not your typical characters. How the devil does this work, I thought.
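For reference, here is a minimal sketch of that call in isolation. `String#tr` is a character translation method rather than a regex, but it uses the same range notation. The names `SOURCE`, `TARGET` and `circled` are mine, purely for illustration, and I wrote the target set with `\u` escapes so the code points are visible.

```ruby
# Source set: A-Z, a-z, 1-9, then a lone 0.
SOURCE = 'A-Za-z1-90'
# Target set, written as code points (hypothetical constant name):
#   U+24B6..U+24CF  circled capital letters
#   U+24D0..U+24E9  circled small letters
#   U+2460..U+2468  circled digits one to nine
#   U+24EA          circled digit zero
TARGET = "\u24B6-\u24CF\u24D0-\u24E9\u2460-\u2468\u24EA"

# Replace each character of SOURCE with the character at the same
# position in TARGET; anything else (e.g. a space) is left alone.
def circled(text)
  text.tr(SOURCE, TARGET)
end

puts circled("Hello 2010")
```

Both sets expand to 62 characters, so every letter and digit has exactly one counterpart.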
My theory is that - literally means:
Any ASCII value between the left character and the right character (e.g. a-z is [97-122]).
Can anyone confirm the correctness of my theory? Is the regex pattern actually computed using the character codes between any two characters?
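One way I could test the theory, as a rough sketch in Ruby: expand the range by hand from the character codes and compare that with what the regex accepts. `expand_range` is a helper I made up for this check, not a library method, and I only look at the ASCII range here.

```ruby
# Hypothetical helper: list every character whose code lies between
# the codes of the left and right endpoints, inclusive.
def expand_range(first, last)
  (first.ord..last.ord).map { |cp| cp.chr(Encoding::UTF_8) }
end

# Every ASCII character, 0 through 127.
ASCII_CHARS = (0..0x7F).map { |cp| cp.chr(Encoding::UTF_8) }

# The characters that the class [a-z] accepts on its own.
ACCEPTED = ASCII_CHARS.grep(/\A[a-z]\z/)

puts "a".ord, "z".ord                     # the two endpoint codes
puts ACCEPTED == expand_range("a", "z")   # do the two lists agree?
```

If the two lists agree, that would at least be consistent with the range being defined by character codes in this engine.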
Also, if this is correct, you could write a regex like
[A-z]
because A is 65 and z is 122, so theoretically it should also match all the characters between these values.
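The same kind of check can be sketched for this case, again in Ruby and again with names of my own choosing: take every character with a code from 65 to 122, see which ones the class accepts, and list the ones that are not letters.

```ruby
# Every character with a code from 65 ("A") to 122 ("z").
CANDIDATES = (65..122).map(&:chr)

# Those that the class [A-z] accepts.
MATCHED = CANDIDATES.grep(/\A[A-z]\z/)

# Accepted characters that are not ASCII letters, i.e. whatever sits
# between "Z" (90) and "a" (97) in the code table.
EXTRAS = MATCHED.reject { |c| c.match?(/[A-Za-z]/) }

puts MATCHED.length
p EXTRAS
```

If the theory holds, `EXTRAS` should contain the six punctuation characters with codes 91 to 96, which is why such a range is probably not what one wants for matching letters only.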