Matching regular expressions against non-strings in Ruby without conversion

If a Ruby regex is matched against something that is not a string, the method to_str is called on that object to obtain the actual string to match against. I want to avoid this behavior: I would like to match regular expressions against objects that are not strings but can logically be treated as randomly accessible sequences of bytes, where every access to them is mediated by a single method, byte_at() (similar in spirit to Java).
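The conversion described above can be observed with a small sketch. ByteSource and its to_str_calls counter are hypothetical names introduced only for this illustration; the point is that the regex engine never sees the object itself, only the string that to_str returns.

```ruby
# Hypothetical wrapper, used only to observe the implicit conversion.
class ByteSource
  attr_reader :to_str_calls

  def initialize(data)
    @data = data
    @to_str_calls = 0
  end

  # Ruby calls this implicitly when a String is required.
  def to_str
    @to_str_calls += 1
    @data
  end
end

source = ByteSource.new("hello world")

# The match succeeds only because source is converted through to_str first.
offset = /world/ =~ source

puts offset                   # 6
puts source.to_str_calls >= 1 # true
```

For a large file or a rope, to_str would have to build the entire string in memory before matching could start, which is exactly the cost the question wants to avoid.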

For example, suppose I want to find the byte offset at which an arbitrary regular expression matches in a file. The expression may span multiple lines, so I can't just read one line at a time and look for a match in each line. If the file is very large, I can't fit it all into memory, so I can't just read it in as one big string. However, it would be simple enough to define a method that returns the nth byte of the file (with buffering and caching, if necessary for speed).
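A minimal sketch of such an accessor might look like the following. The class name FileBytes, the block size, and the single-block cache are assumptions made for illustration; only byte_at comes from the question. Note that this shows the accessor alone: nothing here makes Ruby's Regexp use it.

```ruby
require "tempfile"

# Hypothetical random-access view of a file, one cached block at a time.
class FileBytes
  BLOCK_SIZE = 4096 # assumed buffer size, tune as needed

  attr_reader :size

  def initialize(path)
    @file = File.open(path, "rb")
    @size = @file.size
    @block_index = nil
    @block = nil
  end

  # Returns the nth byte as an Integer, or nil when n is out of range.
  def byte_at(n)
    return nil if n < 0 || n >= @size

    index, offset = n.divmod(BLOCK_SIZE)
    unless index == @block_index
      # Cache miss: load the block that contains byte n.
      @file.seek(index * BLOCK_SIZE)
      @block = @file.read(BLOCK_SIZE)
      @block_index = index
    end
    @block.getbyte(offset)
  end

  def close
    @file.close
  end
end

# Demo on a temporary 7000-byte file.
Tempfile.create("byte_at_demo") do |tmp|
  tmp.binmode
  tmp.write("abc\ndef" * 1000)
  tmp.flush

  bytes = FileBytes.new(tmp.path)
  p bytes.byte_at(0)    # 97 ("a")
  p bytes.byte_at(4096) # 98 ("b"), served from the second block
  p bytes.byte_at(7000) # nil, past the end of the file
  bytes.close
end
```

Sequential access reads each block once; a workload that jumps back and forth would benefit from caching more than one block.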

Eventually, I would like to build a fully featured rope class, like the one in Ruby Quiz #137, and I would like to be able to use regular expressions on ropes without the performance cost of converting them to strings.

I don't want to get up to my elbows in the innards of Ruby's regex implementation, so any insight would be appreciated.

1 answer

You can't. This was not supported in Ruby 1.8.x, probably because it is such an edge case, and in 1.9 it would not even make sense. Ruby 1.9 does not map its strings to bytes in any user-serviceable fashion; instead it works with character code points, so that it can support the many encodings it accepts. And Oniguruma, the new optimized regular expression engine in 1.9, is built around the same concept of encodings and code points. Bytes simply do not enter the picture at this level.
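A small illustration of that point, assuming Ruby 1.9 or later and a UTF-8 string: the offset returned by a match counts characters, not bytes, so a byte position has to be derived separately.

```ruby
# "h\u00e9llo" is "héllo"; the é occupies two bytes in UTF-8.
text = "h\u00e9llo"

char_offset = text =~ /llo/

# Byte length of everything before the match.
byte_offset = $~.pre_match.bytesize

puts text.length   # 5 characters
puts text.bytesize # 6 bytes
puts char_offset   # 2
puts byte_offset   # 3
```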

, , , - . Ruby to_str . , Ruby, , , .

Ruby - grep - Unix. Ruby, .


Source: https://habr.com/ru/post/1721044/

