If a Ruby regex is matched against something that is not a String, the method to_str is called on that object to get an actual String to match against. I want to avoid this behavior: I would like to match regular expressions against objects that are not Strings but can logically be thought of as randomly accessible sequences of bytes, where every access to them goes through a byte_at() method (similar in spirit to Java).
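To illustrate the conversion I mean, here is a minimal sketch. The class name ByteSource and its call counter are made up for this example; only to_str and Regexp#=~ are real Ruby.

```ruby
# Hypothetical class, used only to observe the implicit conversion.
class ByteSource
  attr_reader :to_str_calls

  def initialize(data)
    @data = data
    @to_str_calls = 0
  end

  # Implicit conversion hook that Regexp#=~ falls back on
  # when its operand is not already a String.
  def to_str
    @to_str_calls += 1
    @data
  end
end

source = ByteSource.new("hello world")
offset = (/world/ =~ source)

puts offset               # offset of the match in the converted String
puts source.to_str_calls  # number of times to_str was invoked
```

The whole content is materialized as a String before the regex engine sees it, which is exactly what I want to avoid.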
For example, suppose I want to find the byte offset at which an arbitrary regular expression matches in a file. The expression might span multiple lines, so I can't just read a line at a time and look for a match in each line. If the file is very large, it won't fit in memory, so I can't just read it in as one big string. However, it would be simple enough to define a method that gets the nth byte of the file (with buffering and caching, if necessary for speed).
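A rough sketch of the kind of accessor I have in mind, assuming a single cached block is enough. The class name FileBytes, the block size, and the one-block cache are all illustrative choices, not an existing API.

```ruby
# Hypothetical wrapper: random byte access to a file through byte_at(n),
# reading fixed-size blocks on demand and caching the most recent one.
class FileBytes
  DEFAULT_BLOCK_SIZE = 4096

  attr_reader :size

  def initialize(path, block_size = DEFAULT_BLOCK_SIZE)
    @io = File.open(path, "rb")
    @block_size = block_size
    @size = @io.size
    @block_index = nil
    @block = nil
  end

  # Returns byte n as an Integer (0..255), or nil if n is out of range.
  def byte_at(n)
    return nil if n < 0 || n >= @size

    index = n / @block_size
    unless index == @block_index
      # Cache miss: load the block that contains byte n.
      @io.seek(index * @block_size)
      @block = @io.read(@block_size)
      @block_index = index
    end
    @block.getbyte(n % @block_size)
  end

  def close
    @io.close
  end
end
```

Only one block is held in memory at a time, so sequential scans are cheap while the file size is unbounded. A real version would probably keep several blocks around.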
Eventually, I would like to create a fully featured rope class, like the one in Ruby Quiz #137, and I would like to be able to use regular expressions on ropes without the performance cost of converting them to strings.
I would rather not get up to my elbows in the internals of Ruby's regex implementation, so any insight would be appreciated.