I'm parsing this feed http://www.sixapart.com/labs/update/developers/ with nokogiri and then running some regexes over the contents of some tags. The content is mostly UTF-8, but occasionally corrupt. In my case I don't really care: I just need to pass through the right parts of the content, so I'm happy to treat the data as binary/ASCII-8BIT. The problem is that no matter what I do, the regexes in my script are treated as either UTF-8 or ASCII, regardless of what I set the coding comment to or how I create the regex.
Is there a solution to this? Can I force a regex to be binary? Can I easily do a gsub without a regex? (I'm just replacing & with &amp;.)
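To make the question concrete, here is a minimal sketch of the kind of thing I'm after, assuming the tag contents arrive as an ordinary Ruby String. The helper name `escape_ampersands` and the sample input are made up for illustration; nothing here comes from nokogiri itself. It retags a copy of the string as binary with `String#force_encoding` and then calls `gsub` with a plain String pattern, so no regex is involved.

```ruby
# Hypothetical helper (not part of nokogiri): treat the text as raw bytes
# and escape ampersands without going through a regex.
def escape_ampersands(text)
  # Work on a copy so the caller's string keeps its original encoding tag.
  bytes = text.dup.force_encoding(Encoding::BINARY)
  # A String pattern is matched literally, so no Regexp encoding applies.
  bytes.gsub("&", "&amp;")
end

# Made-up sample: mostly UTF-8, with one stray invalid byte (\xFF).
corrupt = "caf\xC3\xA9 & \xFF broken".dup.force_encoding(Encoding::UTF_8)

result = escape_ampersands(corrupt)

# The same replacement with a regex; the /n flag marks it as ASCII-8BIT.
result_re = corrupt.dup.force_encoding(Encoding::BINARY).gsub(/&/n, "&amp;")
```

The result is still tagged ASCII-8BIT, so it would presumably need a `force_encoding` back to UTF-8 (or whatever the output expects) before being written out.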