The question Martijn linked to has two good ways of doing this, but Martijn made an understandable yet incorrect change when copying the second approach into his answer here. Calling .encode('UTF-8', <options>).encode('UTF-8') does not work. As the original answer on the other question points out, the key is to encode to a different encoding and then back to UTF-8. If your source string is already tagged as UTF-8 in Ruby's internals, Ruby (1.9.3 in the sessions below) ignores any call to encode it as UTF-8.
In the following examples, I'm going to use "a#{0xFF.chr}b".force_encoding('UTF-8') to create a string that Ruby believes is UTF-8 but that contains an invalid UTF-8 byte.
1.9.3p194 :019 > "a#{0xFF.chr}b".force_encoding('UTF-8')
 => "a\xFFb"
1.9.3p194 :020 > "#{0xFF.chr}".force_encoding('UTF-8').encoding
 => #<Encoding:UTF-8>
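If you want to confirm that such a string really is tagged as UTF-8 while holding bad bytes, something like the following should do it. This is only a sketch; the variable name s is mine, not part of the original answer.

```ruby
# Build a string that is tagged as UTF-8 but holds a byte (0xFF) that can
# never appear in valid UTF-8.
s = "a#{0xFF.chr}b".force_encoding('UTF-8')

s.encoding         # the encoding tag Ruby keeps on the string (UTF-8 here)
s.valid_encoding?  # false, since the bytes do not form valid UTF-8
s.bytes.to_a       # the raw bytes: 97, 255, 98
```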
Note how encoding to UTF-8 does nothing:
1.9.3p194 :016 > "a#{0xFF.chr}b".force_encoding('UTF-8').encode('UTF-8', :invalid => :replace, :replace => '').encode('UTF-8')
 => "a\xFFb"
But encoding to something else (UTF-16) and then back to UTF-8 cleans up the string:
1.9.3p194 :017 > "a#{0xFF.chr}b".force_encoding('UTF-8').encode('UTF-16', :invalid => :replace, :replace => '').encode('UTF-8')
 => "ab"
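If you need this in more than one place, the round trip can be wrapped in a small helper. This is a minimal sketch, assuming the input is already tagged as UTF-8; the method name strip_invalid_utf8 is made up for illustration and does not come from either answer.

```ruby
# Hypothetical helper: drops invalid bytes from a string tagged as UTF-8
# by transcoding to UTF-16 and back, as in the session above.
def strip_invalid_utf8(str)
  str.encode('UTF-16', :invalid => :replace, :replace => '').encode('UTF-8')
end

dirty = "a#{0xFF.chr}b".force_encoding('UTF-8')
clean = strip_invalid_utf8(dirty)

clean.valid_encoding?  # true once the 0xFF byte has been dropped
```

Passing :replace => '' removes the offending bytes entirely; pass a different replacement string if you would rather mark where they were.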