Your safe_str method will (currently) never actually do anything to the string; it is a no-op. The docs for String#encode in Ruby 1.9.3 say:
Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.
This is true for the current release of 2.0.0 (patch level 247); however, a recent commit to Ruby trunk changes this and also introduces a scrub method, which does pretty much what you want.
Until a new version of Ruby is released, you will need to round-trip your string to another encoding and back to clean it, as in the second example in this answer to the question you linked to, something like:
    def safe_str(str)
      s = str.encode('utf-16', 'utf-8', invalid: :replace, undef: :replace, replace: '')
      s.encode!('utf-8', 'utf-16')
    end
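Once you are on a Ruby that ships String#scrub (it was released with Ruby 2.1), the round trip should no longer be necessary. A minimal sketch, assuming you want invalid bytes dropped rather than replaced with a marker; the name safe_str_compat is hypothetical and not from your question:

```ruby
# Hypothetical helper: prefer String#scrub where it exists, otherwise fall
# back to the UTF-16 round trip shown above.
def safe_str_compat(str)
  if str.respond_to?(:scrub)
    # scrub returns a copy with every invalid byte sequence replaced by ''
    str.scrub('')
  else
    s = str.encode('utf-16', 'utf-8', invalid: :replace, undef: :replace, replace: '')
    s.encode!('utf-8', 'utf-16')
  end
end

bad = "abc\xFFdef".dup.force_encoding('utf-8')
clean = safe_str_compat(bad)
clean.valid_encoding?  # => true
```

Checking respond_to?(:scrub) rather than the Ruby version keeps the helper working if scrub has been backported or monkey-patched in.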
Note that your first example, which tries to create an invalid string, does not work:
    bad_str = (100..1000).to_a.inject('') {|s,c| s << c; s}
    bad_str.valid_encoding? # => true
From the docs for <<:
If the object is an Integer, it is considered a code point and converted to a character before concatenation.
As a result, appending integers this way always gives you a valid string.
The second method, using pack, will create a string with the encoding ASCII-8BIT. If you then change this using force_encoding, you can create a UTF-8 string with an invalid encoding:
    bad_str = (100..1000).to_a.pack('c*').force_encoding('utf-8')
    bad_str.valid_encoding? # => false
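To see why this works, it may help to look at the string before and after force_encoding. The sketch below uses a two-byte input of my own choosing (0xC3 0x28, not taken from your question) and the unsigned 'C*' directive, so the byte values are easy to follow:

```ruby
# 0xC3 announces a two-byte UTF-8 character, but 0x28 is not a valid
# continuation byte, so the pair is not well-formed UTF-8.
raw = [0xC3, 0x28].pack('C*')

raw.encoding == Encoding::ASCII_8BIT  # => true (pack returns a binary string)
raw.valid_encoding?                   # => true (any byte is valid as binary)

# force_encoding only relabels the string; the bytes are untouched.
as_utf8 = raw.dup.force_encoding('utf-8')
as_utf8.bytes == raw.bytes  # => true
as_utf8.valid_encoding?     # => false
```

Since force_encoding never converts or checks anything, the invalid bytes survive, which is exactly what you want for a test string.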