Sometimes I have evil non-printable characters in the middle of a line. These lines are user input, so I have to make my program handle them gracefully rather than try to fix the source of the problem.
For example, a line can contain a zero-width no-break space (U+FEFF) in the middle. When parsing a .po file, one problematic part was the string "he is a man of god" in the middle of the file. Although everything looks correct, checking it with irb shows:
"he is a man of god".codepoints => [104, 101, 32, 105, 115, 32, 97, 32, 65279, 109, 97, 110, 32, 111, 102, 32, 103, 111, 100]
I believe I know what a BOM is, and I even handle it nicely. However, sometimes such characters appear in the middle of the file, so they are not a BOM.
My current approach is to remove every character I have found to be evil, in a really smelly fashion:
text = (text.codepoints - CODEPOINTS_BlACKLIST).pack("U*")
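For readers who want to reproduce this, here is a minimal self-contained sketch of the blacklist approach above. The constant name EVIL_CODEPOINTS, its contents, and the helper strip_blacklisted are assumptions made for illustration; the question does not show what the real blacklist contains.

```ruby
# Hypothetical blacklist: the real list from the question is not shown.
EVIL_CODEPOINTS = [
  0xFEFF, # ZERO WIDTH NO-BREAK SPACE (65279, also used as the BOM)
  0x200B  # ZERO WIDTH SPACE
].freeze

# Hypothetical helper wrapping the one-liner from the question.
def strip_blacklisted(text)
  # Array#- drops every occurrence of each blacklisted codepoint;
  # pack("U*") turns the remaining codepoints back into a UTF-8 string.
  (text.codepoints - EVIL_CODEPOINTS).pack("U*")
end

dirty = "he is a \uFEFFman of god" # invisible character written as an escape
clean = strip_blacklisted(dirty)
```

This only removes codepoints that are listed explicitly, which is why it feels smelly: any invisible character not yet in the list passes through untouched.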
The closest I got was this post, which pointed me to the [[:print:]] character class for regular expressions. However, it did not help in my case, because the character still matches:
"m".scan(/[[:print:]]/).join.codepoints => [65279, 109]
So the question is: how can I remove all non-printable characters from a string in Ruby?