I am parsing my nginx logs and want to extract some details from the HTTP_REFERER line, for example the query string a user searched with to find the site. One user typed "México", which is encoded in the log as "query=M%E9xico".
Running this through Rack::Utils.parse_query('query=M%E9xico') gives a hash: {"query" => "M?xico"}
When I feed "M?xico" to Postgres (but not the more forgiving SQLite), it chokes because the string isn't valid UTF-8. Looking at http://rack.rubyforge.org/doc/Rack/Utils.html#M000324 , unescape simply packs the hex digits into raw bytes.
How can I convert the string back to UTF-8, or make parse_query return UTF-8 in the first place?
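
To make the problem concrete, here is a rough sketch of the kind of conversion I am after. It uses only the standard library (`URI.decode_www_form_component`) rather than Rack, and `decode_to_utf8` is a hypothetical helper name, not part of any library. It also assumes that bytes which are not valid UTF-8 were sent as ISO-8859-1, which fits `%E9` for "é" but is only a guess about what the browser actually used.

```ruby
require 'uri'

# Hypothetical helper: percent-decode one query component and return UTF-8.
# Assumption: if the decoded bytes are not valid UTF-8, they are ISO-8859-1.
def decode_to_utf8(component, fallback = Encoding::ISO_8859_1)
  # Decode the %XX escapes; the result is tagged UTF-8 but may be invalid.
  raw = URI.decode_www_form_component(component, Encoding::UTF_8)
  return raw if raw.valid_encoding?

  # Relabel the raw bytes as the fallback encoding, then transcode to UTF-8.
  raw.force_encoding(fallback).encode(Encoding::UTF_8)
end

puts decode_to_utf8('M%E9xico')     # Latin-1 escaped input
puts decode_to_utf8('M%C3%A9xico')  # input that was already UTF-8
```

Both calls should print "México". The same `force_encoding` and `encode` step could presumably be applied to each value of the hash that parse_query returns, as long as the Latin-1 assumption holds for the logged referrers.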