Sinatra / Rack fails with non-ASCII characters in URL

I get Encoding::UndefinedConversionError at /find/Wrocław: "\xC5" from ASCII-8BIT to UTF-8

For some mysterious reason, Sinatra passes the string as ASCII instead of UTF-8, as it should.

I found an ugly workaround. I don't know why Rack assumes ASCII-8BIT encoding, but the workaround is to call string.force_encoding("UTF-8"). Doing this for every parameter is tedious, though.
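The workaround can be applied to a whole params structure at once. The following is only a sketch: force_utf8 is a hypothetical helper, not part of Sinatra or Rack, and it assumes the incoming bytes really are UTF-8 (force_encoding changes the encoding tag, not the bytes).

```ruby
# Hypothetical helper (not a Sinatra or Rack API): walks a params-like
# structure and returns a copy in which every String is tagged as UTF-8.
def force_utf8(value)
  case value
  when String
    # dup first so the original (possibly frozen) string is left untouched;
    # force_encoding only relabels the bytes, it does not transcode them.
    value.dup.force_encoding(Encoding::UTF_8)
  when Hash
    value.each_with_object({}) { |(k, v), out| out[k] = force_utf8(v) }
  when Array
    value.map { |v| force_utf8(v) }
  else
    value
  end
end

# Simulated params as they might arrive: UTF-8 bytes tagged ASCII-8BIT.
raw = { 'city' => "Wroc\xC5\x82aw".b, 'tags' => ["caf\xC3\xA9".b] }
fixed = force_utf8(raw)
puts fixed['city'].encoding
puts fixed['city'].valid_encoding?
```

In a Sinatra app, a helper like this could presumably be called on params from a before filter, so that individual routes do not have to repeat the call.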

2 answers

I had similar problems with routing to "/protégés/:id". I posted to the Rack mailing list, but got little response.

The solution I came up with is not perfect, but it works in most cases. First, create a middleware that decodes UTF-8:

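# in lib/fix_unicode_urls_middleware.rb:
require 'cgi'

# Rack middleware that percent-decodes the request path variables
# before the request reaches the application.
class FixUnicodeUrlsMiddleware
  ENVIRONMENT_VARIABLES_TO_FIX = [
    'PATH_INFO', 'REQUEST_PATH', 'REQUEST_URI'
  ].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    ENVIRONMENT_VARIABLES_TO_FIX.each do |var|
      value = env[var]
      # Only touch values containing a percent-escape (% plus two hex digits).
      # Note: CGI.unescape also turns "+" into a space.
      env[var] = CGI.unescape(value) if value =~ /%[0-9A-Fa-f]{2}/
    end
    @app.call(env)
  end
end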

Then use this middleware in config/environment.rb (Rails 2.3) or config/application.rb (Rails 3).

You also need to make sure that you set the correct HTTP encoding header:

Content-type: text/html; charset=utf-8

You can set this in Rails, in Rack, or on your web server, depending on how many different encodings you use on your site.
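If the header is set at the Rack level, one way to do it might look like the sketch below. Utf8CharsetMiddleware is a hypothetical name, and the rule it applies (only text/* responses that do not already declare a charset) is an assumption, not something the answer specifies.

```ruby
# Hypothetical middleware: appends "charset=utf-8" to text responses
# that do not already declare a charset.
class Utf8CharsetMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Header names are case-insensitive, so look the key up loosely.
    key = headers.keys.find { |k| k.casecmp('content-type').zero? }
    type = key && headers[key]
    if type && type.start_with?('text/') && !type.include?('charset')
      headers[key] = "#{type}; charset=utf-8"
    end
    [status, headers, body]
  end
end

# Stand-in Rack app: any object that responds to call(env).
app = ->(_env) { [200, { 'Content-Type' => 'text/html' }, ['ok']] }
_status, headers, _body = Utf8CharsetMiddleware.new(app).call({})
puts headers['Content-Type']
```

Responses that already carry a charset, or that are not text, pass through unchanged.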


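AFAIK, raw UTF-8 characters are not supposed to appear in a URL; they must be %-encoded. Otherwise components such as Rack may not handle the URL correctly. The HTTP client is expected to do the encoding.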

RFC 3986

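When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".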
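The encoding rule from RFC 3986 can be reproduced in a few lines of Ruby. This is a minimal sketch; percent_encode and UNRESERVED are hypothetical names introduced for illustration, and in practice a library routine would normally be used instead.

```ruby
# The characters RFC 3986 calls "unreserved":
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = /[A-Za-z0-9\-._~]/

# Hypothetical helper: encode the text as UTF-8 octets, then
# percent-encode every octet that is not an unreserved character.
def percent_encode(text)
  text.encode(Encoding::UTF_8).bytes.map { |byte|
    if byte < 128 && byte.chr.match?(UNRESERVED)
      byte.chr
    else
      format('%%%02X', byte)
    end
  }.join
end

puts percent_encode("A")          # A
puts percent_encode("\u00C0")     # %C3%80  (LATIN CAPITAL LETTER A WITH GRAVE)
puts percent_encode("\u30A2")     # %E3%82%A2  (KATAKANA LETTER A)
puts percent_encode("Wroc\u0142aw") # Wroc%C5%82aw
```

The last line shows where the "\xC5" in the error message comes from: it is the first of the two UTF-8 octets of "ł".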


Source: https://habr.com/ru/post/1726367/

