Incompatible character encoding in a simple Sinatra application

I have a very simple Sinatra application running on Ruby 1.9.3 that uses ERB and Markdown templates. I have stripped it down to the minimum needed to demonstrate the problem.

This runs Sinatra 1.3.2 on Mac OS X Snow Leopard. For markdown I use rdiscount 1.6.8.

The main Ruby file contains:

 get '/services' do
   erb :services
 end

The services.erb file contains:

 <%= markdown :'content/service1' %>
 £

The Markdown file contains only one line:

 £

When I launch the Sinatra application and load the "services" page, I get "Encoding::CompatibilityError at /services: incompatible character encodings: UTF-8 and ASCII-8BIT". The error is reported on the second line of the ERB file (the one that contains only "£").

I have done a lot of googling, and I can't for the life of me understand why this is happening. The ERB and Markdown files are UTF-8 on my local drive, but they are obviously loaded by Sinatra and converted to strings, and I don't know how to tell what encoding those strings have.

If I force Sinatra to use ASCII-8BIT (by adding settings.default_encoding = 'ASCII-8BIT' to the top of my main Sinatra Ruby file), the exception is no longer thrown, but the '£' characters are displayed incorrectly.

Any pointers?

1 answer

This is a problem in Tilt, the templating system that Sinatra uses (and which is being considered for Rails). Have a look at issues #75 and #107.

The problem basically comes down to how Tilt reads template files from disk: it uses binread. This means that the source string passed to the actual template engine is tagged with the ASCII-8BIT encoding, which in effect says that the encoding is unknown.
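To see the difference outside Sinatra, here is a minimal sketch that reads the same file two ways. The helper name read_both_ways and the file name pound.md are made up for illustration; only File.binread, File.read and File.binwrite are real Ruby calls.

```ruby
require 'tmpdir'

# Hypothetical helper: returns the same file's contents read two ways.
def read_both_ways(path)
  binary = File.binread(path)                   # raw bytes, tagged ASCII-8BIT
  text   = File.read(path, encoding: 'UTF-8')   # same bytes, tagged UTF-8
  [binary, text]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'pound.md')
  File.binwrite(path, "\u00A3\n")               # pound sign as UTF-8 bytes, plus newline

  binary, text = read_both_ways(path)
  p binary.encoding == Encoding::ASCII_8BIT     # binread carries no real encoding
  p text.encoding == Encoding::UTF_8
  p binary.bytes.to_a == text.bytes.to_a        # only the label differs, not the bytes
end
```

The bytes are identical in both cases; the only thing binread loses is the encoding label.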

RDiscount has code to set the encoding of its output to match the input, but this does not help much when the input encoding is ASCII-8BIT; the result simply gets the same encoding. The same thing (or something similar) happens with Kramdown, so just switching Markdown libraries will not help.

This causes problems when the template contains non-ASCII characters (such as £) and you try to combine the result with other UTF-8 encoded strings. If the template contains only ASCII characters, it is compatible with UTF-8 and Ruby can combine the two strings. If not, you get the CompatibilityError that you are seeing.
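The compatibility rule can be checked in plain Ruby. This is only a sketch: combinable? is a made-up helper, and the ASCII-8BIT strings are built with force_encoding to stand in for what the template engine returns.

```ruby
# Hypothetical helper: true when Ruby can concatenate the two strings.
def combinable?(a, b)
  a + b
  true
rescue Encoding::CompatibilityError
  false
end

utf8         = "\u00A3"                                     # UTF-8, non-ASCII
binary_ascii = 'price'.dup.force_encoding('ASCII-8BIT')     # only 7-bit bytes
binary_pound = "\u00A3".dup.force_encoding('ASCII-8BIT')    # contains bytes above 0x7F

p combinable?(utf8, binary_ascii)   # ASCII-only binary string mixes fine
p combinable?(utf8, binary_pound)   # non-ASCII on both sides raises inside the helper
```

Ruby only refuses the concatenation when both strings contain non-ASCII bytes and their encodings differ, which is why the error shows up only once a character like £ appears in the Markdown file.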

A possible workaround is to read the template files yourself and pass the resulting string, which then has the correct encoding, to Tilt:

 <%= markdown File.read './views/pound.md' %>
 £

By reading the file yourself instead of letting Tilt use binread, you can make sure that the string has the correct encoding and is therefore compatible with the rest of the ERB file. If you try this, you might want to read the file only once and cache the contents somewhere.
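One way the read-once-and-cache idea might look is sketched below. TEMPLATE_CACHE and utf8_template are hypothetical names, not part of Sinatra or Tilt, and the sketch assumes the files on disk really are UTF-8.

```ruby
# Hypothetical cache: template path => file contents.
TEMPLATE_CACHE = {}

# Hypothetical helper: reads the file as UTF-8 on first use and
# returns the cached string on every later call.
def utf8_template(path)
  TEMPLATE_CACHE[path] ||= File.read(path, encoding: 'UTF-8')
end
```

In a Sinatra app the method would probably live in a helpers block, so the ERB line becomes <%= markdown utf8_template('./views/pound.md') %>. Note that a cache like this never notices edits to the file, so it is awkward during development.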

An alternative solution would be to capture the output of the markdown method and use force_encoding on it:

 <%= markdown(:pound).force_encoding('utf-8') %>
 £

This works because, although the string is tagged as ASCII-8BIT, you know that its bytes are actually UTF-8, so you can simply change the encoding label without touching the bytes.
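The effect of force_encoding can be tried without a running app. In this sketch simulated_markdown_output is a made-up stand-in for the string that markdown returns; the HTML it contains is just an assumption about what the renderer would produce.

```ruby
# Hypothetical stand-in for the renderer's result: UTF-8 bytes tagged ASCII-8BIT.
def simulated_markdown_output
  "<p>\u00A3</p>\n".dup.force_encoding('ASCII-8BIT')
end

rendered  = simulated_markdown_output
relabeled = rendered.dup.force_encoding('UTF-8')   # changes the label, not the bytes

p relabeled.valid_encoding?                        # the bytes are well-formed UTF-8
p relabeled.bytes.to_a == rendered.bytes.to_a      # nothing was converted

combined = relabeled + "\u00A3"                    # no CompatibilityError now
p combined.encoding == Encoding::UTF_8
```

This is relabeling rather than conversion, so it is only safe when you are sure the bytes really are UTF-8; valid_encoding? is a cheap sanity check.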


Source: https://habr.com/ru/post/914272/