Summary: The problem characters are control characters from ISO-8859-1 that are not intended to be displayed.
Details and investigation:
Here is a test showing that you are getting valid UTF-8 from both Nokogiri and Sinatra:
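require 'sinatra'
require 'open-uri'

get '/' do
  # URI.open needs Ruby 2.5+; older versions used plain open(url)
  html = URI.open("http://flybynight.com.br/agenda.php").read
  p [ html.encoding, html.valid_encoding? ]  # log encoding and validity
  html  # assumed: return the page as the response body
end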
This properly serves the content with Content-Type: text/html;charset=utf-8 on my computer. However, Chrome does not show me this character in the browser.
Analyzing that response, the same UTF-8 byte pair comes back for the dash, as seen above: \xC2\x96 . These two bytes decode to the Unicode character U+0096, which is what shows up as a weird dash.
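To make the byte pair less mysterious, here is a minimal sketch that decodes it. The names WEIRD_DASH and code_points are made up for this illustration and are not part of the original code.

```ruby
# The two bytes seen in the response, tagged as UTF-8.
# (WEIRD_DASH and code_points are hypothetical names for this sketch.)
WEIRD_DASH = "\xC2\x96".dup.force_encoding("UTF-8")

# Format every code point of a string as "U+XXXX".
def code_points(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

p WEIRD_DASH.valid_encoding?  # => true
p code_points(WEIRD_DASH)     # => ["U+0096"]
```

So the string is perfectly valid UTF-8; it just encodes a single code point from the C1 control range (U+0080 to U+009F) rather than a printable dash.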
I would chalk this up to bad source data and simply put:
# encoding: utf-8
at the top of your Ruby source file(s), and then use:
f = ...text.gsub( "\xC2\x96", "-" )
Edit: If you look at the browser test page for this character, you will see (at least in Chrome and Firefox) that the literal UTF-8 version is blank, but the hexadecimal and decimal escape versions show up. I can't understand why this is so, but there you have it. Browsers simply do not display this character correctly when it is presented in raw form.
Either make it an HTML entity, or a different Unicode dash. Either way, a gsub is called for.
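As a rough sketch of those two options, something like the following could work. The method names and the sample string are invented for this example; pick whichever replacement suits your output.

```ruby
# U+0096 is the same character as the UTF-8 bytes "\xC2\x96".
# (C1_DASH, dash_to_entity and dash_to_en_dash are hypothetical names.)
C1_DASH = "\u0096"

# Option 1: replace the control character with an HTML entity.
def dash_to_entity(text)
  text.gsub(C1_DASH, "&ndash;")
end

# Option 2: replace it with a real Unicode dash (EN DASH, U+2013).
def dash_to_en_dash(text)
  text.gsub(C1_DASH, "\u2013")
end

sample = "20h \xC2\x96 22h"   # made-up sample text containing the bad bytes
puts dash_to_entity(sample)   # 20h &ndash; 22h
puts dash_to_en_dash(sample)
```

The entity version is only appropriate when the text ends up in HTML; the en dash version is safer if the string is also used elsewhere.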
Edit #2: One more odd note: the character in the source encoding has a hexadecimal byte value of 0x96 . As far as I can tell, this is not a printable ISO-8859-1 character. As shown in the official specification for ISO-8859-1 , it falls in one of the two non-printing regions.
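One possible explanation, and this is a guess rather than something the page confirms, is that the source was written in Windows-1252 but labelled as ISO-8859-1. In Windows-1252 the byte 0x96 is an en dash. The sketch below (with the made-up names RAW_BYTE and decode_as) decodes the same byte under both encodings.

```ruby
# The single byte from the source page, as raw binary data.
# (RAW_BYTE and decode_as are hypothetical names for this sketch.)
RAW_BYTE = "\x96".b

# Interpret the bytes in the named encoding, then transcode to UTF-8.
def decode_as(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

latin1 = decode_as(RAW_BYTE, "ISO-8859-1")
cp1252 = decode_as(RAW_BYTE, "Windows-1252")

puts format("ISO-8859-1:   U+%04X", latin1.ord)  # ISO-8859-1:   U+0096
puts format("Windows-1252: U+%04X", cp1252.ord)  # Windows-1252: U+2013
```

If that guess holds for your data, decoding the original bytes as Windows-1252 would give you a real dash without needing the gsub at all.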