I ran into some weird behavior with Loofah and Sanitize. While trying to clean some HTML fragments, I noticed that times like “6:30 pm” turn into “30 pm”.
I did some investigating and found the following:
    Loofah.scrub_fragment("<span>asdfasdf 6:30 pm</span>", :strip).to_html #=> "<span>asdfasdf 30 pm</span>"
    Loofah.scrub_fragment("6:30 pm", :strip).to_html #=> "6:30 pm"
    Loofah.scrub_fragment("<foo>asdfasdf 6:30 pm</foo>", :strip).to_html #=> "asdfasdf 6:30 pm"
    Loofah.scrub_fragment("bar:30 pm", :strip).to_html #=> "bar:30 pm"
    Loofah.scrub_fragment("<span>bar:30 pm</span>", :strip).to_html #=> "<span>30 pm</span>"
    Loofah.scrub_fragment("<span>bar: asdfasdfadsf pm</span>", :strip).to_html #=> "<span>bar: asdfasdfadsf pm</span>"
This happens with all of Loofah's scrubbers (:prune, etc.) and with Sanitize, so I assume the problem is in code common to both. Is there anything special I need to do to escape colons in the input before sanitizing?
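One possible stopgap is to hide the colons from the parser and put them back afterwards. The following is only a minimal sketch of that idea, not a fix for the underlying problem. The method name `with_protected_colons` and the `COLON_TOKEN` placeholder are hypothetical, and it assumes the placeholder never occurs in real input and passes through the sanitizer unchanged. It has not been verified against the JRuby parser.

```ruby
# Hypothetical placeholder; assumed not to occur in real input and to
# survive the sanitizer untouched.
COLON_TOKEN = "__COLON__"

# Replaces colons in text outside of tags with the placeholder, yields the
# result to the block (the sanitizer call), then restores the colons in
# whatever the block returns.
def with_protected_colons(html)
  protected_html = html.split(/(<[^>]*>)/).map { |part|
    # Leave anything that looks like a complete tag alone, so attribute
    # values such as URLs keep their colons.
    part =~ /\A<[^>]*>\z/ ? part : part.gsub(":", COLON_TOKEN)
  }.join
  yield(protected_html).gsub(COLON_TOKEN, ":")
end
```

It would be called as `with_protected_colons(html) { |safe| Loofah.scrub_fragment(safe, :strip).to_html }`. The tag detection is a plain regular expression, so this only makes sense for simple fragments like the ones above.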
Edit 1: I realized I forgot to mention that I am using JRuby (jruby 1.7.0 (1.9.3p203)). I'm trying to figure out whether the problem might be in Nokogiri (which both of these gems are built on?)
Edit 2: After further digging, it seems this MAY be a problem in Nokogiri on JRuby (I'm on Nokogiri version 1.5.5, for what it's worth). I checked the Nokogiri fragment parser on JRuby and on Ruby 1.9.3:
JRuby 1.7.0: Unexpected Results

    doc = Nokogiri::HTML.fragment("<span>3:30pm</span>")
    => #(DocumentFragment:0x5fbc {
      name = "#document-fragment",
      children = [ #(Element:0x5fc0 { name = "span", children = [ #(Text "30pm")] })]
      })
Ruby 1.9.3: Expected Results

    doc = Nokogiri::HTML.fragment("<span>3:30pm</span>")
    => #(DocumentFragment:0x3fc4b102055c {
      name = "#document-fragment",
      children = [ #(Element:0x3fc4b101fff8 { name = "span", children = [ #(Text "3:30pm")] })]
      })
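To make the comparison between the two interpreters repeatable, the fragment-parser check can be wrapped in a small script. This is a rough sketch: `SAMPLES`, `mangled_samples`, and `NOKOGIRI_PARSER` are hypothetical names introduced for illustration, and the sample strings are the ones from the Loofah examples earlier in this post.

```ruby
# Sample strings taken from the Loofah examples earlier in the post.
SAMPLES = ["6:30 pm", "bar:30 pm", "bar: asdfasdfadsf pm"]

# Returns the samples whose text changes after a round trip through `parser`.
# `parser` is any callable that takes an HTML string and returns its text.
def mangled_samples(samples, parser)
  samples.reject { |text| parser.call("<span>#{text}</span>") == text }
end

# Parser under test: Nokogiri's HTML fragment parser. The gem is required
# lazily, so the helper above can be loaded without Nokogiri installed.
NOKOGIRI_PARSER = lambda do |html|
  require "nokogiri"
  Nokogiri::HTML.fragment(html).text
end
```

Running `p mangled_samples(SAMPLES, NOKOGIRI_PARSER)` together with `puts "#{RUBY_ENGINE} #{RUBY_VERSION}"` and `puts Nokogiri::VERSION` under each interpreter gives a compact report of which inputs lose text on which platform, which should be handy for a bug report.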
I'll keep digging, but any suggestions are welcome.