Ruby Nokogiri SAX parser truncates strings at `&gt;` (aka `>`)
Background: I am using the Ruby Nokogiri gem to parse an XML file. The problem is that the SAX parser returns an incomplete result when the string contains `&gt;`, the escaped (entity) form of `>`. For instance:
```
<element>PART1PART2</element>     #=> returns "PART1PART2"
<element>PART3&gt;PART4</element> #=> returns "PART3"
```
My parser is as follows:
```ruby
require 'nokogiri'

class MySample < Nokogiri::XML::SAX::Document
  def characters(string)
    puts string
  end
end

# Create a new parser
parser = Nokogiri::XML::SAX::Parser.new(MySample.new)

# Feed the parser some XML
parser.parse_file(ARGV[0])
```
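Note: `&gt;` is not HTML-only; it is one of the predefined XML entities, so an XML parser should decode it to `>` and carry on with the rest of the text.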
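To make the expected result concrete: XML 1.0 predefines exactly five entities (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), so decoding `PART3&gt;PART4` should yield `PART3>PART4`. The sketch below is a stdlib-only illustration of that mapping; `decode_predefined_entities` is a hypothetical helper, not part of Nokogiri, and not how libxml2 decodes entities internally. Numeric character references such as `&#62;` are deliberately not handled.

```ruby
# The five entities predefined by XML 1.0 and the characters they stand for.
XML_PREDEFINED_ENTITIES = {
  "&lt;"   => "<",
  "&gt;"   => ">",
  "&amp;"  => "&",
  "&apos;" => "'",
  "&quot;" => '"'
}.freeze

# Hypothetical helper: decode only the predefined entities, in a single
# left-to-right pass, looking each match up in the hash above.
def decode_predefined_entities(text)
  text.gsub(/&(?:lt|gt|amp|apos|quot);/, XML_PREDEFINED_ENTITIES)
end

puts decode_predefined_entities("PART3&gt;PART4") # prints PART3>PART4
```

Matching all five names in one regex keeps `&amp;gt;` as the literal text `&gt;`; replacing `&amp;` first and `&gt;` in a second pass would wrongly turn it into `>`.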
Question: How do I get Nokogiri to return the whole string, with `&gt;` decoded to `>`?
Edit 1 (FWIW):
I am using the SAX interface on purpose rather than a DOM parser.
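Update:
I was using Nokogiri v1.6.1; the latest release at the time was v1.6.6.
As matt explains in his answer, the text is not actually truncated: the SAX parser splits it at entity references (e.g. `&gt;`, `&lt;`, ...) and delivers each piece in a separate call to `characters`, so the handler has to join the pieces itself. I also looked at the Ruby Ox gem, which has its own SAX parser, as an alternative to Nokogiri.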
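Because a single text node can arrive through several `characters` calls (for `<element>PART3&gt;PART4</element>`, plausibly `"PART3"`, `">"` and `"PART4"`; the exact split is up to libxml2), the usual fix is to buffer the chunks and use the text only when the element closes. Below is a minimal sketch of that approach, assuming Nokogiri's `start_element(name, attrs = [])`, `characters(string)` and `end_element(name)` callbacks. `TextBuffering` and `DemoHandler` are hypothetical names, and the demo feeds the callbacks by hand so it runs without the gem.

```ruby
# Hypothetical mixin: collects each element's text no matter how many
# `characters` calls the SAX parser uses to deliver it.
module TextBuffering
  # Completed [element_name, text] pairs, in the order the elements close.
  def texts
    @texts ||= []
  end

  def start_element(name, attrs = [])
    buffers.push(+"") # open a fresh, mutable buffer for this element
  end

  def characters(string)
    # Append every chunk; one text node may arrive in several pieces.
    buffers.last << string unless buffers.empty?
  end

  def end_element(name)
    # The element is complete, so its buffered text is complete too.
    texts << [name, buffers.pop]
  end

  private

  # One buffer per open element, innermost last.
  def buffers
    @buffers ||= []
  end
end

# Stand-in for a Nokogiri::XML::SAX::Document subclass. With the gem, use:
#   class MySample < Nokogiri::XML::SAX::Document
#     include TextBuffering
#   end
class DemoHandler
  include TextBuffering
end

handler = DemoHandler.new
# Simulate chunked callbacks for <element>PART3&gt;PART4</element>.
handler.start_element("element")
["PART3", ">", "PART4"].each { |chunk| handler.characters(chunk) }
handler.end_element("element")
p handler.texts # prints [["element", "PART3>PART4"]]
```

Keeping a stack of buffers rather than one string means each element collects only its own direct text, so `<a>x<b>y</b>z</a>` yields `y` for `b` and `xz` for `a`. CDATA sections arrive through `cdata_block` rather than `characters` and would need the same treatment.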
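Solution:
With Nokogiri, collect the chunks passed to `characters` and use the joined text when the element closes. Ox's `Ox.sax_parse` has a `:convert_special` option that controls whether entities such as `&gt;` are converted to `>`.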