How to search XML while parsing it with SAX in Nokogiri

I have a simple but huge XML file, as shown below. I want to parse it using SAX and print the text inside the title tag.

    <root>
      <site>some site</site>
      <title>good title</title>
    </root>

I have the following code:

    require 'rubygems'
    require 'nokogiri'
    include Nokogiri

    class PostCallbacks < XML::SAX::Document
      def start_element(element, attributes)
        if element == 'title'
          puts "found title"
        end
      end

      def characters(text)
        puts text
      end
    end

    parser = XML::SAX::Parser.new(PostCallbacks.new)
    parser.parse_file("myfile.xml")

The problem is that it prints the text between all tags. How can I print only the text inside the title tag?

2 answers

You just need to keep track of when you are inside the <title> element, so that the characters callback knows when to pay attention. Maybe something like this (untested code):

    class PostCallbacks < XML::SAX::Document
      def initialize
        @in_title = false
      end

      def start_element(element, attributes)
        if element == 'title'
          puts "found title"
          @in_title = true
        end
      end

      def end_element(element)
        # Doesn't really matter what element we're closing unless there is nesting,
        # then you'd want "@in_title = false if element == 'title'"
        @in_title = false
      end

      def characters(text)
        puts text if @in_title
      end
    end

The accepted answer above is correct. However, it has the disadvantage that it goes through the whole XML file, even if it finds a <title> at the very beginning.

I had similar needs, and I ended up writing the saxy Ruby gem, which is designed to be efficient in such situations. Under the hood, it implements Nokogiri's SAX API.

Here's how you use it:

    require 'saxy'

    title = Saxy.parse(path_to_your_file, 'title').first

It will stop as soon as it finds the first occurrence of <title>.


Source: https://habr.com/ru/post/1333439/

