Fast Ruby HTTP library for large XML downloads

I use various XML-over-HTTP web services that return large XML files (> 2 MB). What would be the fastest Ruby HTTP library to reduce the load time?

Required Features:

  • GET and POST requests

  • gzip/deflate downloads (Accept-Encoding: deflate, gzip) - very important

I'm choosing between:

  • open-uri

  • Net::HTTP

  • Curb

but other suggestions are welcome too.

P.S. To parse the response I use Nokogiri's pull parser, so I don't need a library with an integrated parser such as rest-client or hpricot.
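
For reference, here is a minimal sketch of what the gzip/deflate requirement involves when handled by hand with Net::HTTP, one of the candidates above (the URL is a placeholder; recent Ruby versions can also negotiate gzip transparently if you don't set Accept-Encoding yourself):

require 'net/http'
require 'uri'
require 'zlib'
require 'stringio'

uri = URI.parse('http://example.com/big.xml') # placeholder URL

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Accept-Encoding'] = 'deflate, gzip'
  response = http.request(request)

  # decompress according to what the server actually sent
  xml = case response['Content-Encoding']
        when 'gzip'    then Zlib::GzipReader.new(StringIO.new(response.body)).read
        # note: some servers send raw deflate, which needs Zlib::Inflate.new(-Zlib::MAX_WBITS)
        when 'deflate' then Zlib::Inflate.inflate(response.body)
        else response.body
        end
end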

+3
4 answers

http://github.com/pauldix/typhoeus

It wraps libcurl, so it's fast and supports compressed (gzip/deflate) downloads out of the box.

It should serve you better than Net::HTTP here, and it can also run several requests in parallel if you ever need that.
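
A minimal sketch of how that might look (the URL is a placeholder; accept_encoding maps to the libcurl option that both requests and transparently decompresses gzip/deflate):

require 'typhoeus'

# GET with compressed transfer; libcurl decompresses for us
response = Typhoeus.get('http://example.com/big.xml',
                        accept_encoding: 'deflate, gzip')
xml = response.body if response.success?

# POST works the same way
Typhoeus.post('http://example.com/search', body: { query: 'foo' })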

+3

Another option is EventMachine together with em-http, which lets you stream the XML into the parser while it downloads:

require 'rubygems'
require 'eventmachine'
require 'em-http'
require 'nokogiri'

# this is your SAX handler, I'm not very familiar with
# Nokogiri, so I just took an example from the RDoc
class StreamingDocument < Nokogiri::XML::SAX::Document
  def start_element(name, attrs=[])
    puts "starting: #{name}"
  end

  def end_element(name)
    puts "ending: #{name}"
  end
end

document = StreamingDocument.new
url = 'http://stackoverflow.com/feeds/question/2833829'

# run the EventMachine reactor, this call will block until 
# EventMachine.stop is called
EventMachine.run do
  # Nokogiri wants an IO to read from, so create a pipe that it
  # can read from, and we can write to
  io_read, io_write = IO.pipe

  # run the parser in its own thread so that it can block while
  # reading from the pipe
  EventMachine.defer(proc {
    parser = Nokogiri::XML::SAX::Parser.new(document)
    parser.parse_io(io_read)
  })

  # use em-http to stream the XML document, feeding the pipe with
  # each chunk as it becomes available
  http = EventMachine::HttpRequest.new(url).get
  http.stream { |chunk| io_write << chunk }

  # when the HTTP request is done, stop EventMachine
  http.callback { EventMachine.stop }
end

I can't promise this is the fastest option, but it starts parsing as soon as the first chunk arrives instead of waiting for the whole download to finish, which should help with files this large (and the document never has to sit in memory as one big string).

+17

A follow-up to my answer above: it had a problem. The parser never saw the end of the stream, so end_document never fired and the reactor could be stopped before parsing finished. The fix is to close the write end of the pipe in http.callback, and to stop EventMachine only when the parser is done, via the second proc passed to EventMachine.defer. Problem solved!

require 'rubygems'
require 'eventmachine'
require 'em-http'
require 'nokogiri'

# this is your SAX handler, I'm not very familiar with
# Nokogiri, so I just took an example from the RDoc
class StreamingDocument < Nokogiri::XML::SAX::Document
  def start_element(name, attrs=[])
    puts "starting: #{name}"
  end

  def end_element(name)
    puts "ending: #{name}"
  end

  def end_document
    puts "should now fire"
  end
end

document = StreamingDocument.new
url = 'http://stackoverflow.com/feeds/question/2833829'

# run the EventMachine reactor, this call will block until 
# EventMachine.stop is called
EventMachine.run do
  # Nokogiri wants an IO to read from, so create a pipe that it
  # can read from, and we can write to
  io_read, io_write = IO.pipe

  # run the parser in its own thread so that it can block while
  # reading from the pipe
  EventMachine.defer(proc {
    parser = Nokogiri::XML::SAX::Parser.new(document)
    parser.parse_io(io_read)
  }, proc { EventMachine.stop })

  # use em-http to stream the XML document, feeding the pipe with
  # each chunk as it becomes available
  http = EventMachine::HttpRequest.new(url).get
  http.stream { |chunk| io_write << chunk }

  # when the HTTP request is done, stop EventMachine
  http.callback { io_write.close }
end
+4

The fastest download is probably just calling #read on an IO object, which slurps everything into a single string. After that you can apply your processing. Or do you need the file to be processed while it is still downloading?
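
For illustration, the simplest form of that idea using open-uri and a placeholder URL (URI.open is the Ruby 2.5+ spelling; older versions used plain open):

require 'open-uri'
require 'nokogiri'

# slurp the whole response into one string, then parse it in memory
xml = URI.open('http://example.com/big.xml').read
doc = Nokogiri::XML(xml)
puts doc.root.name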

+1
source

Source: https://habr.com/ru/post/1745440/

