You are getting a 404 Not Found error (OpenURI::HTTPError), so if you want your code to keep going, rescue that exception. Something like this should work:
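    require 'nokogiri'
    require 'open-uri'

    URLS = %w[
      http://www.moxyst.com/fashion/men-clothing/underwear.html
    ]

    URLS.each do |url|
      begin
        # On Ruby 3.0+ URLs must be opened with URI.open, not Kernel#open
        doc = Nokogiri::HTML(URI.open(url))
      rescue OpenURI::HTTPError => e
        # Report the failed URL and the status message, then skip to the next URL
        puts "Can't access #{url}"
        puts e.message
        puts
        next
      end
      puts doc.to_html
    end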
You could rescue a more general exception class instead, but then you may get strange output, or you may handle an unrelated problem in a way that causes even more trouble. You need to decide how fine-grained the rescue should be.
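As a sketch of that trade-off, assuming you only want to swallow HTTP and connection failures: the method below rescues OpenURI::HTTPError plus a short, non-exhaustive list of network exceptions, and lets everything else (for example a bug in your own parsing code) propagate. `fetch_page` and `NETWORK_ERRORS` are hypothetical names. It returns the raw body, which you can pass to `Nokogiri::HTML`.

```ruby
require 'open-uri'
require 'net/http' # defines Net::OpenTimeout and Net::ReadTimeout
require 'socket'   # defines SocketError

# Connection-level failures we are willing to swallow (not an exhaustive list).
# Anything not listed, such as a NoMethodError in our own code, still raises.
NETWORK_ERRORS = [
  SocketError,         # e.g. the host name could not be resolved
  Errno::ECONNREFUSED, # nothing is listening on that host/port
  Net::OpenTimeout,    # the connection attempt timed out
  Net::ReadTimeout     # the server stopped responding mid-request
].freeze

# Returns the page body as a String, or nil if the server or network failed.
def fetch_page(url)
  URI.open(url, &:read)
rescue OpenURI::HTTPError => e
  # The server answered, but with a 4xx/5xx status such as "404 Not Found"
  warn "HTTP error for #{url}: #{e.message}"
  nil
rescue *NETWORK_ERRORS => e
  warn "Could not reach #{url}: #{e.class}: #{e.message}"
  nil
end
```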
For even more control, for example to do something different for a 401 than for a 404, you can inspect the HTTP response headers, the response status, or the exception message.
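A minimal sketch of branching on the status code: `OpenURI::HTTPError#io` is the response object, its `status` method returns the code and reason as strings (e.g. `["404", "Not Found"]`), and its `meta` method returns the response headers as a hash. The method names and the `:needs_auth`/`:skip`/`:fail` labels are hypothetical; what you actually do in each case is up to you.

```ruby
require 'open-uri'

# Map an OpenURI::HTTPError to a (hypothetical) action label by status code.
def classify_http_error(error)
  code, _reason = error.io.status # e.g. ["404", "Not Found"]
  case code
  when '401' then :needs_auth # e.g. retry with credentials
  when '404' then :skip       # page is gone, move on
  else :fail                  # anything else: let the caller decide
  end
end

# Returns the body, nil for statuses we choose to skip, or re-raises the rest.
def fetch_or_skip(url)
  URI.open(url, &:read)
rescue OpenURI::HTTPError => e
  case classify_http_error(e)
  when :skip
    warn "Skipping #{url}: #{e.message}"
    nil
  when :needs_auth
    warn "#{url} needs authentication: #{e.message}"
    nil
  else
    raise # re-raise the original OpenURI::HTTPError
  end
end
```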
I can access this from a web browser; I just don't understand it.
Then maybe something is happening on the server side: perhaps they don't like the User-Agent header you are sending. The OpenURI documentation shows how to change this header:
Additional header fields can be specified using an optional hash argument.
    # Ruby 3.0+ needs URI.open here; older versions also accepted plain open
    URI.open("http://www.ruby-lang.org/en/",
      "User-Agent" => "Ruby/#{RUBY_VERSION}",
      "From" => "foo@bar.invalid",
      "Referer" => "http://www.ruby-lang.org/") {|f|
      # ...
    }
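Applied to the scraping loop, a sketch might look like the following. It assumes (this is not confirmed) that the server filters on User-Agent; the browser-like string in `BROWSER_UA` and the `fetch_html` name are hypothetical placeholders. It combines the custom header with the `rescue` from the first example.

```ruby
require 'open-uri'

# Hypothetical browser-like User-Agent; substitute whatever the site accepts.
BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64)'.freeze

# Fetch a page with a custom User-Agent header.
# Returns the body as a String, or nil if the server answers with an HTTP error.
def fetch_html(url, user_agent: BROWSER_UA)
  URI.open(url, 'User-Agent' => user_agent) { |f| f.read }
rescue OpenURI::HTTPError => e
  warn "Can't access #{url}: #{e.message}"
  nil
end

# Usage inside the URLS loop (requires the nokogiri gem):
#   html = fetch_html(url)
#   doc  = Nokogiri::HTML(html) if html
```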