Ruby Mechanize: 404 => Net::HTTPNotFound

I have a URL that I cannot access using Mechanize, and I don't know why:

# Ruby 2.1.6
require 'mechanize'
require 'axlsx' # 2.0.1
require 'roo' # 1.13.2

mechanize = Mechanize.new
mechanize.request_headers = { "Accept-Encoding" => "" }
mechanize.ignore_bad_chunking = true
mechanize.follow_meta_refresh = true

xlsx = Roo::Excelx.new("./base_list.xlsx")

xlsx.each_with_pagename do |sheet_name, sheet|
  sheet.each do |row|
    page = mechanize.get(row[0]) # the first column of each row holds the URL
  end
end

When I iterate over my list, I get URLs like https://angel.co/_helencousins. I can open it in my browser, but not with Mechanize, which raises this error:

/.rvm/gems/ruby-2.1.6/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:316:in `fetch': 404 => Net::HTTPNotFound for https://angel.co/_helencousins -- unhandled response (Mechanize::ResponseCodeError)
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/mechanize-2.7.4/lib/mechanize.rb:464:in `get'
    from scraper.rb:15:in `block (2 levels) in <main>'
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:428:in `block in each'
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:427:in `upto'
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:427:in `each'
    from scraper.rb:14:in `block in <main>'
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:398:in `block in each_with_pagename'
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:397:in `each'
    from /Users/xxx/.rvm/gems/ruby-2.1.6/gems/roo-1.13.2/lib/roo/base.rb:397:in `each_with_pagename'
    from scraper.rb:13:in `<main>'
1 answer

The problem was that the website blocks requests that use Mechanize's default user agent.

I fixed it by changing the user agent alias: mechanize.user_agent_alias = 'Windows Chrome'
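
To see why the same URL can work in a browser and fail in a script, here is a minimal sketch using only the Ruby standard library. It assumes a hypothetical local server that answers 404 unless the User-Agent header mentions "Chrome"; the server, the helper names (start_picky_server, status_for) and both user agent strings are illustrative stand-ins, not the real site's logic or Mechanize's exact default header.

```ruby
require 'socket'
require 'net/http'

# Hypothetical stand-in for a site that rejects non-browser clients:
# it answers 404 unless the User-Agent header mentions "Chrome".
def start_picky_server
  server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
  thread = Thread.new do
    loop do
      client = server.accept
      headers = []
      # Read the request line and the headers up to the blank line.
      while (line = client.gets) && line != "\r\n"
        headers << line
      end
      agent = headers.find { |h| h =~ /\AUser-Agent:/i }.to_s
      if agent.include?('Chrome')
        status = '200 OK'
        body = 'profile page'
      else
        status = '404 Not Found'
        body = 'not found'
      end
      client.write("HTTP/1.1 #{status}\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n#{body}")
      client.close
    end
  end
  [server.addr[1], thread] # addr[1] is the port the server is bound to
end

# Request "/" with the given User-Agent and return the status code as a string.
def status_for(port, user_agent)
  # The third argument (nil) means: do not use a proxy from the environment.
  Net::HTTP.start('127.0.0.1', port, nil) do |http|
    http.get('/', 'User-Agent' => user_agent).code
  end
end

port, _thread = start_picky_server

# Illustrative user agent strings (assumptions, not exact real-world values).
mechanize_status = status_for(port, 'Mechanize/2.7.4 Ruby/2.1.6')
browser_status   = status_for(port, 'Mozilla/5.0 (Windows NT 6.1) Chrome/41.0 Safari/537.36')

puts "Mechanize-style agent: #{mechanize_status}"
puts "Browser-style agent:   #{browser_status}"
```

The only difference between the two requests is the User-Agent header, which is what user_agent_alias changes in Mechanize. A real site may of course apply other checks as well, so this only illustrates the mechanism.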


Source: https://habr.com/ru/post/1623964/