Saving all image files from a website

I'm creating a small application for myself: a Ruby script that saves all of the images from my blog.

I cannot figure out how to save image files after I have identified them. Any help would be greatly appreciated.

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = '[my blog url]'
doc = Nokogiri::HTML(open(url))

doc.css("img").each do |item|
  # something
end
4 answers
URL = '[my blog url]'

require 'nokogiri'  # gem install nokogiri
require 'open-uri'  # already part of your ruby install

Nokogiri::HTML(open(URL)).xpath("//img/@src").each do |src|
  uri = URI.join(URL, src).to_s # make absolute uri
  File.open(File.basename(uri), 'wb') { |f| f.write(open(uri).read) }
end

Using code to convert to absolute paths from here: How to get an absolute URL when retrieving links using Nokogiri?
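If some images have no src, or use inline data: URIs, a more defensive variant of the same idea (a sketch, not part of the original answer) could skip whatever it cannot download and keep going on network errors:

require 'nokogiri'
require 'open-uri'

URL = '[my blog url]' # placeholder, as in the question

Nokogiri::HTML(open(URL)).css('img').each do |img|
  src = img['src']
  next if src.nil? || src.empty? || src.start_with?('data:') # nothing to fetch
  uri = URI.join(URL, src).to_s                              # make absolute uri
  begin
    filename = File.basename(URI.parse(uri).path)
    # NOTE: on Ruby 3+ use URI.open(uri) instead of open(uri)
    File.open(filename, 'wb') { |f| f.write(open(uri).read) }
  rescue OpenURI::HTTPError, SocketError => e
    warn "Skipping #{uri}: #{e.message}"                     # log and continue
  end
end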


Assuming the src attribute is an absolute url, maybe something like:

if item['src'] =~ /([^\/]+)$/
  File.open($1, 'wb') { |f| f.write(open(item['src']).read) }
end
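Dropped into the question's original doc.css("img") loop, that could look like this (a sketch that keeps the same assumption that every src is already an absolute URL):

require 'nokogiri'
require 'open-uri'

url = '[my blog url]' # placeholder from the question
doc = Nokogiri::HTML(open(url))

doc.css("img").each do |item|
  next unless item['src'] && item['src'] =~ /([^\/]+)$/ # last path segment as filename
  File.open($1, 'wb') { |f| f.write(open(item['src']).read) }
end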

Tip: There is an easy way to get images from the head/body of a page using the Scrapifier gem. The best part is that you can also specify which type of image you want returned (jpg, png, gif).

Try: https://github.com/tiagopog/scrapifier

I hope you like it.
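A minimal sketch of how that might look; the scrapify call and its images option are taken from the gem's README as I recall it, so check the current documentation before relying on the exact names:

require 'scrapifier'

# The gem mixes a #scrapify method into String; it returns a hash of page
# metadata (title, description, uri, images).
meta = '[my blog url]'.scrapify(images: [:jpg, :png, :gif]) # filter by image type
meta[:images] # => array of image URLs found on the page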

system("wget #{item['src']}")

Edit: this assumes you are on a Unix system with wget. :) Edit 2: updated the code to grab the img src from Nokogiri.
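Inside the question's loop, a slightly safer variant (a sketch; passing the URL as a separate argument keeps it out of the shell, so unusual characters in the URL cannot break the command):

doc.css("img").each do |item|
  src = item['src']
  system('wget', src) if src # runs wget without invoking a shell
end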


Source: https://habr.com/ru/post/900268/

