Saving all image files from a website

I'm creating a small application for myself: a Ruby script that saves all of the images from my blog.

I cannot figure out how to save image files after I have identified them. Any help would be greatly appreciated.

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = '[my blog url]'
doc = Nokogiri::HTML(open(url))

doc.css("img").each do |item|
  # something
end
4 answers
URL = '[my blog url]'

require 'nokogiri'  # gem install nokogiri
require 'open-uri'  # already part of your ruby install

Nokogiri::HTML(open(URL)).xpath("//img/@src").each do |src|
  uri = URI.join(URL, src).to_s # make absolute uri
  File.open(File.basename(uri), 'wb') { |f| f.write(open(uri).read) }
end

Using code to convert to absolute paths from here: How to get an absolute URL when retrieving links using Nokogiri?
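If some images have no src, or use inline data: URIs, a more defensive variant of the same idea (a sketch, not part of the original answer) could skip whatever it cannot download and keep going on network errors:

require 'nokogiri'
require 'open-uri'

URL = '[my blog url]' # placeholder, as in the question

Nokogiri::HTML(open(URL)).css('img').each do |img|
  src = img['src']
  next if src.nil? || src.empty? || src.start_with?('data:') # nothing to fetch
  uri = URI.join(URL, src).to_s                              # make absolute uri
  begin
    filename = File.basename(URI.parse(uri).path)
    # NOTE: on Ruby 3+ use URI.open(uri) instead of open(uri)
    File.open(filename, 'wb') { |f| f.write(open(uri).read) }
  rescue OpenURI::HTTPError, SocketError => e
    warn "Skipping #{uri}: #{e.message}"                     # log and continue
  end
end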


Assuming the src attribute is an absolute url, maybe something like:

if item['src'] =~ /([^\/]+)$/
  File.open($1, 'wb') { |f| f.write(open(item['src']).read) }
end
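Dropped into the question's original doc.css("img") loop, that could look like this (a sketch that keeps the same assumption that every src is already an absolute URL):

require 'nokogiri'
require 'open-uri'

url = '[my blog url]' # placeholder from the question
doc = Nokogiri::HTML(open(url))

doc.css("img").each do |item|
  next unless item['src'] && item['src'] =~ /([^\/]+)$/ # last path segment as filename
  File.open($1, 'wb') { |f| f.write(open(item['src']).read) }
end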

Tip: There is an easy way to get images from the head/body of a page using the Scrapifier gem. The best part is that you can also specify which type of image you want returned (jpg, png, gif).

Try: https://github.com/tiagopog/scrapifier

I hope you like it.
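A minimal sketch of how that might look; the scrapify call and its images option are taken from the gem's README as I recall it, so check the current documentation before relying on the exact names:

require 'scrapifier'

# The gem mixes a #scrapify method into String; it returns a hash of page
# metadata (title, description, uri, images).
meta = '[my blog url]'.scrapify(images: [:jpg, :png, :gif]) # filter by image type
meta[:images] # => array of image URLs found on the page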

system("wget #{item['src']}")

Edit: this assumes you are on a Unix system with wget. :) Edit 2: updated the code to grab the img src from Nokogiri.
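Inside the question's loop, a slightly safer variant (a sketch; passing the URL as a separate argument keeps it out of the shell, so unusual characters in the URL cannot break the command):

doc.css("img").each do |item|
  src = item['src']
  system('wget', src) if src # runs wget without invoking a shell
end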


Source: https://habr.com/ru/post/900268/

