How to parse and clear URL meta tags using Nokogiri?

Question

I use Nokogiri to pull out the <h1> and <title> tags, but I am having trouble getting the content of these meta tags:

 <meta name="description" content="I design and develop websites and applications.">
 <meta name="keywords" content="web designer,web developer">

I have this code:

 url = 'https://en.wikipedia.org/wiki/Emma_Watson'
 page = Nokogiri::HTML(open(url))
 puts page.css('title')[0].text
 puts page.css('h1')[0].text
 puts page.css('description')
 puts META DESCRIPTION
 puts META KEYWORDS

I looked through the documentation and found nothing. Can a regex be used for this?

Thanks.

+6

ruby html-parsing nokogiri

crentist Jul 22 '13 at 7:26

3 answers

Select the meta tag with a CSS attribute selector and read its content attribute:

 page.at('meta[name="keywords"]')['content']

+6

pguardiario Jul 22 '13 at 8:10

Another solution: you can use XPath or CSS.

 puts page.xpath('/html/head/meta[@name="description"]/@content').to_s
 puts page.xpath('/html/head/meta[@name="keywords"]/@content').to_s

+1

DriftwoodJP Jan 16 '18 at 8:13

the tin man · Accepted Answer · 2013-07-22T19:05:45+0000

Here's how I would do it:

 require 'nokogiri'

 doc = Nokogiri::HTML(<<EOT)
 <meta name="description" content="I design and develop websites and applications.">
 <meta name="keywords" content="web designer,web developer">
 EOT

 contents = %w[description keywords].map { |name| doc.at("meta[name='#{name}']")['content'] }
 contents # => ["I design and develop websites and applications.", "web designer,web developer"]

Or:

 contents = doc.search("meta[name='description'], meta[name='keywords']").map { |n| n['content'] }
 contents # => ["I design and develop websites and applications.", "web designer,web developer"]
