How to parse the meta tags of a URL using Nokogiri?

I use Nokogiri to pull out the <h1> and <title> tags, but I am having trouble getting these:

    <meta name="description" content="I design and develop websites and applications.">
    <meta name="keywords" content="web designer,web developer">

I have this code:

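    require 'nokogiri'
    require 'open-uri'

    url = 'https://en.wikipedia.org/wiki/Emma_Watson'
    page = Nokogiri::HTML(URI.open(url))

    puts page.css('title')[0].text
    puts page.css('h1')[0].text
    puts page.css('description') # matches nothing: there is no <description> element
    # puts META DESCRIPTION  (placeholder for the value I want to print)
    # puts META KEYWORDS     (placeholder for the value I want to print)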

I looked through the documentation and found nothing. Can a regex be used for this?

Thanks.

3 answers

Here's how I would do it:

    require 'nokogiri'

    doc = Nokogiri::HTML(<<EOT)
    <meta name="description" content="I design and develop websites and applications.">
    <meta name="keywords" content="web designer,web developer">
    EOT

    contents = %w[description keywords].map { |name| doc.at("meta[name='#{name}']")['content'] }
    contents # => ["I design and develop websites and applications.", "web designer,web developer"]

Or:

    contents = doc.search("meta[name='description'], meta[name='keywords']").map { |n| n['content'] }
    contents # => ["I design and develop websites and applications.", "web designer,web developer"]

For the keywords tag, that would be:

 page.at('meta[name="keywords"]')['content'] 

Another solution: you can use XPath instead of CSS selectors.

    puts page.xpath('/html/head/meta[@name="description"]/@content').to_s
    puts page.xpath('/html/head/meta[@name="keywords"]/@content').to_s

Source: https://habr.com/ru/post/949966/

