Getting the title and meta tags from an external site

Question

I want to figure out how to get

 <title>A common title</title>
 <meta name="keywords" content="Keywords blabla" />
 <meta name="description" content="This is the description" />

regardless of the order in which they appear. I've heard about PHP Simple HTML DOM Parser, but I really don't want to use it. Is there a solution other than PHP Simple HTML DOM Parser?

Would preg_match fail to do this if the HTML is invalid?

Can cURL be combined with preg_match to do something like this?

Facebook does something like this, but there the tags are written properly, like:

 <meta property="og:description" content="Description blabla" />

I want something like this, so that when someone posts a link, the script fetches the page's title and meta tags. If a meta tag doesn't exist, it is ignored, or the user can fill it in themselves (but I'll handle that part later myself).

+49

php curl meta-tags title

MacMac Sep 14 '10 at 17:27

20 answers

 <?php
 // Assuming the page at www.example.com contains meta tags named
 // author, keywords, description and geo.position
 $tags = get_meta_tags('http://www.example.com/');
 // Notice how the keys are all lowercase now, and
 // how . was replaced by _ in the key.
 echo $tags['author'];       // name
 echo $tags['keywords'];     // php documentation
 echo $tags['description'];  // a php manual
 echo $tags['geo_position']; // 49.33;-86.59
 ?>

+32

Bob Jeey Mar 15 2018-12-12T00:

get_meta_tags will get you everything except the title. To get the title, just use a regex.

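 $url = 'http://some.url.com';
 // s: . also matches newlines, i: case-insensitive, U: ungreedy (stop at the first </title>)
 if (preg_match("/<title>(.+)<\/title>/siU", file_get_contents($url), $matches)) {
     $title = $matches[1];
 }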

Hope this helps.
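A slightly more defensive take on the same regex idea, as a sketch only: it works on an HTML string, allows attributes on the opening tag, returns null when there is no title, and decodes entities such as &amp;. The name extract_title() is made up for illustration, and a regex can still misfire on unusual markup (for example a <title> inside a comment or script).

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML string with a regex.
function extract_title(string $html): ?string
{
    // s: . also matches newlines, i: case-insensitive,
    // U: ungreedy, so the match stops at the first </title>.
    // [^>]* allows attributes on the opening tag, e.g. <title lang="en">.
    if (!preg_match('/<title[^>]*>(.*)<\/title>/siU', $html, $m)) {
        return null; // no title tag found
    }
    // Collapse runs of whitespace (titles often span lines), then decode entities.
    $title = preg_replace('/\s+/', ' ', $m[1]);
    return trim(html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}
```

Usage: extract_title(file_get_contents($url)), with the fetching left to whichever method you prefer.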

+7

Lloyd Moore Jan 09 '11 at 17:34

PHP's native function: get_meta_tags()

http://php.net/manual/en/function.get-meta-tags.php

+6

Nitroware Dec 19 2018-11-11T00:

It's best to bite the bullet and use a DOM parser; this is the "right way" to do it. In the long run it will save you more time than it takes to learn how. Parsing HTML with regular expressions is notoriously unreliable and intolerant of edge cases.
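As a rough sketch of what the DOM approach can look like without any third-party library, here is a version using PHP's built-in DOMDocument and DOMXPath. page_metadata() is a hypothetical name; it takes the HTML as a string (fetching is left to you), tolerates malformed markup, does not care about tag order, and accepts both name= and property= meta tags.

```php
<?php
// Hypothetical helper: title plus name/property meta tags from an HTML string.
function page_metadata(string $html): array
{
    $result = ['title' => null, 'meta' => []];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // libxml reports malformed markup as warnings; collect them quietly instead.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $titles = $xpath->query('//title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    // Meta tags may appear in any order and use either name= or property=.
    foreach ($xpath->query('//meta[@content and (@name or @property)]') as $meta) {
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        $result['meta'][strtolower($key)] = $meta->getAttribute('content');
    }
    return $result;
}
```

Later tags with the same key overwrite earlier ones, which is usually fine for description and keywords.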

+4

Joshua Sep 14 '10 at 17:31

get_meta_tags does not return the title.

Only meta tags with a name attribute, such as

 <meta name="description" content="the description">

will be parsed.
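To check this without hitting the network, the following sketch writes a small page to a temporary file (get_meta_tags() accepts a local filename as well as a URL) and prints what comes back. The page content is made up for the demonstration.

```php
<?php
// Write a small test page to a temporary file so no network access is needed.
$html = <<<HTML
<html><head>
<title>A common title</title>
<meta name="description" content="the description">
<meta name="geo.position" content="49.33;-86.59">
<meta property="og:description" content="Description blabla">
</head><body></body></html>
HTML;

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);
$tags = get_meta_tags($path);
unlink($path);

// Only the two name= tags come back, with "geo.position" turned into "geo_position";
// the <title> element and the property= tag are skipped.
print_r($tags);
```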

+4

Harald May 05 '12 at 13:51

http://php.net/manual/en/function.get-meta-tags.php

+3

afro360 Oct 23 '15 at 8:36

Unfortunately, PHP's built-in get_meta_tags() requires a name attribute, and some sites, such as Twitter, drop it in favor of the property attribute. This function, which combines a regular expression with DOMDocument, returns an array of metadata from a web page. It checks the name attribute first, then the property attribute. It has been tested on Instagram, Pinterest and Twitter.

 /**
  * Extract meta tags from a web page
  */
 function extract_tags_from_url($url) {
     $tags = array();
     $ch = curl_init();
     curl_setopt($ch, CURLOPT_HEADER, 0);
     curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($ch, CURLOPT_URL, $url);
     curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
     $contents = curl_exec($ch);
     curl_close($ch);
     if (empty($contents)) {
         return $tags;
     }
     if (preg_match_all('/<meta([^>]+)content="([^>]+)>/', $contents, $matches)) {
         $doc = new DOMDocument();
         $doc->loadHTML('<?xml encoding="utf-8" ?>' . implode($matches[0]));
         $tags = array();
         foreach ($doc->getElementsByTagName('meta') as $metaTag) {
             if ($metaTag->getAttribute('name') != "") {
                 $tags[$metaTag->getAttribute('name')] = $metaTag->getAttribute('content');
             } elseif ($metaTag->getAttribute('property') != "") {
                 $tags[$metaTag->getAttribute('property')] = $metaTag->getAttribute('content');
             }
         }
     }
     return $tags;
 }

+3

oknate Oct 27 '15 at 19:25

We use Apache Tika from PHP (via its command-line utility) with -j for JSON output:

http://tika.apache.org/

 <?php
 // shell_exec() returns the command's output (the JSON), so keep it
 $json = shell_exec('java -jar tika-app-1.4.jar -j http://www.guardian.co.uk/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying');
 ?>

Here is an example of the output for a free Guardian article:

 {
     "Content-Encoding": "UTF-8",
     "Content-Length": 205599,
     "Content-Type": "text/html; charset\u003dUTF-8",
     "DC.date.issued": "2013-07-21",
     "X-UA-Compatible": "IE\u003dEdge,chrome\u003d1",
     "application-name": "The Guardian",
     "article:author": "http://www.guardian.co.uk/profile/nicholaswatt",
     "article:modified_time": "2013-07-21T22:42:21+01:00",
     "article:published_time": "2013-07-21T22:00:03+01:00",
     "article:section": "Politics",
     "article:tag": [
         "Lynton Crosby",
         "Health policy",
         "NHS",
         "Health",
         "Healthcare industry",
         "Society",
         "Public services policy",
         "Lobbying",
         "Conservatives",
         "David Cameron",
         "Politics",
         "UK news",
         "Business"
     ],
     "content-id": "/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying",
     "dc:title": "Tory strategist Lynton Crosby in new lobbying row | Politics | The Guardian",
     "description": "Exclusive: Firm he founded, Crosby Textor, advised private healthcare providers how to exploit NHS \u0027failings\u0027",
     "fb:app_id": 180444840287,
     "keywords": "Lynton Crosby,Health policy,NHS,Health,Healthcare industry,Society,Public services policy,Lobbying,Conservatives,David Cameron,Politics,UK news,Business,Politics",
     "msapplication-TileColor": "#004983",
     "msapplication-TileImage": "http://static.guim.co.uk/static/a314d63c616d4a06f5ec28ab4fa878a11a692a2a/common/images/favicons/windows_tile_144_b.png",
     "news_keywords": "Lynton Crosby,Health policy,NHS,Health,Healthcare industry,Society,Public services policy,Lobbying,Conservatives,David Cameron,Politics,UK news,Business,Politics",
     "og:description": "Exclusive: Firm he founded, Crosby Textor, advised private healthcare providers how to exploit NHS \u0027failings\u0027",
     "og:image": "https://static-secure.guim.co.uk/sys-images/Guardian/Pix/pixies/2013/7/21/1374433351329/Lynton-Crosby-008.jpg",
     "og:site_name": "the Guardian",
     "og:title": "Tory strategist Lynton Crosby in new lobbying row",
     "og:type": "article",
     "og:url": "http://www.guardian.co.uk/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying",
     "resourceName": "tory-strategist-lynton-crosby-lobbying",
     "title": "Tory strategist Lynton Crosby in new lobbying row | Politics | The Guardian",
     "twitter:app:id:googleplay": "com.guardian",
     "twitter:app:id:iphone": 409128287,
     "twitter:app:name:googleplay": "The Guardian",
     "twitter:app:name:iphone": "The Guardian",
     "twitter:app:url:googleplay": "guardian://www.guardian.co.uk/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying",
     "twitter:card": "summary_large_image",
     "twitter:site": "@guardian"
 }
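If you go this route, the output can be decoded with json_decode(). A minimal sketch, assuming you already have the JSON text returned by shell_exec(): some keys (such as article:tag above) hold a list while others hold a single value, so this hypothetical tika_meta() helper turns every value into an array of strings so callers can treat them uniformly.

```php
<?php
// Hypothetical helper: normalise Tika's -j output into key => list of strings.
function tika_meta($json): array
{
    // shell_exec() can return null or false when the command fails.
    if (!is_string($json)) {
        return [];
    }
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return []; // not a JSON object
    }
    $meta = [];
    foreach ($data as $key => $value) {
        // Wrap single values so every entry is a list.
        $values = is_array($value) ? $value : [$value];
        $meta[$key] = array_map('strval', $values);
    }
    return $meta;
}
```

For example: $meta = tika_meta(shell_exec('java -jar tika-app-1.4.jar -j ' . escapeshellarg($url))); escapeshellarg() matters here because the URL is passed through a shell.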

+2

sebilasse Jul 21

An example PHP function that gets meta tags from a URL:

 // Renamed from get_meta_tags: PHP already has a built-in get_meta_tags(),
 // and redeclaring it is a fatal error.
 // load_content() is the author's own helper (not shown); judging by its use here,
 // it returns an array with "content" (the HTML) and "msg" (a status message).
 function get_meta_tags_from_url($url) {
     $html = load_content($url, false, "");
     preg_match_all("/<title>(.*)<\/title>/", $html["content"], $title);
     preg_match_all("/<meta name=\"description\" content=\"(.*)\"\/>/i", $html["content"], $description);
     preg_match_all("/<meta name=\"keywords\" content=\"(.*)\"\/>/i", $html["content"], $keywords);
     $res["content"] = array(
         "title"       => $title[1][0] ?? null,
         "description" => $description[1][0] ?? null,
         "keywords"    => $keywords[1][0] ?? null,
     );
     $res["msg"] = $html["msg"];
     return $res;
 }

Example:

 print_r(get_meta_tags_from_url("bing.com"));

+1

x3m-bymer Sep 06 '12 at 16:45

An easy, built-in PHP function:

http://php.net/manual/en/function.get-meta-tags.php

+1

Jay Dave Jan 07 '13 at 7:53

 <?php
 // ------------------------------------------------------
 function curl_get_contents($url) {
     $timeout = 5;
     $useragent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0';
     $ch = curl_init();
     curl_setopt($ch, CURLOPT_URL, $url);
     curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
     curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
     $data = curl_exec($ch);
     curl_close($ch);
     return $data;
 }
 // ------------------------------------------------------
 function fetch_meta_tags($url) {
     $html = curl_get_contents($url);
     if ($html === false || $html === '') {
         return array($url, '', array()); // request failed or returned nothing
     }
     $mdata = array();
     $doc = new DOMDocument();
     @$doc->loadHTML($html); // @ hides warnings about malformed HTML
     $titlenode = $doc->getElementsByTagName('title');
     // guard against pages without a <title>
     $title = $titlenode->length > 0 ? $titlenode->item(0)->nodeValue : '';
     $metanodes = $doc->getElementsByTagName('meta');
     foreach ($metanodes as $node) {
         $key = $node->getAttribute('name');
         $val = $node->getAttribute('content');
         if (!empty($key)) {
             $mdata[$key] = $val;
         }
     }
     $res = array($url, $title, $mdata);
     return $res;
 }
 // ------------------------------------------------------
 ?>

+1

sbmark Feb 28 '14 at 18:11

These days, most sites, especially news sites and blogs, add meta tags that describe the site or a particular article page.

I created a Meta API that returns the relevant metadata, such as OpenGraph, Schema.org, etc.

Check it out - https://api.sakiv.com/docs

+1

sakiv Mar 19 '17 at 19:53

If you are working with PHP, check out the PEAR packages at pear.php.net and see if you find anything useful. I have used the RSS packages effectively, and it saves a lot of time if you can follow how they implement their code through their examples.

In particular, check out Sax 3 and see if it works for your needs. Sax 3 is no longer being updated, but that might be enough.

0

Geekster Sep 14 '10 at 18:07

As already mentioned, this can solve the problem:

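 $url = 'http://stackoverflow.com/questions/3711357/get-title-and-meta-tags-of-external-site/4640613';
 $meta = get_meta_tags($url);
 // This only works if the page has a <meta name="title" ...> tag;
 // get_meta_tags() never reads the <title> element itself.
 echo $title = $meta['title']; // php - Get Title and Meta Tags of External site - Stack Overflow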

0

Roger Apr 29 '12 at 20:25

I made this small Composer package based on the top answer: https://github.com/diversen/get-meta-tags

 composer require diversen/get-meta-tags

And then:

 use diversen\meta;
 $m = new meta();
 // Simple usage: get title, description, and keywords by default
 $ary = $m->getMeta('https://github.com/diversen/get-meta-tags');
 print_r($ary);
 // With more params
 $ary = $m->getMeta('https://github.com/diversen/get-meta-tags', array('description', 'keywords'), $timeout = 10);
 print_r($ary);

Like the top answer, it requires cURL and DOMDocument and is built the same way, but it adds the option to set a timeout (so it doesn't hang) and can get any kind of meta tag.
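If you only need the timeout and would rather avoid the cURL dependency, PHP's http stream context offers something similar. This is a sketch of a swapped-in technique, not how the package above works: fetch_html() and its default values (10 seconds, 5 redirects, the user-agent string) are hypothetical choices.

```php
<?php
// Hypothetical helper: fetch a page with a timeout using PHP's http stream context.
function fetch_html(string $url, float $timeout = 10.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'         => $timeout, // seconds
            'follow_location' => 1,        // follow redirects
            'max_redirects'   => 5,
            'user_agent'      => 'Mozilla/5.0 (compatible; meta-fetcher)',
        ],
    ]);
    // @ silences the warning PHP emits on failure; the false return is checked instead.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

The result can then go to any of the DOMDocument-based parsers in this thread.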

0

dennis Nov 07 '16 at 17:48

I did it differently and decided to share it. It's less code than the others, and I found it here. I added a few things so that it loads the meta tags of the page you're on rather than a specific page. I wanted this to automatically copy the page's default title and description into og tags.

For some reason, though, regardless of which method (script) I tried, the page loads very slowly online but instantly on WAMP. I'm not sure why. I'll probably just use a switch statement instead, since the site isn't huge.

 <?php
 $url = 'http://sitename.com' . $_SERVER['REQUEST_URI'];
 $fp = fopen($url, 'r');
 $content = "";
 while (!feof($fp)) {
     $buffer = trim(fgets($fp, 4096));
     $content .= $buffer;
 }
 $start = '<title>';
 $end = '<\/title>';
 preg_match("/$start(.*)$end/s", $content, $match);
 $title = $match[1];
 $metatagarray = get_meta_tags($url);
 $description = $metatagarray["description"];
 echo "<div><strong>Title:</strong> $title</div>";
 echo "<div><strong>Description:</strong> $description</div>";
 ?>

and in the HTML <head>:

 <meta property="og:title" content="<?php echo htmlspecialchars($title, ENT_QUOTES); ?>" />
 <meta property="og:description" content="<?php echo htmlspecialchars($description, ENT_QUOTES); ?>" />

0

e11world Jun 21 '18 at 23:36

An improved version of @shamittomar's accepted answer that gets all meta tags (or just a specified one) from an HTML source.

It can be improved further. The difference from PHP's default get_meta_tags() is that it works even when the string contains Unicode.

 function getMetaTags($html, $name = null) {
     $doc = new DOMDocument();
     try {
         @$doc->loadHTML($html);
     } catch (Exception $e) {
     }
     $metas = $doc->getElementsByTagName('meta');
     $data = [];
     for ($i = 0; $i < $metas->length; $i++) {
         $meta = $metas->item($i);
         if (!empty($meta->getAttribute('name'))) {
             // will ignore repeating meta tags !!
             $data[$meta->getAttribute('name')] = $meta->getAttribute('content');
         }
     }
     if (!empty($name)) {
         return !empty($data[$name]) ? $data[$name] : false;
     }
     return $data;
 }
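The comment in getMetaTags() notes that repeated meta tags overwrite each other. A sketch of one way around that, assuming you want every value: collect them into lists keyed by name (or by property when there is no name). meta_tags_multi() is a made-up name, and it returns a list even for tags that appear only once.

```php
<?php
// Hypothetical helper: keep every value of repeated meta tags, e.g. article:tag.
function meta_tags_multi(string $html): array
{
    $data = [];
    if (trim($html) === '') {
        return $data; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // quiet malformed-HTML warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        if ($key === '') {
            continue; // e.g. <meta charset> or http-equiv tags
        }
        $data[$key][] = $meta->getAttribute('content'); // append, don't overwrite
    }
    return $data;
}
```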

0

dav Jul 17 '18 at 16:40

Here are two lines of PHP Simple HTML DOM class code to get a page's META details.

 // file_get_html() comes from the PHP Simple HTML DOM Parser library (simple_html_dom.php)
 $html = file_get_html($link);
 $meta_description = $html->find('head meta[name=description]', 0)->content;
 $meta_keywords = $html->find('head meta[name=keywords]', 0)->content;

-1

Khandad Niazi Dec 28 '13 at 11:39

Shouldn't we use OG?

The accepted answer is good, but it doesn't work when the site redirects (very common!) and it doesn't return OG tags, which are the new industry standard. Here's a small function that's a bit more useful in 2018. It tries to get the OG tags and falls back to the regular meta tags if there are none:

 function getSiteOG($url, $specificTags = 0) {
     $doc = new DOMDocument();
     @$doc->loadHTML(file_get_contents($url));
     $res['title'] = $doc->getElementsByTagName('title')->item(0)->nodeValue;
     foreach ($doc->getElementsByTagName('meta') as $m) {
         $tag = $m->getAttribute('name') ?: $m->getAttribute('property');
         if (in_array($tag, ['description', 'keywords']) || strpos($tag, 'og:') === 0) {
             $res[str_replace('og:', '', $tag)] = $m->getAttribute('content');
         }
     }
     return $specificTags ? array_intersect_key($res, array_flip($specificTags)) : $res;
 }
 /////////////
 // SAMPLE USE:
 print_r(getSiteOG("http://www.stackoverflow.com")); // note the incorrect url
 /////////////
 // OUTPUT:
 Array
 (
     [title] => Stack Overflow - Where Developers Learn, Share, & Build Careers
     [description] => Stack Overflow is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their careers.
     [type] => website
     [url] => https://stackoverflow.com/
     [site_name] => Stack Overflow
     [image] => https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded
 )
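The same "OG first, then plain tags" idea can be expressed with DOMXPath so that a missing tag never triggers a null dereference. This is a sketch under a few assumptions: it works on an HTML string, handles only the title and description, and og_summary() is a hypothetical helper rather than part of the answer above.

```php
<?php
// Hypothetical helper: prefer og:title / og:description, else <title> / name="description".
function og_summary(string $html): array
{
    $summary = ['title' => null, 'description' => null];
    if (trim($html) === '') {
        return $summary; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // quiet malformed-HTML warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // First non-empty value matched by an XPath query, or null.
    $first = function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        if ($nodes === false || $nodes->length === 0) {
            return null;
        }
        $value = trim($nodes->item(0)->nodeValue);
        return $value === '' ? null : $value;
    };

    $summary['title'] = $first('//meta[@property="og:title"]/@content')
        ?? $first('//title');
    $summary['description'] = $first('//meta[@property="og:description"]/@content')
        ?? $first('//meta[@name="description"]/@content');
    return $summary;
}
```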

-1

cronoklee Sep 07 '18 at 8:50

shamittomar · Accepted Answer · 2010-09-14 17:54

So it should be:

 function file_get_contents_curl($url) {
     $ch = curl_init();
     curl_setopt($ch, CURLOPT_HEADER, 0);
     curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($ch, CURLOPT_URL, $url);
     curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
     $data = curl_exec($ch);
     curl_close($ch);
     return $data;
 }
 $html = file_get_contents_curl("http://example.com/");
 if (!$html) {
     exit('Could not fetch the page');
 }
 // parsing begins here:
 $doc = new DOMDocument();
 @$doc->loadHTML($html);
 $nodes = $doc->getElementsByTagName('title');
 // get and display what you need:
 $title = $nodes->length > 0 ? $nodes->item(0)->nodeValue : '';
 $description = '';
 $keywords = '';
 $metas = $doc->getElementsByTagName('meta');
 for ($i = 0; $i < $metas->length; $i++) {
     $meta = $metas->item($i);
     if ($meta->getAttribute('name') == 'description')
         $description = $meta->getAttribute('content');
     if ($meta->getAttribute('name') == 'keywords')
         $keywords = $meta->getAttribute('content');
 }
 echo "Title: $title" . '<br/><br/>';
 echo "Description: $description" . '<br/><br/>';
 echo "Keywords: $keywords";