Regular expression to get page title

There are many answers to this question, but none of them is complete:

Using one regex, how do you retrieve the page title from a <title>Page title</title> element?

There are several other ways the title tag can be written, for example:

 <TITLE>Page title</TITLE>
 <title> Page title</title>
 <title> Page title </title>
 <title lang="en-US">Page title</title>

... or any combination of the above.

And it can be on a separate line or between other tags:

 <head>
     <title>Page title</title>
 </head>

 <head><title>Page title</title></head>

Thanks for the help in advance.

UPDATE: So a regex approach might not be the best solution. Which PHP-based HTML parser can handle all cases, whether the HTML is well-formed or not?

UPDATE 2: sp00m's regex (/questions/1447700/regular-expression-to-get-page-title/4464173#4464173) seems to work in all cases. I will return to this if necessary.

+4
3 answers

Use an HTML parser instead. But if you really need a regex:

 <title[^>]*>(.*?)</title> 
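
As a rough sketch of how the pattern above might be applied in PHP: the `i` and `s` modifiers are an assumption added here (they are not part of the answer's pattern) so that uppercase tags and titles spanning several lines also match, and `extractTitle` is a hypothetical helper name.

```php
<?php
// Hypothetical helper, not part of the original answer.
function extractTitle(string $html): ?string
{
    // i: match <TITLE> as well as <title>
    // s: let . match newlines, so a title spread over several lines is captured
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) === 1) {
        // Drop the whitespace around the captured title text
        return trim($m[1]);
    }
    return null; // no title element found
}

// The variants listed in the question
$samples = [
    '<TITLE>Page title</TITLE>',
    '<title> Page title </title>',
    '<title lang="en-US">Page title</title>',
    "<head>\n<title>Page title</title>\n</head>",
];

foreach ($samples as $sample) {
    var_dump(extractTitle($sample));
}
```

With these modifiers, each of the four samples yields the string "Page title". Without `i`, the uppercase variant would not match at all.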


+7

Use the DOMDocument class:

 $doc = new DOMDocument();
 $doc->loadHTML($html);
 $titles = $doc->getElementsByTagName("title");
 // item() is a method of DOMNodeList, so it takes the index in parentheses
 echo $titles->item(0)->nodeValue;
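
Since the question also asks about HTML that is not well-formed, here is a slightly more defensive sketch of the same DOMDocument approach. It assumes you want parse warnings suppressed rather than printed, and `titleFromHtml` is a hypothetical helper name, not something from the answer.

```php
<?php
// Hypothetical helper, not part of the original answer.
function titleFromHtml(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early
    if ($html === '') {
        return null;
    }

    // Collect libxml parse warnings internally instead of emitting them;
    // real-world markup is often not well-formed.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null; // document has no title element
    }
    return trim($titles->item(0)->nodeValue);
}

// Unclosed tags and an uppercase TITLE with an attribute
$html = '<head><TITLE lang="en-US"> Page title </TITLE><body><p>unclosed';
var_dump(titleFromHtml($html));
```

Checking `length` before calling `item(0)` avoids dereferencing `null` when a page has no title element.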
+2

Use this regex:

 <title>[\s\S]*?</title> 
0

Source: https://habr.com/ru/post/1447700/

