Matching everything between HTML <body> tags with PHP

I have a script that returns the following in a variable called $content:

<body> <p><span class=\"c-sc\">dgdfgdf</span></p> </body> 

However, I need to put everything between the body tags into an array called $matches.

I am doing the following to match the content between the body tags:

 preg_match('/<body>(.*)<\/body>/',$content,$matches); 

but the $matches array is empty. How can I get it to return everything inside the body tag?

+4
3 answers

You should not use regular expressions for HTML parsing.

Your specific problem in this case is that the dot does not match newline characters by default. Add the s (DOTALL) modifier so that it does:

 preg_match('/<body>(.*)<\/body>/s', $content, $matches); 

But seriously, use an HTML parser instead. There are so many ways that the above regex can break.
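A minimal sketch of both points, assuming $content spans several lines (the sample strings are made up for illustration). Without the s modifier the dot stops at the first newline, so the pattern never reaches the closing tag. With it the capture works, but an attribute on the opening tag is already enough to defeat the literal pattern.

```php
<?php
// Hypothetical sample input spanning several lines.
$content = "<body>\n<p><span class=\"c-sc\">dgdfgdf</span></p>\n</body>";

// Without /s the dot does not match "\n", so the pattern cannot reach </body>.
$withoutDotall = preg_match('/<body>(.*)<\/body>/', $content, $m1);

// With /s the dot also matches newlines and group 1 holds the inner markup.
$withDotall = preg_match('/<body>(.*)<\/body>/s', $content, $m2);
$inner = $m2[1];

// One way the pattern breaks: an attribute on the opening tag.
$withAttribute = "<body class=\"home\">\n<p>text</p>\n</body>";
$attributeMatch = preg_match('/<body>(.*)<\/body>/s', $withAttribute, $m3);
```

Here $withoutDotall is 0, $withDotall is 1, and $attributeMatch is 0 again, which is the kind of breakage the parser-based approach avoids.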

+9

Do not try to process HTML with regular expressions! Instead, use PHP's built-in parser:

 $dom = new DOMDocument;
 libxml_use_internal_errors(true); // collect parse warnings instead of printing them
 $dom->loadHTML($string);
 $bodies = $dom->getElementsByTagName('body');
 assert($bodies->length === 1);
 $body = $bodies->item(0);
 // Concatenate the serialized child nodes of <body> to get its inner HTML.
 $string = '';
 foreach ($body->childNodes as $child) {
     $string .= $dom->saveHTML($child);
 }
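As a sketch of why the parser route is more forgiving, the hypothetical helper bodyInnerHtml() below (not part of the original answer) wraps the DOMDocument approach in a function. It is run on made-up input with an attribute on the body tag and an upper-case closing tag, which a literal <body> pattern would not match.

```php
<?php
// Hypothetical helper: returns the inner HTML of the first <body>, or null if there is none.
function bodyInnerHtml(string $html): ?string
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    $inner = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes only that node, not the whole document.
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

// Made-up input: attribute on the tag, upper-case closing tag, line breaks.
$content = "<body class=\"home\">\n<p><span class=\"c-sc\">dgdfgdf</span></p>\n</BODY>";
$inner = bodyInnerHtml($content);
```

Whether the whitespace around the paragraph is kept can depend on the libxml version, so trim() the result if that matters.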
+11

If for some reason DOMDocument is not available, try this:

Step 1. Download simple_html_dom

Step 2. Read the documentation on how to use its selectors

 require_once("simple_html_dom.php");
 $doc = new simple_html_dom();
 $doc->load($someHtmlString);
 // find() with an index returns a single element; without one it returns an array.
 $body = $doc->find("body", 0)->innertext;
+2

Source: https://habr.com/ru/post/1300020/

