Matching everything between HTML <body> tags with PHP
I have a script that returns the following in a variable called $content:

    <body>
    <p><span class=\"c-sc\">dgdfgdf</span></p>
    </body>

I need to put everything between the body tags into an array called $matches.
I am using the following to match everything between the body tags:

    preg_match('/<body>(.*)<\/body>/', $content, $matches);

but the $matches array is empty. How can I get it to return everything inside the body tag?
You should not use regular expressions for HTML parsing.
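Your specific problem here is that you need to add the s (PCRE_DOTALL) modifier so that the dot also matches newline characters. Without it, . stops at the first line break, so the pattern can never reach </body> in a multi-line string.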
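
    preg_match('/<body>(.*)<\/body>/s', $content, $matches);

But seriously, use an HTML parser instead. There are many ways the regex above can break: for example, it fails if the tag carries attributes (<body class="home">) or is written in uppercase (<BODY>).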
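A minimal sketch of the difference the s modifier makes, assuming multi-line input like the question's (the quotes are unescaped here, and the second sample string and all variable names are made up for illustration). It also shows a slightly more forgiving pattern, which is still a regex and still not a substitute for a parser:

```php
<?php
// Multi-line input like the one in the question (quotes unescaped here)
$content = <<<'HTML'
<body>
<p><span class="c-sc">dgdfgdf</span></p>
</body>
HTML;

// No modifier: "." does not match "\n", so the pattern cannot cross the
// line breaks; preg_match() returns 0 and leaves $noMatch as an empty array.
$withoutS = preg_match('/<body>(.*)<\/body>/', $content, $noMatch);

// s (PCRE_DOTALL) modifier: "." also matches "\n", so the capture succeeds.
$withS = preg_match('/<body>(.*)<\/body>/s', $content, $matches);

// Still a regex, but a little more forgiving: [^>]* allows attributes,
// i ignores case, and the lazy .*? stops at the first closing tag.
$other = '<BODY class="home"><p>hi</p></BODY>';
$tolerant = preg_match('/<body[^>]*>(.*?)<\/body>/is', $other, $m2);

echo $withoutS, ' ', count($noMatch), "\n"; // 0 0
echo trim($matches[1]), "\n";                // <p><span class="c-sc">dgdfgdf</span></p>
echo $m2[1], "\n";                           // <p>hi</p>
```

The lazy .*? only matters when the string contains more than one </body>; the greedy .* would capture up to the last one.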
Do not try to process HTML with regular expressions! Instead, use PHP's built-in DOM parser:
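
    $dom = new DOMDocument;
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    $bodies = $dom->getElementsByTagName('body');
    assert($bodies->length === 1);
    $body = $bodies->item(0);
    // Serialize each child node of <body> and join them into one string
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }

If for some reason the DOM extension (DOMDocument) is not available, try this: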
Step 1. Download simple_html_dom
Step 2. Read the documentation on how to use its selectors.

    require_once("simple_html_dom.php");
    $doc = new simple_html_dom();
    $doc->load($content);
    // find() returns an array of matches unless an index is given; 0 selects the first <body>
    $body = $doc->find("body", 0)->innertext;
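For comparison with the DOMDocument answer, here is a minimal sketch of the same extraction using PHP's built-in DOMXPath instead of getElementsByTagName(), so no third-party library is needed. The sample string and variable names are assumptions for illustration:

```php
<?php
// Sample input, assumed for illustration; any HTML string works here
$content = <<<'HTML'
<body>
<p><span class="c-sc">dgdfgdf</span></p>
</body>
HTML;

$dom = new DOMDocument();
// Collect libxml warnings instead of printing them (real-world HTML is rarely valid)
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();

// Select every child node (elements, text, comments) of <body>
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//body/node()');

// Serialize each child back to HTML and join them: this is the "inner HTML"
$inner = '';
foreach ($nodes as $node) {
    $inner .= $dom->saveHTML($node);
}

echo trim($inner), "\n";
```

Whitespace-only text nodes around the <p> may or may not be kept depending on the libxml2 version, which is why the result is trimmed before use.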