I need to split text separated by paragraph tags
    $text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

I need to split this into an array, separated by paragraph tags. That is, I need to split this into an array with two elements:
    array([0] => "this is the first paragraph", [1] => "this is the first paragraph")

Remove the closing tags </p>, because we do not need them, and then explode the string into an array on the opening tags <p>.
    $text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
    $text = str_replace('</p>', '', $text);
    $array = explode('<p>', $text);

This code will leave you with an empty array entry at index 0, because the string starts with the delimiter <p>. If this is a problem, it can easily be removed by calling array_shift($array) before using the array.
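A minimal sketch of the same idea wrapped in a function, in case you would rather filter out empty entries than call array_shift(). The function name splitParagraphsByExplode is made up for this example, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not from the answer above): strip </p>, split on <p>,
// then drop empty entries instead of calling array_shift().
function splitParagraphsByExplode(string $html): array
{
    $withoutClosing = str_replace('</p>', '', $html);
    $parts = explode('<p>', $withoutClosing);

    // Trim surrounding whitespace, discard empty strings, reindex from 0.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, fn (string $p): bool => $p !== ''));
}

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
print_r(splitParagraphsByExplode($text));
```

Filtering also copes with stray whitespace between paragraphs, which array_shift() alone would not.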
If your input is somewhat consistent, you can use a simple splitting method like:
    // The third argument is the limit (-1 = no limit); flags go in the fourth.
    $paragraphs = preg_split('~(</?p>\s*)+~', $text, -1, PREG_SPLIT_NO_EMPTY);

Here preg_split looks for runs of <p> and </p> tags, plus optional whitespace, and splits the string there.
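For reference, a small runnable sketch of the preg_split approach. The sample input (with a newline between the paragraphs) is my own assumption. Note that preg_split takes the limit as its third argument and the flags as its fourth, so -1 is passed for "no limit".

```php
<?php
// Sample input assumed for illustration; note the newline between paragraphs.
$text = "<p>this is the first paragraph</p>\n<p>this is the first paragraph</p>";

// Split on any run of <p> / </p> tags with optional trailing whitespace.
// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty leading and
// trailing pieces that the outer tags would otherwise produce.
$paragraphs = preg_split('~(</?p>\s*)+~', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($paragraphs);
```

This only works while the markup is as regular as the sample; attributes on the <p> tags or nested markup would need a real parser.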
As a heavier alternative (probably overkill here), you can also use querypath or phpquery to extract just the text content of each paragraph:
    foreach (htmlqp($text)->find("p") as $p) {
        print $p->text();
    }

Try the following:
    <?php
    $text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
    $array = [];
    // The callback is used only to collect each captured paragraph;
    // the U modifier makes (.+) lazy so each <p>...</p> pair matches separately.
    preg_replace_callback("`<p>(.+)</p>`isU", function ($matches) {
        global $array;
        $array[] = $matches[1];
    }, $text);
    var_dump($array);
    ?>

The global variable can be avoided by placing the array in a class that manages it through setter and getter methods.
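One way the class-based variant might look. The class name ParagraphCollector and its methods are invented for this sketch, and typed properties assume PHP 7.4 or later.

```php
<?php
// Hypothetical class, named here for illustration only.
final class ParagraphCollector
{
    /** @var string[] */
    private array $paragraphs = [];

    public function add(string $paragraph): void
    {
        $this->paragraphs[] = $paragraph;
    }

    /** @return string[] */
    public function getParagraphs(): array
    {
        return $this->paragraphs;
    }
}

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$collector = new ParagraphCollector();

// The closure captures the collector, so no global variable is needed.
preg_replace_callback('`<p>(.+)</p>`isU', function (array $matches) use ($collector): string {
    $collector->add($matches[1]);
    return $matches[0]; // return the match unchanged; only the side effect matters
}, $text);

var_dump($collector->getParagraphs());
```

If all you need is the list of matches, preg_match_all with the same pattern would arguably be the more direct tool.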
This is an old question, but I could not find a reasonable solution after an hour of searching for answers on Stack Overflow. If you have a string full of HTML tags (p tags) and you want to get the paragraphs (or just the first paragraph), use DOMDocument.
$long_description is a string that contains <p> tags.
    $long_descriptionDOM = new DOMDocument();
    // This is how you use it with UTF-8
    $long_descriptionDOM->loadHTML(mb_convert_encoding($long_description, 'HTML-ENTITIES', 'UTF-8'));
    $paragraphs = $long_descriptionDOM->getElementsByTagName('p');
    // textContent is a property, not a method
    $first_paragraph = $paragraphs->item(0)->textContent;

I think this is the right approach. There is no need for regular expressions; you should not use regex for parsing HTML.
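To get every paragraph rather than just the first, you can loop over the node list. This is a rough sketch. The sample string is assumed, and prepending an XML encoding declaration is only one common workaround for UTF-8 input, used here in place of the mb_convert_encoding call.

```php
<?php
// Sample input, assumed for illustration.
$long_description = '<p>this is the first paragraph</p><p>this is the second paragraph</p>';

$dom = new DOMDocument();
// The encoding declaration hints to loadHTML() that the input is UTF-8;
// LIBXML_NOERROR suppresses warnings about imperfect markup.
$dom->loadHTML('<?xml encoding="UTF-8">' . $long_description, LIBXML_NOERROR);

$paragraphs = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    // textContent is a property holding the element's text without tags.
    $paragraphs[] = $p->textContent;
}

print_r($paragraphs);
```

Unlike the string-splitting versions, this keeps working when the <p> tags carry attributes or contain nested inline markup.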