
Using regex to remove HTML tags

I need to convert

$text = 'We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe';

to

$text = 'We had fun. Look at this photo (http://example.com) of Joe';

[Edit] There may be several links in the text.

All HTML tags must be removed, and the href value of each <a> tag must be appended as shown above.

What would be an effective way to solve this with regex? Any code sample would be great.

+3
5 answers

First use preg_replace to keep each link's URL. You can use:

preg_replace('~<a href="(.*?)">(.*?)</a>~', '$2 ($1)', $str);

Then call strip_tags on the result to remove the remaining tags.
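
Put together, the two steps might look like the sketch below. The helper name links_to_text is hypothetical, and the pattern is slightly looser than the one above so that extra attributes and several links per string are tolerated. It still assumes double-quoted href values and no nested <a> elements, so treat it as an illustration rather than a general HTML solution.

```php
<?php
// Hypothetical helper: rewrite <a href="URL">text</a> as "text (URL)",
// then strip every remaining tag.
function links_to_text(string $html): string
{
    // Assumption: href values are double-quoted and <a> elements are not nested.
    // "~" is the pattern delimiter; "i" ignores case, "s" lets "." match newlines.
    $pattern = '~<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>~is';
    $withLinks = preg_replace($pattern, '$2 ($1)', $html);

    return strip_tags($withLinks);
}

$text = 'We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe';
echo links_to_text($text), "\n";
// We had fun. Look at this photo (http://example.com) of Joe
```

Because preg_replace replaces every match, the same call also covers text with several links.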

+5

Try an XML parser: replace each tag with its inner HTML, and each <a> tag with its inner HTML plus its href attribute.

http://www.php.net/manual/en/book.domxml.php

+1

XPath:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select only the <a> elements that carry an href attribute.
foreach($xpath->query('//a[@href]') as $node) {
    $textNode = new DOMText(sprintf('%s (%s)',
        $node->nodeValue, $node->getAttribute('href')));
    $node->parentNode->replaceChild($textNode, $node);
}
echo strip_tags($dom->saveHTML());

DOM:

$dom = new DOMDocument;
$dom->loadHTML($html);
// getElementsByTagName returns a live list, so copy it to an array
// before replacing nodes; otherwise every second link is skipped.
foreach(iterator_to_array($dom->getElementsByTagName('a')) as $node) {
    if($node->hasAttribute('href')) {
        $textNode = new DOMText(sprintf('%s (%s)',
            $node->nodeValue, $node->getAttribute('href')));
        $node->parentNode->replaceChild($textNode, $node);
    }
}
echo strip_tags($dom->saveHTML());

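Basically, both examples load the HTML into DomDocument. The first uses XPath, which is like SQL for XML, to find every link that has an href attribute, then replaces each node with its innerHTML followed by its href. The second does the same with the DOM API instead of XPath.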

Yes, this is longer than a Regex, but it is cleaner and easier to follow.
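
A variation on the XPath approach above, as a rough sketch: instead of serialising with saveHTML and calling strip_tags, read textContent from the document, which should also return entities such as &amp; already decoded. The helper name dom_links_to_text is hypothetical, and the sketch assumes non-empty ASCII or Latin-1 input, since loadHTML does not assume UTF-8 by default.

```php
<?php
// Hypothetical helper: replace each <a href> with "text (URL)" and
// return the plain text of the whole document.
function dom_links_to_text(string $html): string
{
    $dom = new DOMDocument();

    // Fragments and sloppy markup can make libxml emit warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $node) {
        $label = sprintf('%s (%s)', $node->textContent, $node->getAttribute('href'));
        $node->parentNode->replaceChild($dom->createTextNode($label), $node);
    }

    // textContent concatenates all text nodes, so there are no tags left to strip.
    return trim($dom->documentElement->textContent);
}

$html = 'We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe';
echo dom_links_to_text($html), "\n";
```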

+1

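If you would rather not use regex, plain string functions can do part of the job:

The <i> tag is the simple case:

$text = str_replace("<i>", "", $text);
$text = str_replace("</i>", "", $text);

(In PHP the function is str_replace, and the string to search comes last.)

The <a> tag is harder because of its attributes. Find where the opening tag starts at <a and where it ends at >, cut that part out, then remove </a>.

Something like:

$start = strpos($text, "<a");        // start of the opening <a ...> tag
$end = strpos($text, ">", $start);   // end of the opening tag
$text = substr($text, 0, $start) . substr($text, $end + 1);  // cut the opening tag out
$text = str_replace("</a>", "", $text);

(This assumes the text contains a link, handles only the first one, and discards the href.)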

0

This is also very easy to do with a parser:

# available from http://simplehtmldom.sourceforge.net
include('simple_html_dom.php');

# parse and echo
$html = str_get_html('We had <i>fun</i>. Look at <a href="http://example.com">this photo</a> of Joe');

# replace every link with "text (url)"
foreach ($html->find('a') as $a) {
    $a->outertext = "{$a->innertext} ({$a->href})";
}

echo strip_tags($html);

This produces the output you want for the test case.

0

Source: https://habr.com/ru/post/1744210/

