How to remove tags from HTML tag attributes in PHP?

I have a large number of posts that were created with an old CMS. Their HTML markup is about the worst I have ever seen. It contains structures like the following:

....<IMG alt="  - <b> ...</b>" src="http://www.example.com/articles/pic.jpg" align=left>... 

As you can see, this is not HTML because it contains tags inside tag attributes.

I need to remove tags from HTML attributes.

I tried parsing the markup with DOMDocument, but it does not output Cyrillic characters correctly when the parsed string lacks the html and body wrapper tags. And even if I add them, I then have to remove them from the output.

The question is, how to remove tags from an HTML tag attribute in PHP?

Is preg_replace suitable for this?

1 answer

You can try the following:

 preg_replace('#<([^ ]+)((\s+[\w]+=((["\'])[^\5]+\5|[^ ]+))+)>#e', '"<\\1" . str_replace("\\\'", "\'", strip_tags("\\2")) . ">"', $code); 

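It takes every opening HTML tag ( <something> ), matches all of its attributes ( name="value" , name='value' , name=value ), and then strips the tags from them with strip_tags. The str_replace call is necessary because, when the e modifier is used, PHP applies addslashes to each match before evaluating the replacement. Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this call only works on older PHP versions.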

I tested it and it works fine. :)
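
On PHP 7 and later, where the e modifier is no longer available, the same idea can be sketched with preg_replace_callback. This is a minimal sketch and not the tested code from the answer: the helper name strip_attr_tags and the rewritten pattern are assumptions made for illustration. The pattern matches each quoted value with its own negated class ( "[^"]*" or '[^']*' ) rather than [^\5] , because inside a character class \5 is not treated as a backreference.

```php
<?php
// Hypothetical helper (name is an assumption): removes tags that appear
// inside the attribute list of each opening tag.
function strip_attr_tags($html)
{
    // Group 1: tag name.
    // Group 2: attribute list; values may be double-quoted, single-quoted,
    //          unquoted, or absent.
    // Group 3: the closing ">" or "/>" with optional leading whitespace.
    $pattern = '#<(\w+)((?:\s+[\w:-]+(?:=(?:"[^"]*"|\'[^\']*\'|[^\s>]+))?)+)(\s*/?>)#';

    return preg_replace_callback($pattern, function ($m) {
        // strip_tags() drops any <...> sequences found inside the attribute
        // list and leaves the quotes and attribute names as they are.
        return '<' . $m[1] . strip_tags($m[2]) . $m[3];
    }, $html);
}

// Sample taken from the question.
$sample = '<IMG alt="  - <b> ...</b>" src="http://www.example.com/articles/pic.jpg" align=left>';
echo strip_attr_tags($sample), "\n";
```

For the sample from the question, this should leave the alt text without the <b> tags while keeping src and align intact. Closing tags and tags without attributes are not matched, so they pass through unchanged.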


Source: https://habr.com/ru/post/1392530/

