Remove only some HTML tags in C#

Question

I have a line:

string hmtl = "<DIV><B> xpto </B></DIV>";

and I need to remove the <DIV> and </DIV> tags. Expected result: <B> xpto </B>

Only <DIV> and </DIV> should be removed, not every HTML tag; <B> xpto </B> must be kept.

+4

html c#

kaub0st3r Oct 30 '12 at 16:32

5 answers

Use HtmlAgilityPack:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml("<html>yourHtml</html>");
    // "//div" is an XPath expression that selects div nodes anywhere in the HTML.
    // SelectNodes returns null when nothing matches, so check before looping on real input.
    foreach (var item in doc.DocumentNode.SelectNodes("//div"))
    {
        string content = item.InnerHtml; // your div content
    }

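If you only need the B tags:

    // HtmlAgilityPack exposes element names in lowercase, so the XPath uses "//b".
    foreach (var item in doc.DocumentNode.SelectNodes("//b"))
    {
        string tag = item.OuterHtml; // your B tag and its content
    }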
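The snippets above only read the div or B content; they do not produce the original markup minus the div tags. A minimal sketch of that step follows, assuming the HtmlAgilityPack NuGet package is referenced. The class name DivUnwrapper and the method name Unwrap are hypothetical; the sketch relies on the RemoveChild overload that takes a keepGrandChildren flag to move a node's children up to its parent.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class DivUnwrapper
{
    // Removes every div element but keeps whatever was inside it.
    public static string Unwrap(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no div is present.
        var divs = doc.DocumentNode.SelectNodes("//div");
        if (divs == null)
        {
            return html;
        }

        // Copy to a list so the DOM can be modified while iterating.
        foreach (var div in divs.ToList())
        {
            // keepGrandChildren: true re-parents the div's children before removing it.
            div.ParentNode.RemoveChild(div, true);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling DivUnwrapper.Unwrap(hmtl) on the string from the question should leave the B element and its text in place. Going through a parser rather than a regex means attribute values that contain ">" are not a problem.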

+5

Anirudha Oct 30 '12 at 16:35

If you simply want to remove the div tags, this pattern matches the div tags together with any attributes they may have.

    var html = "<DIV><B> xpto <div text='abc'/></B></DIV><b>Other text <div>test</div>";
    var pattern = @"(\</?DIV(.*?)/?\>)";
    // Replace any match with an empty string
    var result = Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);

Result

 <B> xpto </B><b>Other text test

+3

ΩmegaMan Oct 30 '12 at 16:42

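You can use a regular expression:

    </?(body|html)\s*>

In C#:

    // (body|html) is a group with alternation, not a character class,
    // so only the html and body tags match.
    var result = Regex.Replace(html, @"</?(body|html)\s*>", "");

This removes <html>, <body>, </html> and </body>.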

+1

jiji2663 Apr 13 '15 at 13:08

    // "/?" makes the slash optional, so both <DIV> and </DIV> are removed.
    html = Regex.Replace(html, @"</?DIV>", String.Empty);

0

Jimmy Oct 30 '12 at 16:35

Ria · Accepted Answer · 2012-10-30T16:51:18+0000

Use Regex:

 var result = Regex.Replace(html, @"</?DIV>", "");

UPDATED

As you mentioned, with this code the regex removes all tags except B:

    var hmtl = "<DIV><B> xpto </B></DIV>";
    var remainTag = "B";
    // The lookahead and lookbehind skip tags whose name starts or ends with remainTag.
    var pattern = String.Format("(</?(?!{0})[^<>]*(?<!{0})>)", remainTag);
    var result = Regex.Replace(hmtl, pattern, "");
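For the opposite case, where specific tags are removed and everything else is kept, the regex answers above can be combined into one helper. This is a minimal sketch, not any answerer's exact method: the names TagStripper and RemoveTags are hypothetical, and it assumes attribute values never contain "<" or ">". It matches case-insensitively, tolerates attributes and self-closing forms, and uses a lookahead so that "div" does not also match a longer name such as "divider".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class TagStripper
{
    // Removes the opening, closing and self-closing forms of the named tags,
    // leaving their inner content and all other tags untouched.
    public static string RemoveTags(string html, params string[] tagNames)
    {
        if (string.IsNullOrEmpty(html) || tagNames == null || tagNames.Length == 0)
        {
            return html;
        }

        // Escape each name so it is treated literally inside the pattern.
        string names = string.Join("|", tagNames.Select(n => Regex.Escape(n)));

        // </?          "<" with an optional slash for closing tags
        // (?:names)    one of the requested tag names
        // (?![\w-])    the name must end here ("div" must not match "divider")
        // [^<>]*>      optional attributes or a trailing "/", then ">"
        string pattern = @"</?(?:" + names + @")(?![\w-])[^<>]*>";

        return Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);
    }
}
```

With the string from the question, TagStripper.RemoveTags(hmtl, "div") returns <B> xpto </B>. Several names can be passed at once, for example RemoveTags(html, "div", "span").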
