
Remove only some HTML tags in C#

I have a string:

 string html = "<DIV><B> xpto </B></DIV>";

and I need to remove the <DIV> and </DIV> tags. Result: <B> xpto </B>

Only <DIV> and </DIV> should be removed, not every HTML tag: <B> xpto </B> must be kept.

5 answers

Use Regex:

 var result = Regex.Replace(html, @"</?DIV>", ""); 
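A minimal sketch of the one-liner above wrapped in a method; the class and method names are hypothetical, and RegexOptions.IgnoreCase is an addition so that lowercase <div> tags are removed as well. Because the pattern only matches a bare tag, an opening tag with attributes such as <div class='a'> is left in place.

```csharp
using System.Text.RegularExpressions;

public static class DivStripper
{
    // "<", an optional "/", then "DIV>": matches <DIV> and </DIV>.
    // IgnoreCase (an addition to the answer) also catches <div> and </div>.
    public static string RemoveDivTags(string html) =>
        Regex.Replace(html, @"</?DIV>", string.Empty, RegexOptions.IgnoreCase);
}
```

For example, DivStripper.RemoveDivTags("<DIV><B> xpto </B></DIV>") returns "<B> xpto </B>".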

UPDATE

As you mentioned, here is a version in which the regex removes all tags except B:

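 var html = "<DIV><B> xpto </B></DIV>";
 var remainTag = "B";
 // (?!B) rejects tags whose name starts with B; (?<!B) rejects tags whose last
 // character before ">" is B, which stops "</B>" from matching with the slash skipped
 var pattern = String.Format("(</?(?!{0})[^<>]*(?<!{0})>)", remainTag);
 var result = Regex.Replace(html, pattern, ""); // "<B> xpto </B>"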
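The lookahead idea in the updated answer can be generalized to a list of tags to keep. This is a sketch under the assumption that tag names are plain word characters; KeepOnly is a hypothetical name, and the pattern is a variation rather than the answer's own: \b keeps "B" from also protecting <BR>, and requiring \w+ right after the optional slash replaces the lookbehind as the guard that stops </B> from matching.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TagWhitelist
{
    // Hypothetical helper: removes every tag whose name is not in keepTags.
    // Assumes tag names consist of word characters (letters, digits, underscore).
    public static string KeepOnly(string html, params string[] keepTags)
    {
        // Alternation of the names to keep, e.g. "B|I".
        string allowed = string.Join("|", keepTags.Select(Regex.Escape));

        // The name must not be a kept name; \b stops "B" from also covering "BR".
        // With no names to keep, skip the lookahead so every tag is removed.
        string notKept = keepTags.Length == 0 ? "" : $@"(?!(?:{allowed})\b)";

        // </?    optional slash of a closing tag
        // \w+    the tag name; it must start with a word character, so "</B>"
        //        cannot match by skipping the slash and treating "/B" as the name
        // [^<>]* attributes, up to the closing ">"
        string pattern = $@"</?{notKept}\w+[^<>]*>";
        return Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);
    }
}
```

TagWhitelist.KeepOnly("<DIV><B> xpto </B></DIV>", "B") returns "<B> xpto </B>"; several names can be passed, e.g. KeepOnly(html, "B", "I").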

Use HtmlAgilityPack:

 HtmlDocument doc = new HtmlDocument(); // using HtmlAgilityPack;
 doc.LoadHtml(html); // your HTML string
 // "//div" is an XPath expression that selects div nodes anywhere in the HTML.
 // SelectNodes returns null (not an empty collection) when nothing matches.
 foreach (var item in doc.DocumentNode.SelectNodes("//div"))
 {
     string content = item.InnerHtml; // your div content
 }

If you only need the B tags:

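 // HtmlAgilityPack stores element names in lower case, so query "//b" rather than "//B".
 foreach (var item in doc.DocumentNode.SelectNodes("//b"))
 {
     string tag = item.OuterHtml; // your B tag and its content
 }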
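If the input is known to be well-formed XHTML, the same parse-then-unwrap idea can be sketched with LINQ to XML from the base class library instead of HtmlAgilityPack. This swaps in a different parser: XElement.Parse throws an XmlException on ordinary HTML that is not valid XML (for example an unclosed <br>), which is exactly the case an HTML parser like HtmlAgilityPack is meant for. The Unwrap helper is hypothetical; it removes each matching element but keeps its children in place.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlUnwrap
{
    // Hypothetical helper: removes every element named tagName but keeps its children.
    // Assumes the fragment is well-formed XML; XElement.Parse throws otherwise.
    public static string Unwrap(string fragment, string tagName)
    {
        // A synthetic root lets fragments with several top-level nodes parse.
        XElement root = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);

        // Snapshot the matches in reverse document order, so nested elements
        // are unwrapped before the elements that contain them.
        var targets = root.Descendants()
            .Where(e => string.Equals(e.Name.LocalName, tagName, StringComparison.OrdinalIgnoreCase))
            .Reverse()
            .ToList();

        foreach (XElement e in targets)
            e.ReplaceWith(e.Nodes()); // the element is replaced by its own child nodes

        // Serialize the children of the synthetic root without adding indentation.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

XmlUnwrap.Unwrap("<DIV><B> xpto </B></DIV>", "div") returns "<B> xpto </B>". Working from the inside out matters for nested divs: LINQ to XML copies child nodes that already have a parent, so unwrapping an outer div first would leave copies of its inner divs untouched.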

If you only want to remove the div tags, this pattern catches the div tags together with any attributes they may have:

 var html = "<DIV><B> xpto <div text='abc'/></B></DIV><b>Other text <div>test</div>";
 var pattern = @"(\</?DIV(.*?)/?\>)";
 // Replace any match with nothing/empty string
 var result = Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);

Result

 <B> xpto </B><b>Other text test 
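To check that result, here is the same pattern in a small helper (the class and method names are hypothetical). The lazy (.*?) stops at the first ">", so attributes and the self-closing <div text='abc'/> form are consumed without running into the following tag.

```csharp
using System.Text.RegularExpressions;

public static class DivAttributeStripper
{
    // "<", optional "/", "DIV", then the shortest run of characters up to an
    // optional "/" and the first closing ">".
    const string Pattern = @"</?DIV(.*?)/?>";

    // IgnoreCase makes <DIV> and <div> both match.
    public static string RemoveDivTags(string html) =>
        Regex.Replace(html, Pattern, string.Empty, RegexOptions.IgnoreCase);
}
```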

You can use a regular expression:

 <\s*/?\s*(body|html)\s*>

in C#:

 var result = Regex.Replace(html, @"<\s*/?\s*(body|html)\s*>", "");

It matches tags such as <html>, <body>, < / html> and < / body>.
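Written with square brackets, as in <[(/body|html)\s]*>, the pattern is a character class: it matches "<", then any run of the individual characters ( / b o d y | h t m l and whitespace, then ">". That also removes unrelated tags such as <b>. The sketch below compares it with the alternation <\s*/?\s*(body|html)\s*>; the class and member names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BodyHtmlPatterns
{
    // [...] is a character class: any run of the single characters listed inside.
    public const string CharClassPattern = @"<[(/body|html)\s]*>";

    // (body|html) is an alternation: the whole word body or html,
    // with an optional slash and optional spaces around it.
    public const string AlternationPattern = @"<\s*/?\s*(body|html)\s*>";

    public static string Strip(string html, string pattern) =>
        Regex.Replace(html, pattern, string.Empty);
}
```

On "<html><body><b>x</b></body></html>", CharClassPattern also strips the <b> and </b> tags and leaves only "x", while AlternationPattern leaves "<b>x</b>".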
 html = Regex.Replace(html, @"</?DIV>", String.Empty); // optional "/" so both <DIV> and </DIV> match

Source: https://habr.com/ru/post/1443044/

