Remove only some HTML tags in C#
Use a regex:

    // Matches only <DIV> and </DIV> exactly (case-sensitive, no attributes)
    var result = Regex.Replace(html, @"</?DIV>", "");

Update: as you mentioned, you need to remove every tag except B. This version does that:
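    using System.Text.RegularExpressions;

    var html = "<DIV><B> xpto </B></DIV>";
    var remainTag = "B";
    // Match any opening or closing tag whose name is not exactly remainTag.
    // Requiring a letter after "</?" stops "</B>" from slipping through,
    // and \b means tags like <BR> are still removed.
    var pattern = String.Format(@"</?(?!{0}\b)[a-zA-Z][^<>]*>", Regex.Escape(remainTag));
    var result = Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase); // "<B> xpto </B>"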
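The same idea extends to keeping more than one tag. A minimal sketch, assuming you want to keep a fixed list of tag names such as B and I; the helper name `StripTagsExcept` is hypothetical, and like any regex approach it assumes simple markup with no `<` or `>` inside attribute values or comments. Instead of a negative lookahead it captures the tag name and decides per match in a `MatchEvaluator`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagFilter
{
    // Matches an opening, closing or self-closing tag and captures its name in group 1.
    private static readonly Regex TagRegex =
        new Regex(@"</?([a-zA-Z][a-zA-Z0-9]*)[^<>]*>", RegexOptions.Compiled);

    // Hypothetical helper: removes every tag whose name is not in keepTags (case-insensitive).
    public static string StripTagsExcept(string html, params string[] keepTags)
    {
        var keep = new HashSet<string>(keepTags, StringComparer.OrdinalIgnoreCase);
        return TagRegex.Replace(html, m => keep.Contains(m.Groups[1].Value) ? m.Value : string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        var html = "<DIV><B> xpto </B><BR/><I>y</I></DIV>";
        Console.WriteLine(TagFilter.StripTagsExcept(html, "b", "i"));
        // prints "<B> xpto </B><I>y</I>"
    }
}
```

Using a `HashSet` with `OrdinalIgnoreCase` keeps the lookup cheap and matches HTML's case-insensitive tag names.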
Use HtmlAgilityPack:
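    using System;
    using System.Linq;
    using HtmlAgilityPack;

    var doc = new HtmlDocument();
    doc.LoadHtml(html); // html is your markup string
    // "//div" is an XPath expression that selects div nodes anywhere in the document.
    // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
    foreach (var item in doc.DocumentNode.SelectNodes("//div") ?? Enumerable.Empty<HtmlNode>())
    {
        Console.WriteLine(item.InnerHtml); // the div's content
    }

If you only need the B tags (HtmlAgilityPack stores element names in lower case, so query "//b"):

    foreach (var item in doc.DocumentNode.SelectNodes("//b") ?? Enumerable.Empty<HtmlNode>())
    {
        Console.WriteLine(item.OuterHtml); // the B tag and its content
    }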
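Reading `InnerHtml` does not change the document. If the goal is to drop the div tags but keep what they contain, one option is to remove each div from its parent while keeping its children in place. A sketch, assuming the HtmlAgilityPack NuGet package is referenced; it relies on `HtmlNode.RemoveChild(node, keepGrandChildren: true)`, and `UnwrapDivs` is a hypothetical name.

```csharp
using System;
using HtmlAgilityPack;

public static class DivUnwrapper
{
    // Hypothetical helper: removes every div element but keeps its child nodes in place.
    public static string UnwrapDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when there are no matches.
        var divs = doc.DocumentNode.SelectNodes("//div");
        if (divs != null)
        {
            foreach (var div in divs)
            {
                // keepGrandChildren: true moves the div's children into its parent before removing the div.
                div.ParentNode.RemoveChild(div, true);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DivUnwrapper.UnwrapDivs("<DIV><B> xpto </B></DIV>"));
    }
}
```

Unlike a regex, the parser copes with attributes, comments and odd spacing, at the cost of an extra dependency. The casing of the serialized tags depends on HtmlAgilityPack's output options.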
If you simply want to remove the div tags, this removes them along with any attributes they may have, including self-closing forms, regardless of case:

    using System.Text.RegularExpressions;

    var html = "<DIV><B> xpto <div text='abc'/></B></DIV><b>Other text <div>test</div>";
    var pattern = @"(</?DIV(.*?)/?>)";
    // Replace any match with an empty string
    var result = Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);

Result:

    <B> xpto </B><b>Other text test
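To reuse this for tags other than div, the tag name can be passed in. A minimal sketch; `RemoveTag` is a hypothetical helper, and it swaps `(.*?)` for `\b[^>]*` so that a longer name such as `<divider>` is not matched. It still assumes there is no `>` inside attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagRemover
{
    // Hypothetical helper: removes opening, closing and self-closing tags named tagName,
    // including their attributes, but leaves the content between them.
    public static string RemoveTag(string html, string tagName)
    {
        // \b stops "div" from matching the start of a longer name such as "divider".
        var pattern = @"</?" + Regex.Escape(tagName) + @"\b[^>]*>";
        return Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);
    }
}

public static class Program
{
    public static void Main()
    {
        var html = "<DIV><B> xpto <div text='abc'/></B></DIV><b>Other text <div>test</div>";
        Console.WriteLine(TagRemover.RemoveTag(html, "div"));
        // prints "<B> xpto </B><b>Other text test"
    }
}
```

`Regex.Escape` guards against tag names that contain regex metacharacters, although real HTML tag names rarely do.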