Escaping certain HTML tags in a string

Question

I have a requirement to escape a blacklist of HTML tags in a string before displaying it on a web page. The reason for being selective is to preserve formatting (bold, italics, fonts, etc.) while neutralizing tags that would "break" the page (script, meta, etc.).

After thinking about this for a while, I came up with two approaches:

RegEx - as almost everyone will tell you, using RegEx to process HTML is a bad idea
HtmlAgilityPack

I figured my best (and really only) solution was to load the string into HtmlAgilityPack and loop recursively through the child nodes. For each node, I would check whether it is on the specified blacklist. If so, I would escape the node's opening tag (and its closing tag, if one exists) and then process its InnerHtml. If it is not on the list, I would output the node's tags as is, still processing its InnerHtml.

So, given the following (very simple) source

The quick <b style='padding: 0 25em;'>brown</b> fox <b>jumped <i>over</i> the <meta http-equiv='refresh' /> moon</b>.

I need the following output

The quick <b style='padding: 0 25em;'>brown</b> fox <b>jumped <i>over</i> the &lt;meta http-equiv='refresh' /&gt; moon</b>.

After a lot of research, I ran into several problems, questions, and roadblocks.

Is HtmlAgilityPack the best library for this requirement?
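How should I walk the nodes? With .Descendants(), every nested node is returned as well; for example, the <i>over</i> node is part of the InnerHtml of the b node and is also a node in its own right.
How do I output a node's tag without its InnerHtml? A node exposes its whole markup (OuterHtml) or its InnerHtml, but not the tag alone. Otherwise I would have to rebuild the tag from the node's properties (Name, Id, Attributes, etc.).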

Here is what I have so far:

public string EscapeHtmlTags(string value, ICollection<string> tags) {
   // Parse the input string with HtmlAgilityPack.
   var doc = new HtmlAgilityPack.HtmlDocument();
   doc.LoadHtml(value);

   // Requires "using System.Linq;" for the Contains overload that takes a comparer.
   if (tags.Contains(doc.DocumentNode.Name, StringComparer.CurrentCultureIgnoreCase)) {
      // output opening tag as escaped string ????
      EscapeHtmlTags(doc.DocumentNode.InnerHtml, tags);
      // output closing tag as escaped string ????
   }
   else {
      // output opening tag as is ????
      EscapeHtmlTags(doc.DocumentNode.InnerHtml, tags);
      // output closing tag as is ????
   }
}

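This is incomplete: it does not yet deal with the different NodeTypes, with building the result in a StringBuilder, and so on.

Any suggestions?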

html xml html-agility-pack

Jason 07 . '14 21:07

MilanG · Answer 1 · 2014-05-19T14:09:05+0000

PHP has a built-in function for a similar job:

http://www.php.net/manual/en/function.strip-tags.php
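
For the C# side of the question, the recursive walk it describes could be sketched with HtmlAgilityPack roughly as follows. This is only a sketch under stated assumptions, not a vetted sanitizer: the class name HtmlTagEscaper and the helper AppendNode are made up for illustration, tags are rebuilt from Name and Attributes (so attribute values come back re-quoted with double quotes rather than byte-identical to the input), and HtmlNode.IsEmptyElement is used to decide which elements get no closing tag.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using HtmlAgilityPack;

public static class HtmlTagEscaper
{
    // Hypothetical entry point: returns the input with the tags of
    // blacklisted elements HTML-encoded and everything else kept as markup.
    public static string EscapeHtmlTags(string html, IEnumerable<string> blacklist)
    {
        var tags = new HashSet<string>(blacklist, StringComparer.OrdinalIgnoreCase);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var output = new StringBuilder();
        foreach (var child in doc.DocumentNode.ChildNodes)
            AppendNode(child, tags, output);
        return output.ToString();
    }

    private static void AppendNode(HtmlNode node, HashSet<string> tags, StringBuilder output)
    {
        // Text and comment nodes have no tag to escape; copy them verbatim.
        if (node.NodeType != HtmlNodeType.Element)
        {
            output.Append(node.OuterHtml);
            return;
        }

        bool escape = tags.Contains(node.Name);
        bool isVoid = HtmlNode.IsEmptyElement(node.Name); // e.g. meta, br, img

        // Rebuild the opening tag from Name and Attributes (assumption:
        // re-quoting every value with double quotes is acceptable).
        var open = new StringBuilder("<").Append(node.Name);
        foreach (var attr in node.Attributes)
        {
            string value = (attr.Value ?? "").Replace("\"", "&quot;");
            open.Append(' ').Append(attr.Name).Append("=\"").Append(value).Append('"');
        }
        open.Append(isVoid ? " />" : ">");
        output.Append(escape ? WebUtility.HtmlEncode(open.ToString()) : open.ToString());

        // Each child is visited exactly once, by recursion, so nested nodes
        // are not processed a second time through their parent's InnerHtml.
        foreach (var child in node.ChildNodes)
            AppendNode(child, tags, output);

        if (!isVoid)
        {
            string close = "</" + node.Name + ">";
            output.Append(escape ? WebUtility.HtmlEncode(close) : close);
        }
    }
}
```

Calling HtmlTagEscaper.EscapeHtmlTags(source, new[] { "script", "meta" }) on the sample string from the question should leave the b and i elements as markup and turn the meta tag into encoded text. Walking ChildNodes recursively, instead of using .Descendants() or re-parsing InnerHtml, is what keeps each node from being handled twice.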
