Add a Unicode character to an HTML tag pattern

I use the C# script below to remove HTML tags from the description column when working in SSIS. I tried adding the Unicode character reference &#58; to the htmlTagPattern line below, but I can't get it to work.

Any help is appreciated.

using System;
using System.Text.RegularExpressions;

public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        Row.Message = RemoveHtml(Row.Message);
    }

    // Removes anything that looks like an HTML tag, including tags that span lines.
    public String RemoveHtml(String message)
    {
        String htmlTagPattern = "<(.|\n)+?>";
        Regex objRegExp = new Regex(htmlTagPattern);
        message = objRegExp.Replace(message, String.Empty);
        return message;
    }
}
1 answer

There are many ways to convert HTML to plain text:

Using the HtmlAgilityPack library

You can take the code from the samples provided with the library.

HtmlAgilityPack is available for download, for example as a NuGet package.
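As a rough sketch of this approach, the snippet below parses the markup and reads the text of the document node. It assumes the HtmlAgilityPack NuGet package is referenced by the script component; the class and method names (HtmlAgilityText, ToPlainText) are hypothetical and not part of the original answer.

```csharp
using HtmlAgilityPack;

public static class HtmlAgilityText
{
    // Hypothetical helper: parses the markup and returns its text content.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText drops the tags; DeEntitize then converts character
        // references such as &#58; back into the characters they stand for.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

In the script component this could be called as Row.Message = HtmlAgilityText.ToPlainText(Row.Message);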

Using System.Net

If you are using .NET Framework 4 or higher, you can use the System.Net namespace to decode HTML character references:

System.Net.WebUtility.HtmlDecode(Row.Column)
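Note that HtmlDecode only converts character references such as &#58; into characters; it does not remove tags. One possible way to combine it with the regex from the question is sketched below. The class and method names (HtmlText, ToPlainText) are hypothetical. Tags are stripped before decoding so that escaped text like &lt;b&gt; is not mistaken for a real tag afterwards.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Same tag pattern as in the question: matches a tag, even across newlines.
    private static readonly Regex TagPattern = new Regex("<(.|\n)+?>");

    // Hypothetical helper: strips tags, then decodes character references.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        string withoutTags = TagPattern.Replace(html, String.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}

// Example: HtmlText.ToPlainText("<p>Time&#58; 10&#58;30</p>") returns "Time: 10:30"
```

In the script component this could replace the existing call: Row.Message = HtmlText.ToPlainText(Row.Message);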


Source: https://habr.com/ru/post/1691358/

