Removing HTML tags from a string

I am writing a program that should strip HTML tags from a string. I am trying to replace everything that starts with "<" and ends with ">". This is (obviously, since I'm asking about it here) not working yet. Here is what I tried:

    StrippedContent = Regex.Replace(StrippedContent, "\<.*\>", "")

This simply returns what seems like a random part of the original string. I also tried:

    For Each StringMatch As Match In Regex.Matches(StrippedContent, "\<.*\>")
        StrippedContent = StrippedContent.Replace(StringMatch.Value, "")
    Next

That did the same thing (it returns what seems like a random part of the original string). Is there a better way to do this? By better, I mean a way that works.

3 answers

Description

This expression will:

  • find all tags and replace them with nothing
  • avoid problematic edge cases

Regex: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>

Replace: nothing
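
To make the pattern easier to read, here is a minimal sketch that assembles it from its four alternatives. The variable names, the module name and the sample input are only illustrative and are not part of the original answer; the assembled string should be identical to the regex above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternBreakdown
    Sub Main()
        ' Hypothetical names for the four alternatives inside the (?: ... )* group.
        Dim anyOtherChar As String = "[^>=]"          ' any character except ">" and "="
        Dim singleQuoted As String = "='[^']*'"       ' = followed by a single-quoted value
        Dim doubleQuoted As String = "=""[^""]*"""    ' = followed by a double-quoted value
        Dim unquoted As String = "=[^'""][^\s>]*"     ' = followed by an unquoted value

        ' "<", then any number of the alternatives, then the closing ">".
        Dim pattern As String = "<(?:" & String.Join("|", anyOtherChar, singleQuoted, doubleQuoted, unquoted) & ")*>"

        ' Illustrative input: the ">" inside the quoted title must not end the tag.
        Dim sample As String = "<a title=""1 > 0"" href=x.html>link</a> text"
        Console.WriteLine(Regex.Replace(sample, pattern, ""))   ' prints: link text
    End Sub
End Module
```

Because quoted attribute values are consumed as a unit, a ">" inside them does not end the match early, which is the edge case a plain "<.*?>" gets wrong.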


Example

Sample text

Note the difficult edge case in the onmouseover function:

these are <a onmouseover=' href="NotYourHref" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for.

Code

    Imports System.Text.RegularExpressions

    Module Module1
        Sub Main()
            Dim sourcestring As String = "replace with your source string"
            Dim replacementstring As String = ""
            Dim matchpattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
            Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline Or RegexOptions.Singleline
            Console.WriteLine(Regex.Replace(sourcestring, matchpattern, replacementstring, options))
        End Sub
    End Module

String after replacement

 these are the droids you are looking for. 

Well, that proves you should always search Google for an answer first. Here is the method I found at http://www.dotnetperls.com/remove-html-tags-vbnet:

    Imports System.Text.RegularExpressions

    Module Module1
        Sub Main()
            Dim html As String = "<p>There was a <b>.NET</b> programmer " + "and he stripped the <i>HTML</i> tags.</p>"
            Dim tagless As String = StripTags(html)
            Console.WriteLine(tagless)
        End Sub

        Function StripTags(ByVal html As String) As String
            Return Regex.Replace(html, "<.*?>", "")
        End Function
    End Module
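
The only difference from the pattern in the question is the "?" after ".*", which makes the quantifier lazy instead of greedy. A minimal sketch of the difference, using a made-up input string and module name:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVersusLazy
    Sub Main()
        ' Illustrative input, not taken from the question.
        Dim html As String = "Keep <b>this</b> text"

        ' Greedy: .* runs from the first "<" to the LAST ">" on the line,
        ' so the word between the two tags is deleted as well.
        Console.WriteLine(Regex.Replace(html, "<.*>", ""))    ' prints: Keep  text

        ' Lazy: .*? stops at the FIRST ">" after each "<",
        ' so only the tags themselves are removed.
        Console.WriteLine(Regex.Replace(html, "<.*?>", ""))   ' prints: Keep this text
    End Sub
End Module
```

This is probably why the original attempt seemed to return a random part of the string: everything between the first "<" and the last ">" on each line was removed. Note that "." does not match a newline unless RegexOptions.Singleline is set, so a tag that spans several lines is left in place, and a ">" inside an attribute value still ends the match too early.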

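Here's a simple extension method using the regex pattern that Ro Yo Mi posted.

    Imports System.Runtime.CompilerServices
    Imports System.Text.RegularExpressions

    ' Extension methods must live in a Module; the name StringExtensions is arbitrary.
    Module StringExtensions
        <Extension()>
        Public Function RemoveHtmlTags(value As String) As String
            Return Regex.Replace(value, "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>", "")
        End Function
    End Module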

Demonstration:

    Dim html As String = "This <i>is</i> just a <b>demo</b>.".RemoveHtmlTags()
    Console.WriteLine(html)

Source: https://habr.com/ru/post/1491538/

