Regular expression to remove XML tags and their contents

I have the following line, and I would like to remove <bpt *>*</bpt> and <ept *>*</ept> (note that the tag nested inside each of them must be removed as well) without using an XML parser (too much overhead for such short strings).

The big <bpt i="1" x="1" type="bold"><b></bpt>black<ept i="1"></b></ept> <bpt i="2" x="2" type="ulined"><u></bpt>cat<ept i="2"></u></ept> sleeps.

Any regular expression that runs in VB.NET or C# will do.

+3
7 answers

If you just want to remove all of these tags, together with their contents, from a string, use this (C#):

try {
    yourstring = Regex.Replace(yourstring, "(<[be]pt[^>]+>.+?</[be]pt>)", "");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

EDIT:

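Here is an improved version. It removes every <**pt*> tag, and it uses a backreference to the original [be] match so that each opening tag is paired with the closing tag of the same name. It also creates a Regex object once and reuses it on every pass instead of rebuilding the Regex each time: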

try {
    // Build the Regex once and reuse it on every pass.
    Regex regex = new Regex(@"<([be])pt[^>]+>.+?</\1pt>");
    // Repeat the replacement until no match is left.
    while (regex.IsMatch(yourstring)) {
        yourstring = regex.Replace(yourstring, "");
    }
} catch (ArgumentException) {
    // Syntax error in the regular expression
}

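Note:

The "." in this pattern matches any single character except a line break, and the "?" after ".+" makes that part lazy, so it stops at the first closing tag it can reach. A tool such as RegexBuddy can explain a pattern piece by piece; this is its breakdown of the expression: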

// <([be])pt[^>]+>.+?</\1pt>
// 
// Match the character "<" literally «<»
// Match the regular expression below and capture its match into backreference number 1 «([be])»
//    Match a single character present in the list "be" «[be]»
// Match the characters "pt" literally «pt»
// Match any character that is not a ">" «[^>]+»
//    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the character ">" literally «>»
// Match any single character that is not a line break character «.+?»
//    Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the characters "</" literally «</»
// Match the same text as most recently matched by backreference number 1 «\1»
// Match the characters "pt>" literally «pt>»
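
To see the pattern at work, here is a minimal, self-contained sketch that runs it against the sample line from the question. The class name TagStripper and the method name Strip are made up for this illustration; only the pattern and the loop come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Built once and reused. The backreference \1 makes the closing tag
    // carry the same letter (b or e) as the opening tag.
    private static readonly Regex PtTag = new Regex(@"<([be])pt[^>]+>.+?</\1pt>");

    // Hypothetical helper: removes <bpt ...>...</bpt> and <ept ...>...</ept>
    // spans, repeating until the pattern no longer matches.
    public static string Strip(string input)
    {
        string result = input;
        while (PtTag.IsMatch(result))
        {
            result = PtTag.Replace(result, "");
        }
        return result;
    }

    public static void Main()
    {
        string line =
            "The big <bpt i=\"1\" x=\"1\" type=\"bold\"><b></bpt>black<ept i=\"1\"></b></ept> " +
            "<bpt i=\"2\" x=\"2\" type=\"ulined\"><u></bpt>cat<ept i=\"2\"></u></ept> sleeps.";

        Console.WriteLine(Strip(line)); // The big black cat sleeps.
    }
}
```

Two limits of the pattern are worth keeping in mind: [^>]+ needs at least one character after "pt", so a bare <bpt> without attributes is not matched, and "." does not cross line breaks unless RegexOptions.Singleline is passed.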
+7

How about this?

(<bpt .*?>.*?</bpt>)|(<ept .*?>.*?</ept>)

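The ? after each * makes the quantifier lazy, so it matches as little as possible and stops at the first closing tag.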


+1

A regex is fragile here: it can trip over edge cases such as a ">" inside an attribute value, for example <bpt foo="bar>">.

+1

Does the .NET regex engine support negative lookahead? If so, you can use:

(<([eb])pt[^>]+>((?!</\2pt>).)+</\2pt>)

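The negative lookahead lets the "." consume characters only until the closing tag that matches the opening bpt/ept is reached. Add \s where whitespace may appear inside the tags.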
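
The .NET regex engine does support negative lookahead, so the lookahead pattern above can be tried directly. The following is a small sketch against the sample line from the question; the class name LookaheadStripper and the method name Strip are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadStripper
{
    // Group 1 is the whole match and group 2 is the letter (e or b), hence \2.
    // (?!</\2pt>). consumes one character at a time, but only while the
    // matching closing tag does not start at the current position.
    private static readonly Regex PtTag =
        new Regex(@"(<([eb])pt[^>]+>((?!</\2pt>).)+</\2pt>)");

    // Hypothetical helper: removes every match in a single pass.
    public static string Strip(string input)
    {
        return PtTag.Replace(input, "");
    }

    public static void Main()
    {
        string line =
            "The big <bpt i=\"1\" x=\"1\" type=\"bold\"><b></bpt>black<ept i=\"1\"></b></ept> " +
            "<bpt i=\"2\" x=\"2\" type=\"ulined\"><u></bpt>cat<ept i=\"2\"></u></ept> sleeps.";

        Console.WriteLine(Strip(line)); // The big black cat sleeps.
    }
}
```

On this input the result is the same as with the lazy .+? version; the lookahead just states the stopping condition explicitly.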

0

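If you process XML with a regex, keep in mind that XML can contain CDATA sections, which a simple pattern will not handle.

An alternative worth considering is XSLT, which is designed for transforming XML.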

0

Why use a regex pattern for XML at all? You can do this with plain Replace calls, which escape "<" and ">" along with the other special characters:

' Escapes characters that are special in XML.
Friend Function ReplaceSpecChars(ByVal str As String) As String
  Dim arrLessThan As Collection
  Dim arrGreaterThan As Collection
  If Not IsDBNull(str) Then
    str = CStr(str)
    If Len(str) > 0 Then
      str = Replace(str, "&", "&amp;")
      str = Replace(str, "'", "&apos;")
      str = Replace(str, """", "&quot;")
      arrLessThan = FindLocationOfChar("<"c, str)
      arrGreaterThan = FindLocationOfChar(">"c, str)
      str = ChangeGreaterLess(arrLessThan, arrGreaterThan, str)
      str = Replace(str, Chr(13), "chr(13)")
      str = Replace(str, Chr(10), "chr(10)")
    End If
    Return str
  Else
    Return ""
  End If
End Function

Friend Function ChangeGreaterLess(ByVal lh As Collection, ByVal gr As Collection, ByVal str As String) As String
  ' Collection is 1-based; stop at the shorter list to stay in range.
  For i As Integer = 1 To Math.Min(lh.Count, gr.Count)
    If CInt(lh.Item(i)) > CInt(gr.Item(i)) Then
      str = Replace(str, "<", "&lt;")
    End If
  Next
  str = Replace(str, ">", "&gt;")
  Return str
End Function

' Returns the zero-based positions of chr within str.
Friend Function FindLocationOfChar(ByVal chr As Char, ByVal str As String) As Collection
  Dim arr As New Collection
  For i As Integer = 0 To str.Length - 1
    If str(i) = chr Then
      arr.Add(i)
    End If
  Next
  Return arr
End Function


0

Did you measure it? I have run into performance problems with the .NET regex engine, whereas I have parsed XML files of about 40 GB without problems using the XML parser (you will need to use XmlReader for large inputs, however).

Please post an actual code sample and state your performance requirements. I doubt the Regex class is the best solution here if performance matters.
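
For comparison, a parser-based version might look like the sketch below. It rests on two assumptions that are not in the question: the inner markup is escaped (&lt;b&gt; rather than a raw <b>), since the line as posted is not well-formed XML, and the fragment is wrapped in a made-up <seg> root element. It uses XElement from LINQ to XML for brevity rather than the XmlReader mentioned above; the names XmlTagStripper and Strip are invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagStripper
{
    // Hypothetical helper: parses the fragment, drops every <bpt> and <ept>
    // element together with its content, and returns the remaining text.
    public static string Strip(string segment)
    {
        // PreserveWhitespace keeps the single space between "</ept>" and "<bpt".
        XElement root = XElement.Parse(
            "<seg>" + segment + "</seg>", LoadOptions.PreserveWhitespace);

        root.Descendants()
            .Where(e => e.Name.LocalName == "bpt" || e.Name.LocalName == "ept")
            .Remove();

        return root.Value;
    }

    public static void Main()
    {
        // Assumed input: same line as in the question, with the inner tags escaped.
        string line =
            "The big <bpt i=\"1\" x=\"1\" type=\"bold\">&lt;b&gt;</bpt>black<ept i=\"1\">&lt;/b&gt;</ept> " +
            "<bpt i=\"2\" x=\"2\" type=\"ulined\">&lt;u&gt;</bpt>cat<ept i=\"2\">&lt;/u&gt;</ept> sleeps.";

        Console.WriteLine(Strip(line)); // The big black cat sleeps.
    }
}
```

Unlike the regex versions, this throws an XmlException on input that is not well-formed, so it only fits data that really is XML.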

0

Source: https://habr.com/ru/post/1697388/

