Extract text from <1></1> (HTML/XML-like, but with numeric tag names)

I have a long string containing tags in angle brackets, and I want to extract the text between them.

string exampleString = "<1>text1</1><27>text27</27><3>text3</3>";

I want to get:

1 = "text1"
27 = "text27"
3 = "text3"

How can I do this easily? I could not come up with a non-hacky way to do it.

Thanks.

+4
2 answers

Using a basic XmlReader and a small trick to turn the input into valid XML, I would do something like this:

string xmlString = "<1>text1</1><27>text27</27><3>text3</3>";
xmlString = "<Root>" + xmlString.Replace("<", "<o").Replace("<o/", "</o") + "</Root>";
string key = "";
List<KeyValuePair<string,string>> kvpList = new List<KeyValuePair<string,string>>(); //assuming the result is in the KVP format
using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlString))){
    bool firstElement = true;
    while (xmlReader.Read()) {
        if (firstElement) { //throwing away root
            firstElement = false;
            continue;
        }
        if (xmlReader.NodeType == XmlNodeType.Element) {
            key = xmlReader.Name.Substring(1); // cut off the leading "o"
        } else if (xmlReader.NodeType == XmlNodeType.Text) {
            kvpList.Add(new KeyValuePair<string,string>(key, xmlReader.Value));
        }
    }
}

Edit:

The main trick is this line:

xmlString = "<Root>" + xmlString.Replace("<", "<o").Replace("<o/", "</o") + "</Root>"; // wrap everything in a single root; the "o" prefix forces every tag name to start with a known letter (comment edit suggested by Mr. chwarr)

First, replace every opening angle bracket with itself plus a character, i.e.

<1>text1</1> -> <o1>text1<o/1> //first replacement, fix the number issue 

and then replace every sequence of opening angle bracket + char + forward slash with opening angle bracket + forward slash + char:

<o1>text1<o/1> -> <o1>text1</o1> //second replacement, fix the ending tag issue
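
The two replacements can be tried in isolation. Below is a minimal sketch; the class name TagRewriter and the method name ToValidXml are hypothetical and not part of the answer, and it assumes the text between tags never contains a "<" of its own.

```csharp
using System;

// Hypothetical helper (not from the answer) that isolates the two-step rewrite.
public static class TagRewriter
{
    // Turns "<1>text1</1>" into "<Root><o1>text1</o1></Root>".
    // Assumption: the text between tags contains no "<" character.
    public static string ToValidXml(string input)
    {
        // Step 1: prefix every tag name with "o"; closing tags become "<o/1>".
        string prefixed = input.Replace("<", "<o");

        // Step 2: move the slash back in front of the "o": "<o/1>" -> "</o1>".
        string repaired = prefixed.Replace("<o/", "</o");

        // Wrap everything in one root element so the XML has a single root.
        return "<Root>" + repaired + "</Root>";
    }
}
```

Calling TagRewriter.ToValidXml(exampleString) should give a string that XmlReader accepts, because XML element names may not start with a digit but may start with a letter such as "o".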

To print the result in a WinForms RichTextBox, for example, you could do this:

for (int i = 0; i < kvpList.Count; ++i) {
    richTextBox1.AppendText(kvpList[i].Key + " = " + kvpList[i].Value + "\n");
}


+6

Alternatively, using Split and Regex:

string exampleString = "<1>text1</1><27>text27</27><3>text3</3>";

string[] results = exampleString.Split(new string[] { "><" }, StringSplitOptions.None);

Regex r = new Regex(@"^<?(\d+)>([^<]+)<");

foreach (string result in results)
{
    Match m = r.Match(result);
    if (m.Success)
    {
        string index = m.Groups[1].Value;
        string value = m.Groups[2].Value;

    }
}

Note that this does not work correctly if the text itself contains "<".
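
One way to avoid depending on the "><" separator is a single pattern with a backreference. This is a different technique from the Split + Regex code above, shown only as a sketch; the names NumberedTagParser and Parse are hypothetical, and it assumes tags are not nested inside a tag with the same number.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): one regex instead of Split + Regex.
public static class NumberedTagParser
{
    // <(\d+)> opens a numeric tag, (.*?) lazily captures the text,
    // and </\1> requires the closing tag to carry the same number.
    private static readonly Regex TagPattern =
        new Regex(@"<(\d+)>(.*?)</\1>", RegexOptions.Singleline);

    // Returns (tag number, text) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in TagPattern.Matches(input))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }
}
```

Because the closing tag must repeat the captured number, a stray "<" inside the text does not end the match early.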

+1

Source: https://habr.com/ru/post/1621487/

