C#: splitting text on a tag

I split a string into lines in my code as follows:

var lines = myString == null 
            ? new string[] { } 
            : myString.Split(new[] { "\n", "<br />" }, StringSplitOptions.RemoveEmptyEntries);

The problem is that sometimes the text looks like this:

sdjkgjkdgjk<br />asdfsdg

In this case my code works. However, in other cases the text looks like this:

sdjkgjkdgjk<br style="someAttribute: someProperty;"/>asdfsdg

In this case I do not get the result I want. How can I split the string on the whole br tag, including any attributes it carries?

+4
5 answers

Use Regex.Split(). Here is an example:

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "sdjkgjkdgjk<br />asdfsdg";
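        // Split on <br>, <br/>, <br /> and <br ...attributes.../>.
        // [^>]* stops at the first '>', so several tags on one line are matched separately.
        string pattern = @"<br\b[^>]*>";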

        DisplayByRegex(input, pattern);
        input = "sdjkgjkdgjk<br style=\"someAttribute: someProperty;\"/>asdfsdg";
        DisplayByRegex(input, pattern);
        Console.Read();
    }

    private static void DisplayByRegex(string input, string pattern)
    {
        string[] substrings = Regex.Split(input, pattern);
        foreach (string match in substrings)
        {
            Console.WriteLine("'{0}'", match);
        }
    }
}
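
One caveat with any pattern built on a greedy `.*`: if a line holds more than one tag, the match runs from the first `<br` to the last `/>` and the text in between is lost. A minimal sketch of the difference, assuming no `>` appears inside an attribute value; `BrSplitDemo` and `SplitOnBr` are illustrative names, not part of the answer above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BrSplitDemo
{
    // Hypothetical helper: splits on <br>, <br/>, <br /> and <br ...attributes...>.
    public static string[] SplitOnBr(string input)
    {
        // \b keeps the pattern from matching other tags that merely start with "br";
        // [^>]* consumes attributes up to the first '>'.
        return Regex.Split(input, @"<br\b[^>]*>", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input = "one<br />two<br style=\"a: b;\"/>three";

        // Greedy: .* extends to the last "/>" in the string, swallowing "two".
        Console.WriteLine(string.Join("|", Regex.Split(input, "<br.*/>")));  // one|three

        // Negated character class: each tag is matched on its own.
        Console.WriteLine(string.Join("|", SplitOnBr(input)));               // one|two|three
    }
}
```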
+1

If you only need to split on br tags and newlines, a regex is a good option:

var lines = myString == null ?
    new string[] { } :
    // No capture groups: Regex.Split would otherwise return the separators as well.
    Regex.Split(myString, @"<br\b[^>]*>|\r\n?|\n");

But if your requirements get complicated, I would suggest using an HTML parser.
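
Note that Regex.Split has no counterpart to StringSplitOptions.RemoveEmptyEntries, so empty strings have to be filtered out separately if the behaviour of the code in the question is to be kept. A possible sketch, under the assumption that tags never contain `>` inside an attribute value; `LineSplitter` and `SplitLines` are made-up names for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineSplitter
{
    // Matches a <br> tag with optional attributes, or a CRLF / CR / LF line break.
    private static readonly Regex Separator =
        new Regex(@"<br\b[^>]*>|\r\n?|\n", RegexOptions.IgnoreCase);

    // Hypothetical helper: null-safe split that drops empty entries.
    public static string[] SplitLines(string text)
    {
        if (text == null)
        {
            return new string[0];
        }

        return Separator.Split(text)
                        .Where(s => s.Length > 0)   // stands in for RemoveEmptyEntries
                        .ToArray();
    }

    public static void Main()
    {
        string text = "first<br />second<br style=\"someAttribute: someProperty;\"/>third\r\n\nfourth";

        foreach (string line in SplitLines(text))
        {
            Console.WriteLine("'{0}'", line);   // 'first', 'second', 'third', 'fourth'
        }
    }
}
```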

+1

Another option, here splitting on <b>...</b> elements:

var parts = Regex.Split(value, @"(<b>[\s\S]+?<\/b>)").Where(l => l != string.Empty).ToArray();
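
The parentheses in such a pattern matter: when the pattern contains a capturing group, Regex.Split includes the captured text in the array it returns. A short sketch of the difference, applied to br tags rather than b elements; the class and method names are illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureSplitDemo
{
    // With a capturing group, the matched tag itself appears in the result.
    public static string[] SplitKeepingTags(string input)
    {
        return Regex.Split(input, @"(<br\b[^>]*>)", RegexOptions.IgnoreCase);
    }

    // Without a capturing group, only the text between tags is returned.
    public static string[] SplitDroppingTags(string input)
    {
        return Regex.Split(input, @"<br\b[^>]*>", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input = "a<br />b";

        Console.WriteLine(string.Join(", ", SplitKeepingTags(input)));   // a, <br />, b
        Console.WriteLine(string.Join(", ", SplitDroppingTags(input)));  // a, b
    }
}
```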
+1

Or split on any tag, regardless of its name or attributes:

    var items = Regex.Split("sdjkgjkdgjk<br style='someAttribute: someProperty;'/>asdfsdg", @"<.*?>");
0

Source: https://habr.com/ru/post/1609439/

