RegEx for <li> </li> Tags
I am working on a C# WinForms application. In this application, I have a snippet like this:

    <ul>
    <li>abc
    <li>bbc
    <li>xyz
    <li>pqr </li></li></li></li>
    </ul>

but I want to get output like this:

    <ul>
    <li>abc</li>
    <li>bbc</li>
    <li>xyz</li>
    <li>pqr</li>
    </ul>

Is there any method by which this can be accomplished?
Can anyone suggest a regex for this problem?
Thanks. Best wishes.
A simple approach, without any fancy regex.
Try the steps below; you can implement them in your own code:

1. First, remove every </li> from the snippet: line = line.Replace("</li>", "");
2. Read each line that starts with <li>: if (line.StartsWith("<li>"))
3. Append </li> to the end of that line: line += "</li>";
4. Combine all the lines: resString += line;

This works on your specific example, but it may break on other input (for example, if the <li> elements span line breaks), so if it does not give the desired results, please edit your question with more detail.
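A minimal sketch of those four steps as a C# 9+ top-level program. Like the steps, it assumes every <li> starts on its own line; the helper name CloseListItemsByLine is hypothetical. Splitting once and joining with string.Join avoids the repeated resString += line concatenation, and trailing blanks (plus a '\r' left by Windows "\r\n" endings) are trimmed before the closing tag is appended, so the result always uses '\n' between lines.

```csharp
using System;

// Hypothetical helper implementing the four steps above.
// Assumes each <li> starts its own line, as in the question's snippet.
static string CloseListItemsByLine(string html)
{
    // Step 1: drop every existing </li>, wherever it appears.
    string[] lines = html.Replace("</li>", string.Empty).Split('\n');

    for (int i = 0; i < lines.Length; i++)
    {
        // Trim trailing blanks and a '\r' left over from "\r\n" line endings.
        string line = lines[i].TrimEnd('\r', ' ', '\t');

        // Steps 2-3: a line that opens an item gets its closing tag appended.
        if (line.TrimStart().StartsWith("<li>", StringComparison.Ordinal))
            line += "</li>";

        lines[i] = line;
    }

    // Step 4: stitch the lines back together, one per row.
    return string.Join("\n", lines);
}

string snippet = "<ul>\n<li>abc\n<li>bbc\n<li>xyz\n<li>pqr </li></li></li></li>\n</ul>";
Console.WriteLine(CloseListItemsByLine(snippet));
```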
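    using System.Text.RegularExpressions;

    string cleanString = Regex.Replace(subjectString, "(?:</li>)+", "", RegexOptions.IgnoreCase);
    string resultString = Regex.Replace(cleanString, "<li>(.*)", "<li>$1</li>", RegexOptions.IgnoreCase);

    using System.Text.RegularExpressions;

    public string AddLiandOl(string xhtml)
    {
        // Drop every existing closing tag; they are re-inserted in the right places below.
        xhtml = xhtml.Replace("</li>", string.Empty);
        // Close the previous item in front of each new <li> ...
        xhtml = xhtml.Replace("<li>", "</li><li>");
        // ... and close the last item in front of the list's closing tag.
        xhtml = xhtml.Replace("</ol>", "</li></ol>");
        xhtml = xhtml.Replace("</ul>", "</li></ul>");
        // The first <li> now has a stray </li> right after <ul>/<ol>; remove it.
        // (\s* rather than .+? so that "<ul></li><li>abc" does not swallow the first item.)
        Regex replaceul = new Regex(@"<ul>\s*</li>", RegexOptions.IgnoreCase);
        xhtml = replaceul.Replace(xhtml, "<ul>");
        Regex replaceol = new Regex(@"<ol>\s*</li>", RegexOptions.IgnoreCase);
        xhtml = replaceol.Replace(xhtml, "<ol>");
        return xhtml;
    }

Try this; I tested it and it works. It takes almost 30 seconds to replace all the tags.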
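The two-line Regex.Replace answer above (the cleanString/resultString lines) can be wrapped into a small, runnable helper. This is a sketch under the same assumption as that answer (one <li> per line); CloseListItemsWithRegex is a hypothetical name. One deliberate change: with RegexOptions.Multiline, `$` matches before every '\n', and the pattern drops trailing spaces and keeps a '\r' outside the item, so "pqr " becomes "<li>pqr</li>" and "\r\n" line endings are preserved.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the two-pass regex approach.
// Assumes one <li> per line, like the original answer.
static string CloseListItemsWithRegex(string html)
{
    // Pass 1: delete every run of closing tags, e.g. "</li></li></li></li>".
    string clean = Regex.Replace(html, "(?:</li>)+", string.Empty, RegexOptions.IgnoreCase);

    // Pass 2: close each item at the end of its line. Multiline lets $ match
    // before every '\n'; trailing spaces/tabs are dropped, and a '\r' from a
    // Windows "\r\n" ending is put back after </li> instead of inside the item.
    return Regex.Replace(clean, @"<li>(.*?)[ \t]*(\r?)$", "<li>$1</li>$2",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);
}

string snippet = "<ul>\n<li>abc\n<li>bbc\n<li>xyz\n<li>pqr </li></li></li></li>\n</ul>";
Console.WriteLine(CloseListItemsWithRegex(snippet));
```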
This is not the most elegant solution to your problem, but it is fast: for simple literal replacements like these, plain string methods are faster than regular expressions.
I compared my string method against Tim Pietzcker's two Regex.Replace calls. (Sorry Tim, I had to pick someone, and you have my upvote :))
Over 10,000 repetitions (numbers are elapsed ticks):
regex replace: avg 40.9659, max 2273
string replace: avg 18.4566, max 1478
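The post does not show how it measured, so here is a hedged sketch of one way to reproduce that kind of comparison with System.Diagnostics.Stopwatch. ViaRegex is the two-call regex answer; ViaString is a hypothetical string-only stand-in (not the poster's exact method) chosen so both produce identical output. Note that Stopwatch.ElapsedTicks counts timer ticks whose length depends on Stopwatch.Frequency (not TimeSpan ticks), and the printed numbers will vary by machine and runtime; they are not the figures quoted above.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Candidate 1: the two Regex.Replace calls (static Regex methods cache parsed patterns).
static string ViaRegex(string s)
{
    string clean = Regex.Replace(s, "(?:</li>)+", "", RegexOptions.IgnoreCase);
    return Regex.Replace(clean, "<li>(.*)", "<li>$1</li>", RegexOptions.IgnoreCase);
}

// Candidate 2 (hypothetical): string methods only. Remove every </li>, then append
// </li> to each line starting with <li>; trailing text is kept so output matches ViaRegex.
static string ViaString(string s)
{
    string[] lines = s.Replace("</li>", "").Split('\n');
    for (int i = 0; i < lines.Length; i++)
        if (lines[i].StartsWith("<li>", StringComparison.Ordinal))
            lines[i] += "</li>";
    return string.Join("\n", lines);
}

// Times n calls of fix(input); returns the average and worst case in Stopwatch ticks.
static (double Avg, long Max) Measure(Func<string, string> fix, string input, int n)
{
    fix(input); // warm-up call so JIT compilation is not timed
    var sw = new Stopwatch();
    long total = 0, max = 0;
    for (int i = 0; i < n; i++)
    {
        sw.Restart();
        fix(input);
        sw.Stop();
        total += sw.ElapsedTicks;
        max = Math.Max(max, sw.ElapsedTicks);
    }
    return ((double)total / n, max);
}

string sample = "<ul>\n<li>abc\n<li>bbc\n<li>xyz\n<li>pqr </li></li></li></li>\n</ul>";
var regex = Measure(ViaRegex, sample, 10_000);
var plain = Measure(ViaString, sample, 10_000);
Console.WriteLine($"regex replace:  avg {regex.Avg:F4}, max {regex.Max}");
Console.WriteLine($"string replace: avg {plain.Avg:F4}, max {plain.Max}");
```

Checking that both candidates return the same string before timing them keeps the comparison honest; a faster method that produces different output is not a fair win.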
    using System;

    string strOrg = "<ul>\n" +
                    "<li>abc\n" +
                    "<li>bbc\n" +
                    "<li>xyz\n" +
                    "<li>pqr </li></li></li></li>\n" +
                    "</ul>";
    string strFinal = FixUnorderedList(strOrg);

    public static string FixUnorderedList(string str)
    {
        //remove what we're going to put back later
        //(these could be placed on the same line, one after the other)
        str = str.Replace("\n", string.Empty);
        str = str.Replace("</li>", string.Empty);
        str = str.Replace("<ul>", string.Empty);
        str = str.Replace("</ul>", string.Empty);

        //get each li element
        string[] astrLIs = str.Split(new string[] { "<li>" }, StringSplitOptions.RemoveEmptyEntries);

        //rebuild the list correctly
        string strFinal = "<ul>";
        foreach (string strLI in astrLIs)
        {
            //skip whitespace-only pieces, e.g. a leftover "\r" from "\r\n" line endings
            if (string.IsNullOrWhiteSpace(strLI))
                continue;
            strFinal += string.Format("\n<li>{0}</li>", strLI.Trim());
        }
        strFinal += "\n</ul>";
        return strFinal;
    }

    using System;
    using System.Text.RegularExpressions;

    string unorderlist = "<ul><li>ONE</li><li>TWO</li><li>THREE</li></ul>";
    Regex regexul = new Regex("<ul>");
    Match m = regexul.Match(unorderlist);
    if (m.Success)
    {
        unorderlist = regexul.Replace(unorderlist, string.Empty);
        Regex regex1 = new Regex("<li>");
        unorderlist = regex1.Replace(unorderlist, ":");
        Regex regex2 = new Regex("</li>");
        unorderlist = regex2.Replace(unorderlist, "\n");
        Regex regex3 = new Regex("</ul>");
        unorderlist = regex3.Replace(unorderlist, "\n");
        Console.WriteLine(unorderlist);
    }
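The snippet above runs four separate Regex.Replace calls to turn a well-formed list into colon-prefixed lines. An alternative sketch, assuming the list has already been fixed (for example by FixUnorderedList) and is not nested, pulls the item texts out with a single Regex.Matches call; ListItemTexts is a hypothetical name. Singleline lets an item's text span line breaks, and the lazy `(.*?)` stops at the first </li>, which is exactly why nested lists are out of scope.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the trimmed text of every <li>...</li> item.
// Assumes the list is already well formed (each <li> closed) and not nested.
static string[] ListItemTexts(string html) =>
    Regex.Matches(html, "<li>(.*?)</li>", RegexOptions.IgnoreCase | RegexOptions.Singleline)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value.Trim())
         .ToArray();

// Similar output to the snippet above: each item on its own line, prefixed with ':'.
foreach (string item in ListItemTexts("<ul><li>ONE</li><li>TWO</li><li>THREE</li></ul>"))
    Console.WriteLine(":" + item);
```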