Script to convert html markup to valid XML

I have a WYSIWYG editor that produces HTML content. The tags it generates are not always well-formed XML, and I need the output to be valid XML. Does anyone have a script for this? How should I approach it?

+3
4 answers

I'm not sure which language you use on the server, but if it is .NET you can take a look at the Html Agility Pack.
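As a rough illustration of that suggestion, here is a minimal sketch that loads HTML with the Html Agility Pack and serializes it back as XML. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `HtmlToXmlSketch` and the method name `ToXml` are made up for this example and are not part of the library.

```csharp
using System;
using System.IO;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HtmlToXmlSketch
{
    // Hypothetical helper name, not part of Html Agility Pack itself.
    public static string ToXml(string html)
    {
        var doc = new HtmlDocument();
        // Ask the library to serialize the parsed tree as XML rather than HTML.
        doc.OptionOutputAsXml = true;
        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // The unclosed <br> is typical of what a WYSIWYG editor emits.
        Console.WriteLine(ToXml("<p>Hello<br>world</p>"));
    }
}
```

The exact output (XML declaration, any wrapper element around multiple top-level nodes) depends on the library version, so it is worth inspecting the result on your own editor's markup.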

+4

Maybe you should take a look at the .NET version of HTML Tidy: Tidy.NET

+2

There is also John Cowan's TagSoup, which parses HTML into XML.

+1

Microsoft has a library for this: SgmlReader. It can be used to tidy HTML.

For example, to convert HTML to XML:

// Requires: using System.IO; using System.Text; using System.Xml; using Sgml;

/// <summary>
/// Converts a string of potentially dirty HTML to valid XML
/// </summary>
/// <param name="input">The string to convert</param>
/// <returns>A valid XML fragment that contains the cleaned HTML</returns>
/// <remarks>This method only reformats the HTML so that an XML parser can read it.
/// It does not remove dangerous tags from the source string</remarks>
public static string HtmlToXHtml(string input)
{
    using (var sr = new StringReader(input))
    {
        // SgmlReader has no constructor that takes a TextReader,
        // so the input and the HTML DTD are set through properties.
        var hr = new SgmlReader
        {
            InputStream = sr,
            DocType = "HTML"
        };
        var output = new StringBuilder();

        // Disposing the writer flushes everything into the StringBuilder.
        using (var hw = new XmlTextWriter(new StringWriter(output)))
        {
            hr.Read();
            while (!hr.EOF)
            {
                // WriteNode copies the current node and advances the reader.
                hw.WriteNode(hr, true);
            }
        }

        return output.ToString();
    }
}

In the simple case you can just convert the user's input on the server after the postback. In more complex scenarios, such as an editor that switches between WYSIWYG and HTML modes, you may need a small Ajax call that converts the HTML string to XHTML behind the scenes before the HTML source is shown in the text box.
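Whichever library does the conversion, it can help to confirm on the server that the result really is well-formed before storing it. The following is a minimal sketch that uses only the .NET base class library; `XmlCheck` and `IsWellFormedFragment` are hypothetical names introduced for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Hypothetical helper: returns true when the text parses as an XML fragment.
    public static bool IsWellFormedFragment(string text)
    {
        var settings = new XmlReaderSettings
        {
            // Fragment mode allows several top-level elements,
            // which editor output often has.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(text), settings))
            {
                // Walk every node; the reader throws on the first error.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormedFragment("<p>Hello<br />world</p>")); // True
        Console.WriteLine(IsWellFormedFragment("<p>Hello<br>world</p>"));   // False
    }
}
```

Note that this only checks well-formedness. HTML named entities such as `&nbsp;` are not declared in plain XML, so input containing them is reported as malformed by this check.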

+1

Source: https://habr.com/ru/post/1792961/

