We have a special requirement in a project: we need to parse an HTML string (from an AJAX response) on the client side, using only JavaScript. That's right, without parsing in PHP or Java! I have searched Stack Overflow all week and still have not found an acceptable solution.
Additional information on requirements:
We can use any library (preferably Dojo and/or jQuery) or go native!
We need to parse the entire HTML document that we get as a string, including <head> and <body>.
We also need to periodically serialize the parsed DOM structures back into strings.
Finally, we do not want to add the parsed DOM to the current document. Rather, we will send it back to the server for permanent storage.
For example, we need something like:
var dom = HTMLtoDOM('<html><head><title> This is the old title. </title></head></html>');
dom.getElementsByTagName('title')[0].innerHTML = "This is a new Title";
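To make the requirements above concrete, here is a minimal sketch of what such an `HTMLtoDOM` could look like. It is not one of the options evaluated below: it assumes a browser whose native `DOMParser` accepts the `"text/html"` type, and `DOMtoHTML` is a hypothetical helper name introduced only for this illustration.

```javascript
// Sketch only: assumes a browser whose DOMParser supports "text/html".
function HTMLtoDOM(htmlString) {
  if (typeof DOMParser === 'undefined') {
    throw new Error('DOMParser is not available in this environment');
  }
  // Returns a separate, detached Document; nothing is added to the current page.
  return new DOMParser().parseFromString(htmlString, 'text/html');
}

// Hypothetical helper: serializes the whole parsed document back to a string.
function DOMtoHTML(doc) {
  // outerHTML of the root element covers <head> and <body>;
  // a doctype, if present, is not included.
  return doc.documentElement.outerHTML;
}

// Usage (browser only): edit the title, then serialize for sending to the server.
if (typeof DOMParser !== 'undefined') {
  const dom = HTMLtoDOM('<html><head><title>This is the old title.</title></head></html>');
  dom.getElementsByTagName('title')[0].textContent = 'This is a new Title';
  console.log(DOMtoHTML(dom));
}
```

Note that an HTML parser fills in missing elements such as `<body>`, so the serialized string may not match the input character for character.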
In my research, these are our options:
The TinyMCE parser. The problem? I think we would have to include the whole editor. What about parsing HTML in places where we don't need an editor?
John Resig's parser. This should be our best option. Unfortunately, the parser crashes when it is given the contents of an entire page!
jQuery $(htmlString) or dojo.toDom(htmlString). Both rely on DocumentFragment and therefore strip <head> and <body>!
EDIT: We want to serialize the HTML so that we can catch certain custom HTML codes through RegExp. We also need to give users the ability to edit meta tags, header tags, etc.; hence the HTML parser.
Oh, and I feel like they'll kill me on Stack Overflow, even if I just hint at HTML parsing through RegExp!!!
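To illustrate the kind of scan meant in the EDIT, here is a rough sketch that looks for custom codes in an already serialized HTML string. The `[[widget:NAME]]` syntax and the `findCustomCodes` name are made up for this example; our real codes look different, and the pattern would need to be adapted.

```javascript
// Hypothetical custom code format: [[widget:NAME]] (an assumption for illustration only).
const CUSTOM_CODE = /\[\[widget:([a-z0-9_-]+)\]\]/gi;

// Returns the NAME part of every custom code found in the serialized HTML string.
function findCustomCodes(serializedHtml) {
  const names = [];
  for (const match of serializedHtml.matchAll(CUSTOM_CODE)) {
    names.push(match[1]);
  }
  return names;
}

const html = '<html><head><title>[[widget:page-title]]</title></head>' +
             '<body><p>[[widget:footer]]</p></body></html>';
console.log(findCustomCodes(html)); // logs the two names: page-title and footer
```

The RegExp here only locates the custom codes in the string; the structural edits (meta tags, title, and so on) would still go through the parsed DOM.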