JavaScript parser for DOM

We have a special requirement in a project: we need to parse an HTML string (from an AJAX response) on the client side, using only JavaScript. That's right, no parsing in PHP or Java! I have searched Stack Overflow all week and still have not found an acceptable solution.

Additional information on requirements:

  • We can use any library (preferably dojo and/or jQuery) or go native!

  • We need to parse the entire HTML document that we receive as a string, including <head> and <body>.

  • We also need to periodically serialize the parsed DOM structures back into strings.

  • Finally, we do not want to add the parsed DOM to the current document. Rather, we will send it back to the server for permanent storage.

For example, we need something like:

    var dom = HTMLtoDOM('<html><head><title> This is the old title. </title></head></html>');
    dom.getElementsByTagName('title')[0].innerHTML = "This is a new Title";

In my research, these are our options:

  • The TinyMCE parser. Problem? I think we would have to load the whole editor. What about parsing HTML where we don't need an editor?

  • John Resig's parser. It should be our best choice. Unfortunately, the parser crashes when it is given the entire contents of a page!

  • jQuery $(htmlString) or dojo.toDom(htmlString). Both rely on DocumentFragment and therefore strip <head> and <body>!

EDIT: We want to serialize the HTML so that we can catch certain custom HTML codes through RegExp. We need to let users edit meta tags, header tags, etc.; hence the HTML parser.

Oh, and I have a feeling I'll get killed on Stack Overflow for even hinting at parsing HTML through RegExp!!!
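
To make the RegExp step concrete, here is a minimal sketch of scanning an already serialized document for custom codes. The marker syntax (an HTML comment of the form `<!-- widget:name -->`) and the names `CUSTOM_CODE_PATTERN` and `findCustomCodes` are assumptions invented for this illustration; the actual custom codes are not specified above.

```javascript
// Hypothetical marker syntax, assumed for this sketch only:
//   <!-- widget:some-name -->
const CUSTOM_CODE_PATTERN = /<!--\s*widget:([\w-]+)\s*-->/g;

// Return the names of all custom codes found in a serialized HTML string.
function findCustomCodes(serializedHtml) {
  const names = [];
  let match;
  // Reset the shared global regex so repeated calls start from the beginning.
  CUSTOM_CODE_PATTERN.lastIndex = 0;
  while ((match = CUSTOM_CODE_PATTERN.exec(serializedHtml)) !== null) {
    names.push(match[1]);
  }
  return names;
}
```

The RegExp only ever sees the serialized string, so the parsing itself still has to be done by a real HTML parser.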

+6
5 answers

You can use the current document without adding any nodes to it.

Try something like this:

    function toNode(html) {
        // Create a detached <html> element; it is never attached to the page.
        var doc = document.createElement('html');
        doc.innerHTML = html;
        return doc;
    }

    var node = toNode('<html><head><title> This is the old title. </title></head></html>');
    console.log(node);

http://jsfiddle.net/6SvqA/3/
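
In browsers that provide `DOMParser` with `text/html` support, something similar can probably be done with a fully detached document. This is a minimal sketch, not a tested cross-browser solution: `htmlToDom` and `domToHtml` are names made up for this example, and the `ParserCtor` argument is a hypothetical hook so the function can also be exercised outside a browser.

```javascript
// Parse a complete HTML document string into a detached Document.
// ParserCtor is a hypothetical injection point; it defaults to the
// browser's DOMParser when one is available.
function htmlToDom(html, ParserCtor) {
  const Parser = ParserCtor ||
    (typeof DOMParser !== 'undefined' ? DOMParser : null);
  if (!Parser) {
    throw new Error('htmlToDom: no DOMParser implementation available');
  }
  // 'text/html' keeps <head> and <body>, unlike fragment-based parsing.
  return new Parser().parseFromString(html, 'text/html');
}

// Serialize the parsed document back to a string.
// Note: outerHTML does not include the doctype.
function domToHtml(doc) {
  return doc.documentElement.outerHTML;
}

// Browser usage (not executed here):
//   var dom = htmlToDom('<html><head><title>Old</title></head></html>');
//   dom.getElementsByTagName('title')[0].textContent = 'This is a new Title';
//   var htmlString = domToHtml(dom);
```

Since nothing is appended to the current page, the resulting string can be sent straight back to the server; prepend the doctype yourself if it matters.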

+10

I would suggest a two-part solution: first read the tags that jQuery will not parse for you, then pass the remainder to jQuery. If you are looking for a pure JavaScript solution for working with the HTML data structure, jQuery is probably your best bet, as it has many built-in functions for manipulating data. You could build your parser as a jQuery plugin that is invoked via $.parser or something like that. If you extend jQuery with your own parsing function, you can also return an extended jQuery object that contains functions for reading certain data elements, even from the header, since you can parse that information manually and store it in the same object.
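
The first part of that idea might look roughly like the sketch below. It is only an illustration under assumptions: `splitHtmlDocument` is a hypothetical helper name, and the regular expressions handle simple, well-behaved markup only (for example, they would be fooled by a literal `</head>` inside a script or comment).

```javascript
// Pull the <head> and <body> contents out of a full HTML document string,
// so the body can be handed to jQuery and the head handled manually.
// splitHtmlDocument is a hypothetical helper, not part of jQuery.
function splitHtmlDocument(html) {
  const headMatch = /<head\b[^>]*>([\s\S]*?)<\/head\s*>/i.exec(html);
  const bodyMatch = /<body\b[^>]*>([\s\S]*?)<\/body\s*>/i.exec(html);
  const head = headMatch ? headMatch[1] : '';
  const titleMatch = /<title\b[^>]*>([\s\S]*?)<\/title\s*>/i.exec(head);
  return {
    head: head,                                    // raw inner markup of <head>
    body: bodyMatch ? bodyMatch[1] : '',           // raw inner markup of <body>
    title: titleMatch ? titleMatch[1].trim() : null
  };
}

// Browser usage with jQuery (not executed here):
//   var parts = splitHtmlDocument(htmlString);
//   var $body = $('<div>').html(parts.body);  // let jQuery parse the body part
```

The head information stays available as plain strings on the returned object, which is what a plugin such as the suggested $.parser could store alongside the jQuery object.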

+1

Since HTML is close to XML, you can use jQuery's $.parseXML, provided the markup is well-formed (it throws an error on invalid XML):

    var dom = $.parseXML(html);
    $('title', dom).text("This is a new Title");

Edit:

If you want to turn it back into a string, you will need the xml plugin. I cannot find its original source, so here it is:

    /**
     * jQuery xml plugin
     * Converts XML node(s) to string
     *
     * Copyright (c) 2009 Radim Svoboda
     * Dual licensed under the MIT (MIT-LICENSE.txt)
     * and GPL (GPL-LICENSE.txt) licenses.
     *
     * @author Radim Svoboda, user Zzzzzz
     * @version 1.0.0
     */

    /**
     * Converts XML node(s) to string using web-browser features.
     * Similar to .html() with HTML nodes
     * This method is READ-ONLY.
     *
     * @param all set to TRUE (1,"all",etc.) process all elements,
     *            otherwise process content of the first matched element
     *
     * @return string obtained from XML node(s)
     */
    jQuery.fn.xml = function(all) {
        // result to return
        var s = "";

        // anything to process?
        if (this.length)
            // "object" with nodes to convert to string
            (
                ((typeof all != 'undefined') && all) ?
                // all the nodes
                this :
                // content of the first matched element
                jQuery(this[0]).contents()
            )
            // convert node(s) to string
            .each(function() {
                s += window.ActiveXObject ? // == IE browser?
                    // for IE
                    this.xml :
                    // for other browsers
                    (new XMLSerializer()).serializeToString(this);
            });

        return s;
    };
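
If old IE does not need to be supported, the same serialization can probably be done without the plugin by calling `XMLSerializer` directly. A minimal sketch: `nodeToXmlString` is a name invented for this example, and the `SerializerCtor` argument is a hypothetical hook so the function can also be exercised outside a browser.

```javascript
// Serialize an XML/DOM node to a string with the browser's XMLSerializer.
// SerializerCtor is a hypothetical injection point; it defaults to the
// global XMLSerializer when one is available.
function nodeToXmlString(node, SerializerCtor) {
  const Serializer = SerializerCtor ||
    (typeof XMLSerializer !== 'undefined' ? XMLSerializer : null);
  if (!Serializer) {
    throw new Error('nodeToXmlString: no XMLSerializer implementation available');
  }
  return new Serializer().serializeToString(node);
}

// Browser usage with jQuery (not executed here):
//   var dom = $.parseXML(html);
//   $('title', dom).text('This is a new Title');
//   var xmlString = nodeToXmlString(dom);
```
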
+1

I don't know why anyone would need this, but I suggest you just dump your source into an iframe. The browser will do the parsing for you, and you can even run DOM queries on the result.

0

If you want a complete parser that doesn't rely on anything already in the browser to bootstrap itself, the HTML parser in dom.js is top-notch. Its whole purpose is to parse HTML for a DOM hosted in JavaScript, so it has to satisfy both the DOM specifications and the need to parse and consume the results in JS, all without assuming any existing tooling beyond the base JS engine. It works even in node.js, the SpiderMonkey JS shell, or web workers. https://github.com/andreasgal/dom.js

It also has a serializer, but to use that you would have to pull out more than just the parser. You can probably find standalone serializers that work with any DOM-like structure, though.

0

Source: https://habr.com/ru/post/909852/

