Map HTML to JSON

I am trying to map HTML into JSON with the structure intact. Are there any libraries that do this, or will I need to write my own? I suppose that if there are no html2json libraries, I could take an xml2json library as a starting point. After all, HTML is just a variant of XML anyway, right?

UPDATE: Okay, I should probably give an example. What I am trying to do is parse a string of HTML:

<div> <span>text</span>Text2 </div> 

into a json object, for example:

 { "type" : "div", "content" : [ { "type" : "span", "content" : [ "text" ] }, "Text2" ] } 

NOTE: In case you didn't notice the tag, I'm looking for a solution in JavaScript.

+48
json javascript html
Oct 19 '12 at 18:52
7 answers

I just wrote this function, which does what you want. Try it out and let me know if it doesn't work correctly for you:

    // Test with an element.
    var initElement = document.getElementsByTagName("html")[0];
    var json = mapDOM(initElement, true);
    console.log(json);

    // Test with a string.
    initElement = "<div><span>text</span>Text2</div>";
    json = mapDOM(initElement, true);
    console.log(json);

    function mapDOM(element, json) {
        var treeObject = {};
        var parser, docNode; // declared locally so they do not leak as globals

        // If given a string, convert it to a document node first.
        if (typeof element === "string") {
            if (window.DOMParser) {
                parser = new DOMParser();
                docNode = parser.parseFromString(element, "text/xml");
            } else { // Microsoft strikes again
                docNode = new ActiveXObject("Microsoft.XMLDOM");
                docNode.async = false;
                docNode.loadXML(element);
            }
            element = docNode.firstChild;
        }

        // Recursively loop through DOM elements and assign properties to the object.
        function treeHTML(element, object) {
            object["type"] = element.nodeName;
            var nodeList = element.childNodes;
            if (nodeList != null) {
                if (nodeList.length) {
                    object["content"] = [];
                    for (var i = 0; i < nodeList.length; i++) {
                        if (nodeList[i].nodeType == 3) {
                            // Text node: store its text directly.
                            object["content"].push(nodeList[i].nodeValue);
                        } else {
                            // Any other node: add a child object and recurse into it.
                            object["content"].push({});
                            treeHTML(nodeList[i], object["content"][object["content"].length - 1]);
                        }
                    }
                }
            }
            if (element.attributes != null) {
                if (element.attributes.length) {
                    object["attributes"] = {};
                    for (var i = 0; i < element.attributes.length; i++) {
                        object["attributes"][element.attributes[i].nodeName] = element.attributes[i].nodeValue;
                    }
                }
            }
        }
        treeHTML(element, treeObject);

        return (json) ? JSON.stringify(treeObject) : treeObject;
    }

Working example: http://jsfiddle.net/JUSsf/ (tested in Chrome; I can't guarantee full browser support, so you will have to test this yourself).

It creates an object that contains the tree structure of the HTML page in the format you requested and then uses JSON.stringify(), which is included in most modern browsers (IE8+, Firefox 3.5+, etc.). If you need to support older browsers, you can include json2.js.

Either a DOM element or a string containing valid XHTML can be supplied as the argument. (I believe so, at least: I'm not sure whether DOMParser() will choke in certain situations because it is set to "text/xml", or whether it simply provides no error handling. Unfortunately, "text/html" has poor browser support.)
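On the error-handling worry: with "text/xml", browsers generally do not throw on malformed input but return a document that contains a parsererror element. Below is a minimal sketch of a stricter wrapper, assuming that behavior. parseXhtmlStrict is a hypothetical helper name, not part of mapDOM, and its optional parser argument exists only so that the function can be tried outside a browser.

```javascript
// Parse an XHTML string and throw on malformed input instead of
// silently returning an error document.
// Assumption: the parser reports failures through a <parsererror> element.
// `parser` is optional; in a browser it defaults to a real DOMParser.
function parseXhtmlStrict(markup, parser) {
    parser = parser || new DOMParser();
    var doc = parser.parseFromString(markup, "text/xml");
    var errors = doc.getElementsByTagName("parsererror");
    if (errors.length > 0) {
        throw new Error("Invalid XHTML: " + errors[0].textContent);
    }
    // Root element of the parsed document.
    return doc.documentElement;
}

// In a browser:
//   var root = parseXhtmlStrict("<div><span>text</span>Text2</div>");
//   var json = mapDOM(root, true);
```

The returned root element can be passed to mapDOM directly, which keeps the string-parsing concern separate from the tree walk.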

You can easily change the scope of this function by passing a different value as element. Whatever value you pass becomes the root of your JSON map.
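To check that the structure really survives the trip, the tree can be turned back into markup. This is a minimal sketch and not part of the function above: treeToHtml and escapeHtml are hypothetical names, the tree is assumed to contain only element objects and text strings in the { type, content, attributes } shape, and void elements such as <br> are not special-cased.

```javascript
// Escape characters that are special in HTML text and attribute values.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Serialize a { type, content, attributes } tree back into markup.
function treeToHtml(node) {
    // Text nodes are stored as plain strings.
    if (typeof node === "string") {
        return escapeHtml(node);
    }
    var tag = node.type.toLowerCase();
    var attributes = node.attributes || {};
    var attrs = Object.keys(attributes).map(function (name) {
        return " " + name + '="' + escapeHtml(attributes[name]) + '"';
    }).join("");
    var inner = (node.content || []).map(function (child) {
        return treeToHtml(child);
    }).join("");
    return "<" + tag + attrs + ">" + inner + "</" + tag + ">";
}

var tree = {
    type: "div",
    content: [{ type: "span", content: ["text"] }, "Text2"]
};
console.log(treeToHtml(tree)); // <div><span>text</span>Text2</div>
```

Comparing the regenerated string with the original input is a quick way to spot nodes that the mapping dropped or altered.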

Enjoy

+41
Oct 20

html2json & json2html on GitHub, which is built on John Resig's htmlparser.js, includes some test cases and has worked great for me.

+17
Mar 30 '13 at 14:38

Representing complex HTML documents is difficult and full of corner cases, but I would like to share a couple of techniques that show how to get this kind of program started. This answer differs in that it uses data abstraction and the toJSON method to build the result recursively.

Below, html2json is a tiny function that takes an HTML node as input and returns a JSON string as the result. Pay particular attention to how the code is quite flat yet still able to build a deeply nested tree structure, all with virtually zero complexity.

    // data Elem = Elem Node
    const Elem = e => ({
      toJSON : () => ({
        tagName: e.tagName,
        textContent: e.textContent,
        attributes: Array.from(e.attributes, ({name, value}) => [name, value]),
        children: Array.from(e.children, Elem)
      })
    })

    // html2json :: Node -> JSONString
    const html2json = e =>
      JSON.stringify(Elem(e), null, ' ')

    console.log(html2json(document.querySelector('main')))

    <!-- markup the snippet runs against -->
    <main>
      <h1 class="mainHeading">Some heading</h1>
      <ul id="menu">
        <li><a href="/a">a</a></li>
        <li><a href="/b">b</a></li>
        <li><a href="/c">c</a></li>
      </ul>
      <p>some text</p>
    </main>
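The reason such flat code yields a nested tree is that JSON.stringify calls toJSON on every value it visits, including each wrapped child. Here is a minimal sketch of that mechanism that runs without a browser. fakeNode is a hypothetical stand-in for a DOM element that carries only tagName and children, and Elem is trimmed down to those two fields.

```javascript
// Hypothetical stand-in for a DOM element (assumption: only the
// two fields the wrapper reads are present).
const fakeNode = (tagName, children = []) => ({ tagName, children });

// Trimmed-down version of the Elem wrapper.
const Elem = e => ({
  toJSON: () => ({
    tagName: e.tagName,
    // Children are only wrapped here; JSON.stringify calls their
    // toJSON when it reaches them, which is what drives the recursion.
    children: Array.from(e.children, Elem)
  })
});

const tree = fakeNode('UL', [fakeNode('LI'), fakeNode('LI')]);
const serialized = JSON.stringify(Elem(tree));
console.log(serialized);
// {"tagName":"UL","children":[{"tagName":"LI","children":[]},{"tagName":"LI","children":[]}]}
```

No function in the sketch calls itself directly; the recursion lives entirely in the serializer.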

In the previous example, textContent is a bit confusing, because each element's textContent also includes the text of all its descendants. To fix this, we introduce another data constructor, TextElem. We have to map over childNodes (instead of children) and choose the correct data type to return based on e.nodeType, which gets us a little closer to what we might need.

    // data Elem = Elem Node | TextElem Node
    const TextElem = e => ({
      toJSON: () => ({
        type: 'TextElem',
        textContent: e.textContent
      })
    })

    const Elem = e => ({
      toJSON : () => ({
        type: 'Elem',
        tagName: e.tagName,
        attributes: Array.from(e.attributes, ({name, value}) => [name, value]),
        children: Array.from(e.childNodes, fromNode)
      })
    })

    // fromNode :: Node -> Elem
    const fromNode = e => {
      switch (e.nodeType) {
        case 3: return TextElem(e)
        default: return Elem(e)
      }
    }

    // html2json :: Node -> JSONString
    const html2json = e =>
      JSON.stringify(Elem(e), null, ' ')

    console.log(html2json(document.querySelector('main')))

    <!-- markup the snippet runs against -->
    <main>
      <h1 class="mainHeading">Some heading</h1>
      <ul id="menu">
        <li><a href="/a">a</a></li>
        <li><a href="/b">b</a></li>
        <li><a href="/c">c</a></li>
      </ul>
      <p>some text</p>
    </main>

In any case, these are just two iterations on the problem. Of course, you will have to address corner cases as they come up, but what is nice about this approach is that it gives you a lot of flexibility to encode the HTML however you wish in JSON, without introducing too much complexity.

In my experience, you can keep iterating with this technique and achieve really good results. If this answer is interesting to anyone and you would like me to expand on anything, let me know ^_^
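One such corner case: a comment node has no attributes, so the second version throws when it reaches one. Below is a minimal sketch of a third iteration that keeps only element and text nodes and runs without a browser. isSupported is a name introduced here, and the plain objects in the usage lines are hypothetical stand-ins for DOM nodes.

```javascript
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Keep only the node types the two wrappers know how to encode.
const isSupported = n =>
  n.nodeType === ELEMENT_NODE || n.nodeType === TEXT_NODE;

const TextElem = e => ({
  toJSON: () => ({ type: 'TextElem', textContent: e.textContent })
});

const Elem = e => ({
  toJSON: () => ({
    type: 'Elem',
    tagName: e.tagName,
    attributes: Array.from(e.attributes, ({name, value}) => [name, value]),
    // Comments and other unsupported nodes are dropped before wrapping.
    children: Array.from(e.childNodes).filter(isSupported).map(n => fromNode(n))
  })
});

// fromNode :: Node -> Elem
const fromNode = e =>
  e.nodeType === TEXT_NODE ? TextElem(e) : Elem(e);

// Usage with hypothetical stand-in nodes (assumption: real DOM nodes
// expose the same fields).
const paragraph = {
  nodeType: ELEMENT_NODE,
  tagName: 'P',
  attributes: [{ name: 'class', value: 'x' }],
  childNodes: [
    { nodeType: TEXT_NODE, textContent: 'hi' },
    { nodeType: 8, textContent: 'a comment' } // 8 is COMMENT_NODE
  ]
};
const result = JSON.stringify(fromNode(paragraph));
console.log(result);
// {"type":"Elem","tagName":"P","attributes":[["class","x"]],"children":[{"type":"TextElem","textContent":"hi"}]}
```

Whether to drop comments or encode them with a constructor of their own depends on what the JSON is for; filtering is simply the smallest change.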

Related: Recursive methods using JavaScript: creating your own version of JSON.stringify

+3
May 31 '17 at 19:58

I came across a few links a while back while reading about ExtJS, a framework that is itself built around JSON.

http://www.thomasfrank.se/xml_to_json.html

http://camel.apache.org/xmljson.html

Online XML to JSON converter: http://jsontoxml.utilities-online.info/

UPDATE: By the way, to get the JSON shown in the question, the HTML would need to contain type and content tags as well, like this, or you would need to use an XSLT transform to add these elements while doing the JSON conversion:

    <?xml version="1.0" encoding="UTF-8" ?>
    <type>div</type>
    <content>
        <type>span</type>
        <content>Text2</content>
    </content>
    <content>Text2</content>
+1
Oct 19 '12 at 7:19

This one looks pretty good for both HTML to JSON and JSON to HTML: https://github.com/andrejewski/himalaya

+1
Mar 03 '16 at 18:51

There is a simple HTML to JSON converter. You can copy and paste the HTML code and click "Convert" to convert the HTML to JSON.

There are also plenty of online HTML to JSON converters.

0
Jul 31 '17 at 4:26

This might be useful - "XSLTJSON: Converting XML to JSON using XSLT", http://www.bramstein.com/projects/xsltjson/

-2
Oct 19 '12 at 18:56