How to convert a Jsoup document to a W3C document?

I am creating a Jsoup Document by fetching and parsing an internal HTML page:

    public Document newDocument(String path) throws IOException {
        Document doc = Jsoup.connect(path).timeout(0).get();
        return new HtmlDocument<Document>(doc);
    }

I would like to convert the Jsoup document to an org.w3c.dom.Document. I used the available DOMBuilder library for this, but when parsing, I get org.w3c.dom.Document as null. I cannot work out what the problem is; I searched but could not find an answer.

Code for creating the W3C DOM document:

    Document jsoupDoc = factory.newDocument("http://localhost/testcases/test_2.html");
    org.w3c.dom.Document docu = DOMBuilder.jsoup2DOM(jsoupDoc);

Can anyone help me with this?

+6
2 answers

To get the jsoup document via HTTP, call Jsoup.connect(...).get(). To load the jsoup document from a local file, call Jsoup.parse(new File("..."), "UTF-8").

Your call to DOMBuilder is correct.

When you say,

I used the available DOMBuilder library for this, but when parsing I get org.w3c.dom.Document as null.

I think you mean: "I used the available DOMBuilder library for this, but when I print the result, I get [#document: null]." At least, that is what I saw when I printed the w3cDoc object, but it does not mean the object is null. With the JDK's default DOM implementation, the printed text appears to be just the node's name and value, and the value of a Document node is always null. I was able to traverse the document by calling getDocumentElement and getChildNodes.

    public static void main(String[] args) {
        Document jsoupDoc = null;
        try {
            jsoupDoc = Jsoup.connect("http://stackoverflow.com/questions/17802445").get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        org.w3c.dom.Document w3cDoc = DOMBuilder.jsoup2DOM(jsoupDoc);
        Element e = w3cDoc.getDocumentElement();
        NodeList childNodes = e.getChildNodes();
        Node n = childNodes.item(2);
        System.out.println(n.getNodeName());
    }
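
To see the "looks null but is not null" effect without any conversion library, here is a minimal sketch that uses only the JDK. The class name PrintedDocumentDemo and the helpers buildSample and countElements are made up for this illustration, and the hand-built document is only a stand-in for what DOMBuilder.jsoup2DOM would return.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class PrintedDocumentDemo {

    // Builds a tiny W3C document with the JDK's own DOM implementation.
    // Hypothetical stand-in for a converted document; no jsoup is involved.
    static Document buildSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element html = doc.createElement("html");
            html.appendChild(doc.createElement("head"));
            html.appendChild(doc.createElement("body"));
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the element nodes at or below the given node.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countElements(children.item(i));
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = buildSample();
        // toString() is implementation-specific; it does not print the content.
        System.out.println(doc);
        System.out.println(doc.getNodeName());   // #document
        System.out.println(doc.getNodeValue());  // null (defined by the DOM spec for Document nodes)
        System.out.println(countElements(doc));  // 3
    }
}
```

The null in the printed text comes from getNodeValue(), not from the reference itself, so checking w3cDoc != null or walking the tree is a better test than printing the object.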
+6

Alternatively, Jsoup provides the W3CDom class, whose fromJsoup method converts a Jsoup document to a W3C document.

    Document jsoupDoc = ...
    W3CDom w3cDom = new W3CDom();
    org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(jsoupDoc);
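
Whichever converter produced w3cDoc, serializing it is a more reliable check than printing the object. The sketch below uses only the JDK's javax.xml.transform API; the class name W3cDocumentDump and the helpers toMarkup and sample are assumptions made for this example, and sample() builds a small document by hand in place of a real conversion result.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class W3cDocumentDump {

    // Serializes any org.w3c.dom.Document to markup with an identity transform.
    static String toMarkup(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical stand-in for the document returned by a converter.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element html = doc.createElement("html");
            Element head = doc.createElement("head");
            Element title = doc.createElement("title");
            title.setTextContent("test");
            head.appendChild(title);
            html.appendChild(head);
            html.appendChild(doc.createElement("body"));
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the markup of the tree rather than the object's toString() text.
        System.out.println(toMarkup(sample()));
    }
}
```

In real code you would pass w3cDoc to toMarkup instead of sample(); if the output contains the page's elements, the conversion worked.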

UPDATE:

  • Starting with Jsoup 1.10.3, W3CDom is no longer experimental.
  • Up to and including Jsoup 1.10.2, the W3CDom class was still experimental.
+14

Source: https://habr.com/ru/post/950046/

