
How to parse HTML using C++/Qt?

How can I parse the following HTML?

<body> <span style="font-size:11px">12345</span> <a>Hello<a> </body> 

I would like to get the "12345" value from the "span" with style="font-size:11px" on www.testtest.com. I only need this value and nothing else.

How can I do this?

+6
2 answers

EDIT: As of Qt 5.6, according to a Qt blog post:

With 5.6, Qt WebKit and Qt Quick 1 will no longer be supported and will be removed from the release. The source code for these modules will continue to be available.

So, as of Qt 5.6, QtWebKit is no longer available unless you compile it from source. If you are using a Qt release older than 5.6, or are prepared to compile QtWebKit yourself, this may be useful; otherwise this answer is no longer valid.


It is hard to tell you exactly what needs to be done, since your description of the use case is incomplete. However, there are two ways to proceed.

QtWebKit

If you already need other functionality from this module, this will not introduce any additional dependency, and it will be the most convenient option for you.

You need to get a QWebElement: https://doc.qt.io/archives/qt-5.5/qwebelement.html

You obtain one by finding the first "span" element in the HTML:

https://doc.qt.io/archives/qt-5.5/qwebframe.html#findFirstElement

Then you can simply get the text of this element using the appropriate QWebElement method(s). For instance, you can use this to get an attribute value:

https://doc.qt.io/archives/qt-5.5/qwebelement.html#attribute

... but you can also query the attribute names and so on, as you can see in the documentation.

Here's how you get the value 12345:

https://doc.qt.io/archives/qt-5.5/qwebelement.html#toPlainText

XML parser in QtCore

If you don't need WebKit for your software, and the HTML data arrives some other way than directly from the web (for which you would need QtWebKit), then you are better off using the XML parser available in QtCore. Even if you have no other dependency on QtWebKit, it may still be that the additional dependency causes no problems in your use case; it's hard to say based on your description. The XML parser would certainly be less convenient, although not by much, than the WebKit-based solution, which is designed for HTML.

What you should avoid is QtXmlPatterns. It is unmaintained software at the moment, and it would add an extra dependency to your code anyway.

+2

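I think QXmlQuery is what you want. Assuming the markup is well-formed XML, the code would look something like this:

    QXmlQuery query;
    query.setFocus(html);  // html is a QString holding the document
    query.setQuery("/body/span[@style='font-size:11px']");
    QString r;
    query.evaluateTo(&r);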

You can also pass the URL directly as the query focus:

    query.setFocus(QUrl("http://www.testtest.com"));
    query.setQuery("/body/span[@style='font-size:11px']");
+5

Source: https://habr.com/ru/post/953398/