Retrieve data from website using VBA excel multiple classname

I know that this was asked many times, but did not see a clear answer to the loop through the div and findind tags with the same class name.

My first question is:

If I have something like this:

<div id="carousel"> <div id="images"> <div class="imageElement"> <img src="img/image1.jpg"> </div> <div class="imageElement"> <img src="img/image2.jpg"> </div> <div class="imageElement"> <img src="img/image3.jpg"> </div> </div> </div> 

So, I want to get all img Src in div images along with other things in imageElement names and copy them to some cells in excel.

Second question: I saw two ways to pull web content using VBA using IE and other code using something but a browser.

 Private Sub pullData_Click() Dim x As Long, y As Long Dim htm As Object Set htm = CreateObject("htmlFile") With CreateObject("msxml2.xmlhttp") .Open "GET", "http://website.html", False .send htm.body.innerHTML = .responsetext End With End Sub 

And the second way:

 Set ie = New InternetExplorer With ie .navigate "http://eoddata.com/stockquote/NASDAQ/AAPL.htm" .Visible = False While .Busy Or .readyState <> READYSTATE_COMPLETE DoEvents Wend Set objHTML = .document DoEvents End With Set elementONE = objHTML.getElementsByTagName("TD") For i = 1 To elementONE.Length elementTWO = elementONE.Item(i).innerText If elementTWO = "08/10/12" Then MsgBox (elementONE.Item(i + 1).innerText) Exit For End If Next i DoEvents ie.Quit DoEvents Set ie = Nothing 

Which one is better and why?

So, if you can help me, I would appreciate it.

Thanks in advance.

+4
source share
2 answers

Your first option is usually preferable because it is much faster than the second method, it sends the request directly to the web server and returns a response. This is much more efficient than Internet Explorer automation (second option); IE automation is very slow, since you are actually just browsing the site - this will inevitably lead to more downloads, since it should load all the resources on the page - images, scripts, css files, etc. It will also run any Javascript on the page - all this is usually not useful, and you should wait for it to complete before parsing the page.

This, however, is a slightly double-edged sword - although it is slower if you are not familiar with html requests, automating Internet Explorer is much simpler than the first method, especially when elements are generated dynamically or there is confidence on the page with AJAX. It’s also easier to automate IE when you need to access data on a site that requires a login, since it will process the appropriate cookies for you. This does not mean that web cleaning cannot be performed using the first method, and not for a deeper understanding of web technologies and site architecture.

The best option for the first method would be to use a different object to process the request and response, using the WinHTTP library, which is more robust than the MSXML library, and will usually process any cookies automatically.

Regarding data analysis, in your first approach, you used late binding to create an HTML object (htmlfile), while this reduces the need for a link, it also reduces functionality. For example, when using late binding, you do not take into account the added functions if the user has IE9 installed, in particular in this case the getElementsByClass function.

As a third option (and my preferred method):

 Dim oHtml As HTMLDocument Dim oElement As Object Set oHtml = New HTMLDocument With CreateObject("WINHTTP.WinHTTPRequest.5.1") .Open "GET", "http://www.someurl.com", False .send oHtml.body.innerHTML = .responseText End With For Each oElement In oHtml.getElementsByClassName("imageElement") Debug.Print oElement.Children(0).src Next oElement 'IE 8 alternative 'For Each oElement In oHtml.getElementsByTagName("div") ' If oElement.className = "imageElement" Then ' Debug.Print oElement.Children(0).src ' End If 'Next oElement 

This will require the installation of a Microsoft HTML Object Library link - it will not work if the user has not installed IE9, but this can be processed and becomes less relevant.

+7
source

To print items on cells, replace:

 For Each oElement In oHtml.getElementsByClassName("imageElement") Debug.Print oElement.Children(0).src Next oElement 

FROM

 Dim wsTarget as Worksheet dim i as Integer i=1 set wsTarget=activeworkbook.worksheets("SomeSheet") For Each oElement In oHtml.getElementsByClassName("imageElement") wstarget.range("A" & i)=oElement.Children(0).src i=i+1 Next 

'Fixed syntax error for parameter

+2
source

Source: https://habr.com/ru/post/1501396/


All Articles