Your first option is usually preferable because it is much faster than the second: it sends the request directly to the web server and returns the response. Internet Explorer automation (the second option) is very slow because you are actually browsing the site. That inevitably means more downloads, since IE has to load every resource on the page: images, scripts, CSS files and so on. It also runs any JavaScript on the page. All of this is usually unnecessary, and you have to wait for it to finish before you can parse the page.
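To illustrate the waiting that IE automation involves, here is a minimal sketch. It assumes late binding and the same placeholder URL used elsewhere in this answer; the procedure name is made up for the example.

```vba
Sub FetchWithIE()
    ' Late-bound IE automation, so no reference is required
    Dim oIE As Object
    Set oIE = CreateObject("InternetExplorer.Application")

    oIE.Visible = False
    oIE.Navigate "http://www.someurl.com" ' placeholder URL

    ' Wait until IE has loaded the page and its resources
    ' (readyState 4 = READYSTATE_COMPLETE)
    Do While oIE.Busy Or oIE.readyState <> 4
        DoEvents
    Loop

    ' Only now is it safe to read the document
    Debug.Print oIE.document.Title

    oIE.Quit
    Set oIE = Nothing
End Sub
```

Note that content injected later by AJAX may still be missing when this loop exits, so a real page might need an additional wait for a specific element.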
This is a bit of a double-edged sword, however. Although it is slower, if you are not familiar with HTTP requests, automating Internet Explorer is much simpler than the first method, especially when elements are generated dynamically or the page relies on AJAX. It's also easier to automate IE when you need data from a site that requires a login, since it handles the relevant cookies for you. This is not to say that web scraping cannot be done with the first method, rather that it requires a deeper understanding of web technologies and the site's architecture.
A better option for the first method is to use a different object to handle the request and response: the WinHTTP library, which is more robust than the MSXML library and will generally handle any cookies automatically.
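As a rough sketch of what that cookie handling buys you: reusing one WinHTTP request object means cookies set by the first response are sent with the next request. The login path, the members path and the form field names below are hypothetical; a real site will use its own.

```vba
Sub FetchAfterLogin()
    Dim oReq As Object
    Set oReq = CreateObject("WinHttp.WinHttpRequest.5.1")

    With oReq
        ' Hypothetical login form: URL and field names are assumptions
        .Open "POST", "http://www.someurl.com/login", False
        .SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .Send "username=me&password=secret"

        ' Same object, so any session cookie from the login is reused
        .Open "GET", "http://www.someurl.com/members", False
        .Send

        If .Status = 200 Then
            ' Print the start of the page to confirm we got content back
            Debug.Print Left$(.ResponseText, 200)
        End If
    End With
End Sub
```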
As for parsing the data, in your first approach you used late binding to create the HTML object (htmlfile). While this removes the need for a reference, it also reduces functionality. For example, with late binding you miss out on the features added if the user has IE9 installed, specifically in this case the getElementsByClassName function.
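For comparison, this is roughly what the late-bound route looks like when you cannot rely on getElementsByClassName: walk the elements by tag and check className yourself. The HTML string and the class name "imageElement" are just sample input for the sketch.

```vba
Sub ParseLateBound()
    ' Late binding: no reference needed, but a reduced feature set
    Dim oHtml As Object
    Dim oElement As Object

    Set oHtml = CreateObject("htmlfile")

    ' Sample markup standing in for a real response
    oHtml.body.innerHTML = _
        "<div class=""imageElement""><img src=""http://www.someurl.com/a.png""></div>"

    ' Filter by class manually instead of calling getElementsByClassName
    For Each oElement In oHtml.getElementsByTagName("div")
        If oElement.className = "imageElement" Then
            Debug.Print oElement.Children(0).src
        End If
    Next oElement
End Sub
```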
As a third option (and my preferred method):
    Dim oHtml As HTMLDocument
    Dim oElement As Object

    Set oHtml = New HTMLDocument

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.someurl.com", False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    For Each oElement In oHtml.getElementsByClassName("imageElement")
        Debug.Print oElement.Children(0).src
    Next oElement

    'IE 8 alternative
    'For Each oElement In oHtml.getElementsByTagName("div")
    '    If oElement.className = "imageElement" Then
    '        Debug.Print oElement.Children(0).src
    '    End If
    'Next oElement
This requires a reference to the Microsoft HTML Object Library. It will fail if the user does not have IE9 installed, but this can be handled and is becoming less relevant.