Reading and manipulating HTML with Excel VBA

Let's say I have the following page saved as C:\temp\html_page.html:

<html>
   <head>
      <link rel="stylesheet" href="styles.css">
   </head>
   <body>
      <div id="xxx1">
         <img src="test.png">
      </div>
   </body>
</html>

I would like to programmatically set the src attribute of the img based on Excel data, using VBA. Basically, I need a way to find the div with XPath and set the src of the (single) img tag it contains.

I found an example of manipulating XML from VBA through the XML library, but I have been racking my brain trying to make this work with the HTML Object Library; I cannot find any examples or documentation.

Dim XDoc As Object, root As Object

Set XDoc = CreateObject("MSXML2.DOMDocument")
XDoc.async = False: XDoc.validateOnParse = False

If XDoc.Load(html_path) Then
    Debug.Print "Document loaded"
Else
    Dim strErrText As String
    ' Late bound like XDoc; with a reference to Microsoft XML this could be MSXML2.IXMLDOMParseError
    Dim xPE As Object
    ' Obtain the ParseError object
    Set xPE = XDoc.parseError
    With xPE
       strErrText = "Your XML Document failed to load " & _
         "due to the following error." & vbCrLf & _
         "Error #: " & .errorCode & ": " & .reason & _
         "Line #: " & .Line & vbCrLf & _
         "Line Position: " & .linepos & vbCrLf & _
         "Position In File: " & .filepos & vbCrLf & _
         "Source Text: " & .srcText & vbCrLf & _
         "Document URL: " & .URL
    End With
    MsgBox strErrText, vbExclamation
End If

All I want to do is:

'...
Set outer_div = XDoc.selectSingleNode("//div[@id='xxx1']")
'... edit the img attribute

But I cannot load the HTML page because it is not valid XML (the img tag is not closed).

Any help is appreciated. Oh, and I can't use other languages like Python, bummer.

+4
2

Instead of the XML library, use the HTML library:

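Sub changeImg()

    Dim dom As Object
    Dim img As Object
    Dim src As String

    ' Late-bound HTML document (no library reference needed)
    Set dom = CreateObject("htmlFile")

    ' Read the whole file into a string
    Open "C:\temp\test.html" For Input As #1
        src = Input$(LOF(1), 1)
    Close #1

    ' Let the HTML parser build the DOM; an unclosed img tag is fine here
    dom.body.innerHTML = src

    ' First img element in the document
    Set img = dom.getElementsByTagName("img")(0)

    img.src = "..."

    ' Write the modified document back to the file
    Open "C:\temp\test.html" For Output As #1
        Print #1, dom.documentElement.outerHTML
    Close #1

End Sub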

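Note that the file is written back as re-serialized by the HTML parser, so the Head section may not come out as it was in the original. If you can live with that, this will work for you.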

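As an aside, if you want to do anything more involved, consider early binding. The HTML library to reference is the Microsoft HTML Object Library: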

Sub changeImg()

    ' Early bound: requires a reference to Microsoft HTML Object Library
    Dim dom As HTMLDocument
    Dim img As Object
    Dim src As String

    Set dom = CreateObject("htmlFile")

    Open "C:\temp\test.html" For Input As #1
        src = Input$(LOF(1), 1)
    Close #1

    dom.body.innerHTML = src

    Set img = dom.getelementsbytagname("img")(0)

    img.src = "..."

    Open "C:\temp\test.html" For Output As #1
        Print #1, dom.DocumentElement.outerHTML
    Close #1


End Sub
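
Since the question asks for the img inside one specific div, here is a minimal sketch of the same htmlFile approach that looks the div up by its id first. The Sub name changeImgInDiv and the value "new.png" are placeholders, not part of the original code; the path and the id xxx1 come from the question. It only prints the modified div, so inspect the result before writing anything back to disk.

Sub changeImgInDiv()

    Dim dom As Object
    Dim div As Object
    Dim imgs As Object
    Dim src As String
    Dim f As Integer

    Set dom = CreateObject("htmlFile")

    ' Read the whole file into a string, using a free file number
    f = FreeFile
    Open "C:\temp\html_page.html" For Input As #f
        src = Input$(LOF(f), f)
    Close #f

    dom.body.innerHTML = src

    ' Find the div by id; stop if the page does not contain it
    Set div = dom.getElementById("xxx1")
    If div Is Nothing Then Exit Sub

    ' Collect the img elements inside that div only
    Set imgs = div.getElementsByTagName("img")
    If imgs.Length = 0 Then Exit Sub

    ' "new.png" is a placeholder; in practice this would come from a cell
    imgs(0).setAttribute "src", "new.png"

    ' Show the modified markup in the Immediate window
    Debug.Print div.outerHTML

End Sub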
+3

Use doc.querySelector("div[id='xxx1'] img") to select the img, then set its src with img.setAttribute "src", "new.png".

Option Explicit

' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library

Sub Demo()
    Dim ie As SHDocVw.InternetExplorer
    Dim doc As MSHTML.HTMLDocument
    Dim url As String

    url = "file:///C:/Temp/StackOverflow/html/html_page.html"
    Set ie = New SHDocVw.InternetExplorer
    ie.Visible = True
    ie.navigate url
    While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE: DoEvents: Wend
    Set doc = ie.document

    Dim img As HTMLImg
    Set img = doc.querySelector("div[id='xxx1'] img")
    If Not img Is Nothing Then
        img.setAttribute "src", "new.png"
    End If
    ie.Quit
End Sub
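
Note that Demo only changes the document loaded in the browser; the file on disk is not touched. If the change should be persisted, one possible sketch is a small helper that writes the document's markup to a file. SaveDocument and its parameters are hypothetical names, not part of the answer above, and the markup is written as serialized by the browser, so check the output before overwriting the original.

Sub SaveDocument(ByVal doc As Object, ByVal filePath As String)

    Dim f As Integer

    ' Use a free file number rather than a hard-coded #1
    f = FreeFile

    ' Overwrite filePath with the current markup of the whole document
    Open filePath For Output As #f
        Print #f, doc.documentElement.outerHTML
    Close #f

End Sub

It would be called from Demo before ie.Quit, for example: SaveDocument doc, "C:\temp\html_page_new.html" (the target path here is only an illustration).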
0

Source: https://habr.com/ru/post/1661938/

