Usually, the Internet Explorer COM object is used for this:
    root = "C:\base\dir"

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ie = CreateObject("InternetExplorer.Application")

    For Each f In fso.GetFolder(root).Files
      ie.Navigate "file:///" & f.Path
      ' Wait until the page has finished loading
      While ie.Busy : WScript.Sleep 100 : Wend

      text = ie.document.getElementById("MySection").innerText
      WScript.Echo Replace(text, vbNewLine, "")
    Next

    ' Close the hidden IE instance when done
    ie.Quit
However, the <section> tag is not supported before IE 9, and even in IE 9 the COM object does not seem to process it correctly: the element returned by getElementById("MySection") contains only the opening tag:
    >>> wsh.echo ie.document.getelementbyid("MySection").outerhtml
    <SECTION id=MySection>
Instead, you can use a regex:
    root = "C:\base\dir"

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Capture everything between the opening and closing <section> tags
    Set re1 = New RegExp
    re1.Pattern = "<section id=""MySection"">([\s\S]*?)</section>"
    re1.Global = False
    re1.IgnoreCase = True

    ' Collapse runs of <br> tags and whitespace into a single space
    Set re2 = New RegExp
    re2.Pattern = "(<br>|\s)+"
    re2.Global = True
    re2.IgnoreCase = True

    For Each f In fso.GetFolder(root).Files
      html = fso.OpenTextFile(f.Path).ReadAll
      Set m = re1.Execute(html)
      If m.Count > 0 Then
        text = Trim(re2.Replace(m(0).SubMatches(0), " "))
        WScript.Echo text
      End If
    Next
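To see what the two patterns do without touching any files, here is a minimal sketch that runs them against an in-memory string. The sample HTML is made up for illustration; only the patterns come from the code above. Run it with cscript.

```vbscript
' Hypothetical sample input, not taken from the question
html = "<html><body><section id=""MySection"">Line one<br>" & vbCrLf & _
       "  line two</section></body></html>"

' First pattern: capture the content of the section
Set re1 = New RegExp
re1.Pattern = "<section id=""MySection"">([\s\S]*?)</section>"
re1.IgnoreCase = True

' Second pattern: collapse <br> tags and whitespace runs
Set re2 = New RegExp
re2.Pattern = "(<br>|\s)+"
re2.Global = True
re2.IgnoreCase = True

Set m = re1.Execute(html)
If m.Count > 0 Then
  ' m(0) is the first match, SubMatches(0) its first capture group
  WScript.Echo Trim(re2.Replace(m(0).SubMatches(0), " "))
End If
' Prints: Line one line two
```

Note that the first pattern only matches when the opening tag is written exactly as <section id="MySection">. If your files use extra attributes or different quoting, the pattern would have to be loosened accordingly.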