Get an array of 2 strings from HTML ... using regex?

I am working on a personal project that automatically fills out the USPS Click and Ship form and then displays the Ref # and the Delivery Confirmation #.

So far I have managed to complete the whole process, but I can't for the life of me figure out how to pull out the Ref # (which is my order #) and the Delivery Confirmation #.

Basically, for each package you print a label for, the confirmation page returns the following HTML.

 <tr class="smTableText">
  <td style="border-top:solid 1px #AAAAAA; padding-bottom:4px;" valign="top">
    <table cellpadding="0" cellspacing="0" border="0" style="margin:7px 0px 0px 5px;">
      <tr> 
       <td valign="top" class="mainText" width=46>1 of 1</td>  
       <td valign="top" width=21><a href="javascript:toggleMoreInfo(0)" tabindex="19"><img src="/cns/images/common/button_plus.gif" height="11" width="11" border="0" hspace="0" vspace="0" id="Img1" style="margin-right:10px;" alt=""></a></td>  
       <td valign="top" width=203><div class="mainText" style="margin-bottom:10px; height:1em; overflow:hidden;" id="Div1">FIRSTLAST NAME<BR>STREET ADDRESS<BR>CITY, STATE  ZIP5-ZIP4<div class="smTableText">email@address.net<BR>Ref#: 100000000<BR></div> </div><div class="smTableText"></div> </td> 
      </tr>
    </table>
  </td> 
  <td style="border-top:solid 1px #AAAAAA; padding-bottom:4px; padding-top:7px;" valign="top" class="smTableText"><div id="Div2" style="margin-left:7px; height:2.4em; overflow:hidden;">&nbsp;Ship Date: 11/17/09<br>&nbsp;Weight: 0lbs 9oz<br>&nbsp;From: 48506<br></div></td>
  <td style="border-top:solid 1px #AAAAAA; padding-bottom:4px; padding-right:15px; padding-top:7px;" valign="top" align="right" class="smTableText"><div class="smTableText" id="Div3" style="height:2.4em; overflow:hidden; margin-bottom:3px;">Priority Mail                      <br>Delivery Confirm.<br></div> <span style="font-weight:bold;" class="smTableText">Label Total</span></td>
  <td style="border-top:solid 1px #AAAAAA; padding-bottom:4px; padding-right:15px; padding-top:7px;" valign="top" align="right" class="smTableText"><div class="smTableText" id="Div4" style="height:2.4em; overflow:hidden; margin-bottom:3px;">$4.80<br>$0.00<br></div><span class="smTableTextbold">$4.80</span></td>
</tr>
<tr class="smTableText"> <td colspan=4 style="height:20px;" valign="top"><div class="mainText" style="margin:0px; padding:4px 8px 0px 8px; display:block; border-top:solid 1px #AAAAAA;">Delivery Confirmation&#153; Label Number: <span class="mainTextbold">0000 1111 2222 3333 4444 55</span></div></td> </tr>

What I need to do is scan the entire page, find "Ref#:" and capture the next 9 characters, then find the next "Label Number: <span class="mainTextbold">" and capture the next 27 characters. Each Ref # / Label Number pair should be stored in an array.

Is regex the right way to do this? I am using VB.net, but C# is fine too.

UPDATE: Since the page is not well-formed XML, I am loading the HTML in a WebBrowser control and running the regex on its text.


   'Sub getdeliverynum(ByVal sText As String)
Sub getdeliverynum()
    Me.MainTabControl.SelectedTab = USPSsiteTAB
    WebBrowser1.Navigate("http://www.vaporstix.com/usps.html")
    While Not WebBrowser1.ReadyState = WebBrowserReadyState.Complete
        Application.DoEvents()
    End While
    Dim input As String = WebBrowser1.DocumentText
    Dim pattern As String = "Ref#: ([^<]+)[\S\s]*?Label Number: <span class=""mainTextbold"">([^<]+)"

    For Each match As Match In Regex.Matches(input, pattern)
        Dim instance As Double
        Dim ref As String = ""
        Dim track As String = ""
        instance = 0
        For Each group As Group In match.Groups
            instance = instance + 1
            If instance = 1 Then
                'do nothing this is the full string.... 
            ElseIf instance = 2 Then
                ref = group.Value
            ElseIf instance = 3 Then
                track = group.Value
            End If
        Next
        'replace with insert to db... this is for testing.
        MsgBox("Ref: " + ref + vbCrLf + "Confirmation: " + track)
    Next

End Sub

If you are sure the HTML will always look like this, you could use the following regex:

Ref#: (.{9})[\S\s]*?Label Number: <span class="mainTextbold">(.{27})

Backreference \1 will contain the 9 characters after Ref#:, and \2 the 27 characters after Label Number: ...

Or, to allow for values of other lengths, capture everything up to the next <:

Ref#: ([^<]+)[\S\s]*?Label Number: <span class="mainTextbold">([^<]+)

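This captures whatever follows, up to the next <. If performance is a concern, possessive quantifiers keep the engine from backtracking into the captured text. Not every engine accepts this syntax; .NET does not, and there an atomic group such as (?>[^<]+) serves the same purpose:

Ref#: ([^<]++)[\S\s]*?Label Number: <span class="mainTextbold">([^<]++)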
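
Since the question targets VB.net, here is a minimal sketch of the atomic-group form of the pattern above. It assumes, as far as I know, that the .NET engine rejects `++` as a nested quantifier; `CanParse`, `AtomicPattern` and the module name are hypothetical names introduced only for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AtomicGroupSketch
    ' Hypothetical helper: True if the .NET regex engine can parse the pattern.
    Function CanParse(ByVal pattern As String) As Boolean
        Try
            Dim r As New Regex(pattern)
            Return True
        Catch ex As ArgumentException
            ' .NET reports regex syntax errors as ArgumentException (or a subclass).
            Return False
        End Try
    End Function

    ' Same two captures as the flexible pattern; each capture wraps an atomic group.
    Public Const AtomicPattern As String = _
        "Ref#: ((?>[^<]+))[\S\s]*?Label Number: <span class=""mainTextbold"">((?>[^<]+))"

    Sub Main()
        ' Compare how the engine treats the possessive and the atomic form.
        Console.WriteLine("Possessive form parses: " & CanParse("([^<]++)"))
        Console.WriteLine("Atomic form parses: " & CanParse(AtomicPattern))

        Dim sample As String = "Ref#: 100000000<BR>Label Number: " & _
            "<span class=""mainTextbold"">0000 1111 2222 3333 4444 55</span>"
        Dim m As Match = Regex.Match(sample, AtomicPattern)
        If m.Success Then
            Console.WriteLine(m.Groups(1).Value & " / " & m.Groups(2).Value)
        End If
    End Sub
End Module
```

The atomic groups themselves do not capture, so the values are still read from groups 1 and 2.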

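
The question asks for each Ref # / Label Number pair to end up in an array, which the patterns alone do not show. A rough sketch of that step, using the flexible pattern from this answer, might look like the following. `ExtractPairs` and the module name are hypothetical, and the sample string is a trimmed-down stand-in for the real page.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RefLabelPairs
    ' Hypothetical helper: returns one (Ref #, Label Number) pair per regex match.
    Function ExtractPairs(ByVal html As String) As List(Of KeyValuePair(Of String, String))
        Dim pattern As String = _
            "Ref#: ([^<]+)[\S\s]*?Label Number: <span class=""mainTextbold"">([^<]+)"
        Dim pairs As New List(Of KeyValuePair(Of String, String))

        For Each m As Match In Regex.Matches(html, pattern)
            ' Groups(0) is the whole match; Groups(1) and Groups(2) are the captures.
            pairs.Add(New KeyValuePair(Of String, String)( _
                m.Groups(1).Value.Trim(), m.Groups(2).Value.Trim()))
        Next
        Return pairs
    End Function

    Sub Main()
        ' Reduced stand-in for the confirmation page HTML.
        Dim sample As String = "<div>Ref#: 100000000<BR></div>" & vbCrLf & _
            "Label Number: <span class=""mainTextbold"">0000 1111 2222 3333 4444 55</span>"

        For Each p As KeyValuePair(Of String, String) In ExtractPairs(sample)
            Console.WriteLine("Ref: " & p.Key & ", Confirmation: " & p.Value)
        Next
    End Sub
End Module
```

If a plain array is needed, calling `ToArray()` on the returned list gives one. Note that this still depends on every Ref # being followed by its own Label Number; a missing label would pair a Ref # with the next package's label.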

Consider System.Xml instead. XPath over an XmlDocument or XPathDocument lets you select the elements directly.

' Requires Imports System.Xml.XPath at the top of the file.
Dim xpathDoc As XPathDocument
Dim xmlNav As XPathNavigator
Dim xmlNI As XPathNodeIterator

' Load the (well-formed) document and select every matching span.
xpathDoc = New XPathDocument("c:\builder.xml")
xmlNav = xpathDoc.CreateNavigator()
xmlNI = xmlNav.Select("//span[@class='mainTextbold']")
While (xmlNI.MoveNext())
    System.Console.WriteLine(xmlNI.Current.Name + " : " + xmlNI.Current.Value)
End While

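The same query also works with an XmlDocument.

The XPath expression //span[@class='mainTextbold'] selects the spans that hold the label numbers.

This requires well-formed XHTML; if the page is not XHTML, a tool such as TidyNet can convert it first.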
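
For the XmlDocument route, a small sketch could look like this. It assumes the markup has already been made well-formed (the raw page from the question is not: it has unquoted attributes and unclosed BR tags), and `SelectLabelNumbers` plus the inline sample are hypothetical stand-ins.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module XmlDocumentSketch
    ' Hypothetical helper: returns the text of every span with class mainTextbold.
    Function SelectLabelNumbers(ByVal xhtml As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xhtml) ' throws XmlException if the markup is not well-formed

        Dim labels As New List(Of String)
        For Each node As XmlNode In doc.SelectNodes("//span[@class='mainTextbold']")
            labels.Add(node.InnerText)
        Next
        Return labels
    End Function

    Sub Main()
        ' Well-formed stand-in for the label row of the confirmation page.
        Dim sample As String = "<table><tr><td>Label Number: " & _
            "<span class=""mainTextbold"">0000 1111 2222 3333 4444 55</span></td></tr></table>"

        For Each label As String In SelectLabelNumbers(sample)
            Console.WriteLine(label)
        Next
    End Sub
End Module
```

The Ref # is not in an element of its own in the question's HTML, so it would still need string handling on the text of the surrounding div.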


Regarding the value output in your updated question, you can index the groups directly instead of looping over them:

For Each match As Match In Regex.Matches(input, pattern)
    Dim ref As String = match.Groups(1).Value
    Dim track As String = match.Groups(2).Value

    ' replace with insert to db... this is for testing.
    MsgBox("Ref: " + ref + vbCrLf + "Confirmation: " + track)
Next

(untested)


Source: https://habr.com/ru/post/1723435/

