Extract data from an aspx page table using Excel VBA

I am trying to get table data from an aspx page using Excel VBA. I know how to get table data from a URL, but the main problem is described below.

Problem

There is an aspx page (say, www.abc.aspx). I am currently on this page; call it page1.

Now I click the page2 link on the current page. Note that after clicking this link the URL (www.abc.aspx) does not change, but the content does: the page now shows the content of page2.

If you view the source of page1, it contains:

<form method="post" action="page1 url" id="Form1"> 

Whatever action is taken on page1 (such as clicking page2), the form posts back to the same page1 URL.

So how can I get the page2 table data in Excel VBA when I don't know its URL?

Code

This is what I used to get the table data.

I used the InternetExplorer object, navigated to the URL, and stored the document in htmldoc.

    'READYSTATE_COMPLETE (= 4) needs a reference to "Microsoft Internet Controls"
    ie.navigate "url"
    Do While ie.READYSTATE <> READYSTATE_COMPLETE
        Application.StatusBar = "Fetching data..."
        DoEvents
    Loop
    Set htmldoc = ie.document

    'Column headers
    Set eleColth = htmldoc.getElementsByTagName("th")
    j = 0 'start with the first value in the th collection
    For Each eleCol In eleColth 'for each element in the th collection
        'paste the inner text of the th element into row 1, moving one column right each time
        ThisWorkbook.Sheets(1).Range("A1").Offset(0, j).Value = eleCol.innerText
        j = j + 1 'move to next element in th collection
    Next eleCol

    'Content: this section populates Excel
    Set eleColtr = htmldoc.getElementsByTagName("tr")
    i = 0 'start with first value in tr collection
    For Each eleRow In eleColtr 'for each element in the tr collection
        Set eleColtd = eleRow.getElementsByTagName("td") 'get all the td elements in that specific tr
        j = 0 'start with the first value in the td collection
        For Each eleCol In eleColtd 'for each element in the td collection
            'paste the inner text of the td element, and offset at the same time
            ThisWorkbook.Sheets(1).Range("D3").Offset(i, j).Value = eleCol.innerText
            j = j + 1 'move to next element in td collection
        Next eleCol
        i = i + 1 'move to next element in tr collection
    Next eleRow

    ie.Quit
    Set ie = Nothing

EDIT:

Example

If we click Questions on Stack Overflow ( https://stackoverflow.com/a/166269/ ) and then click page 2 of the questions, the browser moves to a new link ( https://stackoverflow.com/questions/ ... ?page=2&sort=newest ).

In my case, clicking page2 does not update the link. It stays the same old link.

EDIT: I found a similar question here:

How to get url hidden by javascript on external website?

Thanks.

+5
2 answers

Well, I sympathize. There is a school of thought (which includes Tim Berners-Lee) that every individual page should have its own URI and that URIs should not change.

But webmasters can and do mess you around. They can redirect your HTTP requests and obfuscate navigation, as in your case. They can also rewrite HTTP requests.

You have two options.

Option 1 - Let Internet Explorer resolve the new content for you

If the content is displayed on the screen, it must be in the Document Object Model (DOM). In IE, or indeed in Chrome, you can right-click to get the context menu and then select Inspect to see where the element sits in the DOM.

I think your code shows enough expertise to dig in from there. However, some websites disable the Inspect menu option to keep programmers from digging around. (EDIT: as in your case, now that I have read the comments.)
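A minimal sketch of Option 1 in VBA, under stated assumptions: `ie` is the InternetExplorer object from the question and is already showing page1, and the pager link's visible text is "2" (the real text or id has to be read from the page source). The procedure name `LoadPage2` is made up for illustration.

```vba
' Sketch only: click the page2 link inside IE and re-read the DOM.
' Assumes "ie" already shows page1 and the pager link text is "2".
Sub LoadPage2(ie As Object)
    Dim lnk As Object

    ' Find the pager link by its visible text and click it.
    ' On a WebForms page this submits Form1 back to the same URL.
    For Each lnk In ie.document.getElementsByTagName("a")
        If Trim$(lnk.innerText & "") = "2" Then
            lnk.Click
            Exit For
        End If
    Next lnk

    ' Wait for the postback to finish (4 = READYSTATE_COMPLETE).
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop

    ' ie.document now holds the page2 DOM even though the URL is unchanged.
    Debug.Print ie.document.getElementsByTagName("tr").Length & " rows found"
End Sub
```

After this returns, the th/tr/td loops from the question can be run against `ie.document` unchanged. If the site swaps the table in with AJAX instead of a full postback, the ready-state wait will not be enough and a short polling loop on the table content would be needed.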

Option 2 - Use an HTTP sniffing tool like Fiddler to detect HTTP redirects / rewrites

As I said above, HTTP requests can be rewritten and redirected by the web server, but a sniffing tool such as Fiddler will show you what actually goes over the wire. Today I discovered that there is a dedicated Fiddler add-on for IE.

To be honest, though, the developer tools that ship with the browsers themselves, especially Chrome (Ctrl + Shift + I, then the Network tab), show network traffic at a level of detail that is increasingly on a par with any sniffing tool.
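Once the sniffer shows the POST that the page2 link sends, that request can be replayed from VBA without a browser. The following is only a sketch: it assumes the page is a standard ASP.NET WebForms page that posts `__EVENTTARGET` and `__EVENTARGUMENT` along with its hidden state fields, and the values of `PAGER_TARGET` and `PAGER_ARGUMENT` are hypothetical placeholders to be replaced with what the captured request contains. `FetchPage2Html` and `UrlEncode` are illustrative helper names.

```vba
' Sketch only: replay the postback captured in Fiddler / the Network tab.
Const PAGE_URL As String = "http://www.abc.aspx"   ' placeholder URL from the question
Const PAGER_TARGET As String = "GridView1"         ' hypothetical __EVENTTARGET value
Const PAGER_ARGUMENT As String = "Page$2"          ' hypothetical __EVENTARGUMENT value

Function FetchPage2Html() As String
    Dim http As Object, doc As Object, inp As Object, payload As String

    ' 1. GET page1 to obtain the hidden form state.
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", PAGE_URL, False
    http.send

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = http.responseText

    ' 2. Echo back every hidden input except the two event fields.
    For Each inp In doc.getElementsByTagName("input")
        If LCase$(inp.Type & "") = "hidden" _
           And inp.Name <> "__EVENTTARGET" And inp.Name <> "__EVENTARGUMENT" Then
            payload = payload & UrlEncode(inp.Name & "") & "=" & _
                      UrlEncode(inp.Value & "") & "&"
        End If
    Next inp

    ' 3. Add the event that the page2 link would have raised.
    payload = payload & "__EVENTTARGET=" & UrlEncode(PAGER_TARGET) & _
              "&__EVENTARGUMENT=" & UrlEncode(PAGER_ARGUMENT)

    ' 4. POST to the same URL; the response body is the page2 HTML.
    http.Open "POST", PAGE_URL, False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send payload
    FetchPage2Html = http.responseText
End Function

' Percent-encodes everything except unreserved characters (ASCII input only).
Function UrlEncode(s As String) As String
    Dim i As Long, ch As String, out As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        Select Case AscW(ch)
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                out = out & ch
            Case Else
                out = out & "%" & Right$("0" & Hex$(AscW(ch)), 2)
        End Select
    Next i
    UrlEncode = out
End Function
```

The returned HTML can be loaded into another `htmlfile` document and walked with the same th/tr/td loops as in the question. If the captured request carries extra fields or headers, they have to be added to the payload as well.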

Sorry you got down-voted; this seems like a reasonable question.

+2

Bird's-eye view of the problem:

You have a requirement that you apparently cannot let go of: use Excel VBA. I stress this point because answers often offer solutions that satisfy different premises from those stated in the OP.

Possible solution:

You therefore need to interface Excel VBA with another tool that can reveal the content behind HTML redirects or URL obfuscation.

Google Chrome's developer tools reveal all the content, and you can drive Google Chrome from Excel VBA through the Selenium VBA wrapper.

It is quite versatile; for example, it can be used to scrape web data.
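A rough sketch of what that could look like, assuming the SeleniumBasic wrapper and a matching chromedriver are installed, and that the pager link's visible text is "2" (both are assumptions, as is the placeholder URL taken from the question). The method names are the ones SeleniumBasic exposes as far as I know; check them against the installed version.

```vba
' Sketch only: drive Chrome through the Selenium VBA wrapper (SeleniumBasic).
Sub ScrapePage2WithSelenium()
    Dim driver As Object, row As Object, cell As Object
    Dim i As Long, j As Long

    Set driver = CreateObject("Selenium.ChromeDriver")
    driver.Get "http://www.abc.aspx"            ' placeholder URL from the question
    driver.FindElementByLinkText("2").Click     ' assumed pager link text

    ' Copy every td of every tr into the sheet, as the question's code does.
    i = 0
    For Each row In driver.FindElementsByTag("tr")
        j = 0
        For Each cell In row.FindElementsByTag("td")
            ThisWorkbook.Sheets(1).Range("D3").Offset(i, j).Value = cell.Text
            j = j + 1
        Next cell
        i = i + 1
    Next row

    driver.Quit
End Sub
```

Because the browser performs the postback itself, no page2 URL is needed; the rows are read from whatever the page shows after the click.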

As for getting at obfuscated content, here are a few items that may help:

how to get innerHTML of the whole page in selenium driver? (not VBA, but useful)

Selenium + VBA to control Chrome

(Note: the author of the wrapper is usually willing to answer on SO, and is precise in his answers.)

I think (YMMV) there are always people trying to obfuscate their data, by various methods and often for good reasons...

If you have a real example in place of http://www.abc.aspx, that might help.

0

Source: https://habr.com/ru/post/1274267/

