Retrieving Data from a Website

There is a website that constantly changes the data it displays, and I want to retrieve this data every few seconds and record it in a spreadsheet. The problem is that to reach the page, I need the cookie that I get when I log in. Unfortunately, I only know how to program in MATLAB. MATLAB has a function for fetching pages, urlread, but it does not handle cookies. What can I do to get to this page? Can anyone help me with this? Please point me in a direction where a programming noob like me can succeed.

3 answers

You can use wget to download content using HTTP cookies. I use StackOverflow.com as an example. The steps are as follows:

1) Get the wget command-line tool. For Mac or Linux, I think it is already available. On Windows, you can get it from the GnuWin32 project or from one of many other ports (Cygwin, MinGW/MSYS, etc.).

2) Next, we need to get an authenticated cookie by logging into the corresponding website. You can use your preferred browser for this.

In Internet Explorer, you can create it using File Menu > Import and Export > Export Cookies. In Firefox, I used the Cookie Exporter extension to export cookies to a text file. For Chrome there should be similar extensions.

Obviously, you only need to do this step once; repeat it only when the cookies have expired!

3) Once the cookies have been exported, we can use wget to fetch the web page, passing it the cookie file. This, of course, can be done from within MATLAB using the SYSTEM function:

    %# fetch page and save it to disk
    url = 'http://stackoverflow.com/';
    cmd = ['wget --cookies=on --load-cookies=./cookies.txt ' url];
    system(cmd, '-echo');

    %# process page: I am simply viewing it using embedded browser
    web( ['file:///' strrep(fullfile(pwd,'index.html'),'\','/')] )

Parsing a web page is another topic that I will not go into. After getting the data you are looking for, you can interact with Excel spreadsheets using the XLSREAD and XLSWRITE functions.
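As a rough illustration of that last step, the sketch below pulls one number out of the saved page with REGEXP and appends it, with a timestamp, to a spreadsheet. The pattern, the element id "price", and the file name data.xlsx are made up for the example and need to be adapted to the actual page. It also assumes that index.html was saved to the current folder and that Excel is available for XLSWRITE.

```matlab
%# read the page saved by wget (assumed to be index.html in the current folder)
html = fileread('index.html');

%# hypothetical pattern: a number inside <span id="price">...</span>
tok = regexp(html, '<span id="price">([\d.]+)</span>', 'tokens', 'once');

if ~isempty(tok)
    value = str2double(tok{1});
    row   = {datestr(now, 'yyyy-mm-dd HH:MM:SS'), value};

    %# hypothetical output file; count existing rows so the new row is appended
    xlsFile = 'data.xlsx';
    if exist(xlsFile, 'file')
        [~, ~, raw] = xlsread(xlsFile);
        nRows = size(raw, 1);
    else
        nRows = 0;
    end

    %# write the new row starting at column A of the next free row on sheet 1
    xlswrite(xlsFile, row, 1, sprintf('A%d', nRows + 1));
end
```

Re-reading the whole file on every call is simple but slow; for frequent updates it may be easier to keep the row counter in a variable.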

4) Finally, you can wrap this in a function and execute it at regular intervals using the TIMER function.
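A minimal sketch of such a timer is shown below. The 5-second period is an assumption based on the "every few seconds" in the question, and the callback here only re-downloads the page; in practice it would call your own function that also parses the page and writes to the spreadsheet.

```matlab
%# callback: re-download the page; -O forces the output file name
%# (the URL and cookie file are the ones used in the example above)
fetchFcn = @(~,~) system('wget --load-cookies=./cookies.txt -O index.html http://stackoverflow.com/');

%# run the callback repeatedly at a fixed rate
t = timer('ExecutionMode', 'fixedRate', ...
          'Period',        5, ...          %# seconds between runs (assumed)
          'TimerFcn',      fetchFcn);
start(t);

%# when finished, stop and clean up the timer:
%# stop(t); delete(t);
```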


Try using the java.net.* classes.

You should be able to use them directly in the MATLAB workspace, as described here: http://www.mathworks.co.uk/help/techdoc/matlab_external/f4863.html
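For example, something along these lines might work. This is only a sketch: the URL and the cookie string are placeholders, and the real cookie name and value have to be copied from a logged-in browser session.

```matlab
%# hypothetical URL and cookie value; replace with the real ones
url  = java.net.URL('http://example.com/data');
conn = url.openConnection();
conn.setRequestProperty('Cookie', 'sessionid=PLACEHOLDER');

%# read the whole response body as one string
%# ('\A' is the regex for "start of input", so the scanner returns everything)
scanner = java.util.Scanner(conn.getInputStream(), 'UTF-8');
scanner.useDelimiter('\A');
if scanner.hasNext()
    html = char(scanner.next());
else
    html = '';
end
scanner.close();
```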


MATLAB has built-in functions for web downloads. For HTTP sites there are webread and websave. For FTP servers, there is mget.
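If your MATLAB release supports the HeaderFields option of weboptions, the login cookie can probably be sent along with the request, roughly like this. The URL and cookie string are placeholders, not values from the question.

```matlab
%# hypothetical URL and cookie value; copy the real cookie from your browser
url  = 'http://example.com/data';
opts = weboptions('HeaderFields', {'Cookie', 'sessionid=PLACEHOLDER'}, ...
                  'ContentType',  'text', ...   %# return the page as raw text
                  'Timeout',      10);          %# seconds
html = webread(url, opts);
```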


Source: https://habr.com/ru/post/897702/

