Problem:
My goal is to automate scraping the table of currency prices from this site: stock prices. Since the broker does not provide an API, I have to find a workaround.
I have already searched for existing applications for this purpose, so as not to reinvent the wheel and waste time and money, but unfortunately I have not found any that work with this site.
What I tried:
R is known for its simplicity and straightforward use. Here is the code, which is basically a copy-paste of a textbook example:
library("rvest")
url <- "https://iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=120&date=2016-12-19-19-0"
population <- url %>%
read_html() %>%
html_nodes(xpath='//*[@id="mCSB_3_container"]/table') %>%
html_table()
population
population <- population[[1]]
head(population)
This returns an empty table.
JavaScript and CasperJS
This option is by far the best: I can actually retrieve the data, but it is very slow and ultimately crashes with an "exhausted memory" error:
var casper = require('casper').create({
    logLevel: 'debug',
    verbose: true,
    loadImages: false,
    loadPlugins: false,
    webSecurityEnabled: false,
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11"
});
var fs = require('fs');
var sep = ';';
var url = 'https://eu.iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=60&date=2016-12-19-21-0';
var length;

casper.start(url);

// Build a timestamp that is used in the output file name
var now = new Date();
var dd = now.getDate();
var mm = now.getMonth() + 1; // getMonth() is zero-based
var hh = now.getHours();
var fff = now.getMilliseconds();
var MM = now.getMinutes();
var yyyy = now.getFullYear();
if (dd < 10) {
    dd = '0' + dd;
}
if (mm < 10) {
    mm = '0' + mm;
}
var today = yyyy + '_' + mm + '_' + dd + '_' + hh + '_' + MM + '_' + fff;
casper.echo(today);

// Return the trimmed text of one cell of the table
function getCellContent(row, cell) {
    return casper.evaluate(function(row, cell) {
        return document.querySelectorAll('table tbody tr')[row].childNodes[cell].innerText.trim();
    }, row, cell);
}

// Store the number of table rows. Only the count is returned, because
// DOM nodes cannot be passed out of the page context.
function moveNext() {
    length = casper.evaluate(function() {
        return document.querySelectorAll('table tbody tr').length;
    });
    this.echo("table length: " + length);
}

for (var mins = 0; mins < 3; mins++) {
    url = 'https://eu.iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=60&date=2016-12-19-21-' + mins;
    casper.echo(url);
    casper.thenOpen(url);
    casper.then(function() {
        this.waitForSelector('#mCSB_3_container table tbody tr');
    });
    casper.then(moveNext);
    casper.then(function() {
        // Append one line per table row to the CSV file
        for (var i = 0; i < length; i++) {
            fs.write('prices_' + today + '.csv', getCellContent(i, 0) + sep + getCellContent(i, 1) + sep + getCellContent(i, 2) + sep + getCellContent(i, 4) + "\n", "a");
        }
    });
}

// Report completion once all queued steps have run, then exit
casper.run(function() {
    this.echo("finished with processing");
    this.exit();
});
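The URLs requested in the loop above differ only in the last component of the `date` parameter. As a rough sketch, assuming the format is year-month-day-hour-minute with the hour and minute not zero-padded (inferred only from the URLs in this question, not from any documentation), the same list of URLs could be generated in Python. The helper name `quote_urls` and its defaults are made up for illustration:

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

BASE_URL = "https://eu.iqoption.com/en/historical-financial-quotes"


def quote_urls(start, minutes, active_id=1, tz_offset=60):
    """Yield one page URL per minute, starting at `start` (hypothetical helper)."""
    for offset in range(minutes):
        moment = start + timedelta(minutes=offset)
        # Assumed format: YYYY-MM-DD-H-M, hour and minute without zero padding
        date = "{:%Y-%m-%d}-{}-{}".format(moment, moment.hour, moment.minute)
        query = urlencode({"active_id": active_id, "tz_offset": tz_offset, "date": date})
        yield "{}?{}".format(BASE_URL, query)


urls = list(quote_urls(datetime(2016, 12, 19, 21, 0), 3))
for url in urls:
    print(url)
```

Using `timedelta` rather than a plain counter means the hour and day roll over correctly if more than 60 minutes are requested.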
JavaScript and PhantomJS
My attempt:
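var webPage = require('webpage');
var page = webPage.create();
page.open('https://iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=120&date=2016-12-19-19-0', function(status) {
    // Count the table rows; DOM nodes themselves cannot be returned from evaluate()
    var rowCount = page.evaluate(function() {
        return document.querySelectorAll('table tbody tr').length;
    });
    console.log('status: ' + status + ', rows: ' + rowCount);
    phantom.exit();
});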
Python BeautifulSoup
My attempt:
from bs4 import BeautifulSoup
from urllib2 import urlopen
url = "https://iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=120&date=2016-12-19-19-0"
soup = BeautifulSoup(urlopen(url), "lxml")
table = soup.findAll('table', attrs={ "class" : "quotes-table-result"})
print("table length is: "+ str(len(table)))
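To check what the downloaded HTML actually contains, it can help to count not only the matching tables but also the rows inside them. Below is a minimal diagnostic sketch that uses only the standard library. The names `RowCounter` and `count_rows` and the sample markup are made up for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> elements inside <table> elements that carry a given class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.depth = 0   # table nesting depth inside a matching table
        self.tables = 0  # number of matching top-level tables
        self.rows = 0    # number of <tr> elements found inside them

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or self.table_class in classes:
                self.depth += 1
                if self.depth == 1:
                    self.tables += 1
        elif tag == "tr" and self.depth:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def count_rows(html, table_class="quotes-table-result"):
    """Return (matching tables, rows inside them) for a string of HTML."""
    counter = RowCounter(table_class)
    counter.feed(html)
    return counter.tables, counter.rows


# Made-up sample markup, not taken from the real site
sample = """
<table class="quotes-table-result">
  <tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody>
</table>
"""
print(count_rows(sample))
print(count_rows('<table class="quotes-table-result"><tbody></tbody></table>'))
```

Passing the text of the downloaded page to `count_rows` shows whether the table is present but empty, or missing from the response altogether.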
Scrapy
I also tried the "Scrapy Shell".
Pandas read_html()
Pandas raises an error:
ValueError: No tables found matching pattern '.+'
Code:
import pandas as pd
import html5lib
f_states = pd.read_html("https://iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=120&date=2016-12-19-19-0")
print(f_states)
:
: , -, robots.txt, : Google-.