Scraping pages to get prices from Google Finance

I am trying to get stock prices by scraping Google Finance pages. I do this in Python, using the urllib2 package to fetch the page and a regex to extract the price data.

When I leave my Python script running, it works for a while (several minutes) and then throws an exception: [HTTP Error 503: Service Unavailable]

I assume this happens because the web server treats my frequent page requests as coming from a robot and, after a while, starts refusing them.
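Assuming the 503 really is rate limiting (the question only guesses this), a common mitigation is to back off: wait progressively longer after each 503 before retrying, instead of hammering the server at a fixed interval. The sketch below is written for Python 3 (`urllib.request`) rather than the question's Python 2 `urllib2`; `fetch_with_backoff` and `backoff_delays` are hypothetical helper names, and the 2-second base delay is an arbitrary choice.

```python
import time
import urllib.error
import urllib.request


def backoff_delays(max_retries, base_delay=2.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ..."""
    return [base_delay * 2 ** i for i in range(max_retries)]


def fetch_with_backoff(url, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Fetch url; on HTTP 503, wait progressively longer and try again.

    Any other HTTP error, or a 503 on the final attempt, is re-raised.
    """
    # One attempt per delay, plus a final attempt marked by None.
    for delay in backoff_delays(max_retries, base_delay) + [None]:
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code != 503 or delay is None:
                raise
            sleep(delay)  # give the server time before the next attempt
```

If the server keeps answering 503 even after long waits, backing off will not help on its own, and the polling interval itself probably needs to be longer.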

Is there a way around this, for example by deleting a cookie or creating one?
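Python's standard library can keep, set and clear cookies through `http.cookiejar`. Nothing in the question shows that cookies are what triggers the 503, so treat this Python 3 sketch as a way to experiment rather than a fix. `make_opener` and `add_cookie` are hypothetical helpers, and the cookie name, value and domain used below are placeholders.

```python
import http.cookiejar
import urllib.request


def make_opener():
    """Return an opener plus the CookieJar it reads and stores cookies in."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def add_cookie(jar, name, value, domain):
    """Manually place a simple session cookie (valid for path '/') in the jar."""
    cookie = http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )
    jar.set_cookie(cookie)


opener, jar = make_opener()
# opener.open(url) would now send matching cookies from jar and store new ones.
add_cookie(jar, "example", "1", ".example.com")  # placeholder cookie
print(len(jar))  # 1
jar.clear()      # delete every stored cookie
print(len(jar))  # 0
```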

Or, even better, does Google provide an API? I would prefer to do this in Python, because the complete application is in Python, but if nothing is available in Python I can consider alternatives. This is the Python method I use to get the data; I call it in a loop, sleeping a few seconds between calls:

    import re
    import urllib2

    def getPriceFromGOOGLE(self, symbol):
        """ gets last traded price from google for given security """
        toReturn = 0.0
        try:
            base_url = 'http://google.com/finance?q='
            req = urllib2.Request(base_url + symbol)
            content = urllib2.urlopen(req).read()
            namestr = 'name:\"' + symbol + '\",cp:(.*),p:(.*),cid(.*)}'
            m = re.search(namestr, content)
            if m:
                data = str(m.group(2).strip().strip('"'))
                price = data.replace(',', '')
                toReturn = float(price)
            else:
                print 'ERROR ' + str(symbol) + ' --- ' + str(content)
        except Exception, exc:
            print 'Exc: ' + str(exc)
        # returning after the try (not inside finally) lets Ctrl-C stop the loop
        return toReturn
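The parsing step can be pulled out into a pure function, which makes it testable without touching the network. This is a Python 3 sketch: `parse_price` is a hypothetical name, the page format is assumed from the question's own regex, and the sample strings in any test are made up. Two deliberate changes from the original pattern: `re.escape` on the symbol, so a ticker such as `BRK.B` does not turn `.` into a wildcard, and non-greedy groups, so a match cannot run on into a later entry on the same line.

```python
import re


def parse_price(content, symbol):
    """Return the last traded price for symbol found in content, or None.

    Assumes the page embeds data like name:"SYM",cp:...,p:"PRICE",cid...}
    (the format the question's regex targets).
    """
    # re.escape keeps regex metacharacters in the ticker literal;
    # (.*?) matches as little as possible up to the next delimiter.
    pattern = 'name:"' + re.escape(symbol) + '",cp:(.*?),p:(.*?),cid(.*?)}'
    m = re.search(pattern, content)
    if m is None:
        return None
    # Drop whitespace, surrounding quotes and thousands separators.
    raw = m.group(2).strip().strip('"').replace(",", "")
    return float(raw)
```

In Python 3, `urlopen(...).read()` returns bytes, so the page would need decoding (for example `content.decode("utf-8", "replace")`, assuming UTF-8) before being passed in.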

The question is quite old, and the accepted answer is no longer valid because the API has been deprecated.

There is an open-source project that scrapes the full list of companies from Google Finance and matches each one to its current price: http://scrape-google-finance.compunect.com/
It solves most of these issues: it includes caching and IP management, and it runs stably without getting blocked.
It uses an internal Google Finance API for matching companies to scrape the company list, and the chart API to get prices. However, it is PHP code, not Python. You can still study how it solves these problems and adapt its approach.
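The linked project's code is not reproduced here; the following is only a generic Python sketch of the caching idea it mentions: remember each symbol's price for a short time so a tight polling loop does not re-request the same page. `QuoteCache` and its parameters are hypothetical, and `fetch` can be any function that takes a symbol and returns a price (for instance a wrapper around the question's method).

```python
import time


class QuoteCache:
    """Reuse a fetched price for `ttl` seconds before fetching it again."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable: symbol -> price
        self._ttl = ttl          # how long a cached price stays fresh
        self._clock = clock      # injectable for testing
        self._entries = {}       # symbol -> (time fetched, price)

    def get(self, symbol):
        now = self._clock()
        entry = self._entries.get(symbol)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: no network request
        price = self._fetch(symbol)
        self._entries[symbol] = (now, price)
        return price
```

With a 60-second TTL, a loop that sleeps a few seconds per iteration hits the server at most once a minute per symbol; the right TTL depends on how fresh the prices need to be.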


To get around most rate limits and bot detection on sites like Google, Wikipedia or Yahoo, spoof your user agent.

This makes your script's requests look as if they come from Google Chrome (the user-agent string below is for Chrome 11):

    import urllib2

    # url is the page you want to fetch
    headers = {'User-Agent': "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24"}
    req = urllib2.Request(url, None, headers)
    content = urllib2.urlopen(req).read()
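The same idea in Python 3, where `urllib.request.Request` takes a `headers` dict directly. `build_request` and `fetch` are hypothetical helper names; the user-agent string is the one from the Python 2 snippet. Whether a given site still accepts this string is not something this answer establishes.

```python
import urllib.request

# Chrome 11 user-agent string, taken from the snippet above.
CHROME_UA = ("Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 "
             "(KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24")


def build_request(url, user_agent=CHROME_UA):
    """Return a Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download url with the spoofed User-Agent and return the raw bytes."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return resp.read()
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the header is stored as `User-agent`; that is the key to use with `Request.get_header`.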

Yahoo Finance is also a good source of financial data, and it covers more countries and stocks.

For Python 2 you can use ystockquote. For Python 3 you can use yfq, which I rewrote from ystockquote.

Get current quotes for Google and Intel:

    >>> import yfq
    >>> yfq.get_price('GOOG+INTL')
    {'GOOG': '600.25', 'INTL': '22.25'}

Get historical Yahoo quotes from March 1, 2012 to March 3, 2012:

    >>> yfq.get_historical_prices('YHOO', '20120301', '20120303')
    [['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'],
     ['2012-03-02', '14.89', '14.92', '14.66', '14.72', '9164900', '14.72'],
     ['2012-03-01', '14.89', '14.96', '14.79', '14.93', '12283300', '14.93']]
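As the output shows, the historical data comes back as a list of string lists with a header row, newest day first. A small sketch that turns that shape into dicts with numeric fields, sorted oldest first; `rows_to_records` is a hypothetical helper (not part of yfq) and assumes the column names are exactly those in the header row shown.

```python
def rows_to_records(rows):
    """Convert [header, row, row, ...] string lists into typed dicts."""
    header, *data = rows
    records = []
    for row in data:
        rec = dict(zip(header, row))
        # Prices become floats, volume an int; Date stays an ISO string.
        for key in ("Open", "High", "Low", "Close", "Adj Close"):
            rec[key] = float(rec[key])
        rec["Volume"] = int(rec["Volume"])
        records.append(rec)
    # ISO dates sort correctly as strings, giving oldest-first order.
    return sorted(records, key=lambda r: r["Date"])
```

Keeping `Date` as an ISO string is a simplification; converting it with `datetime.date.fromisoformat` would work just as well if date arithmetic is needed.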

Source: https://habr.com/ru/post/1347742/

