Python: retrieving a value from a URL

I am trying to write a Python script that checks money.rediff.com for a specific stock price and prints it. I know this could easily be done with an API, but I want to learn how urllib2 works, so I am trying to do it the old way. However, I got stuck on how to use urllib. Many online tutorials tell me to "Inspect Element" on the value I need and then split the string to extract it. But in all the videos the HTML tags are easy to separate, whereas I have something like this:

<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span> &nbsp; 
<span id="change" class="green">+0.50</span> &nbsp; 

<span id="ChangePercent" style="color: rgb(130, 130, 130); font-weight: normal;">+8.12%</span>
</div>

I only need the "6.66" on line 2. How should I do it? I am very new to urllib2 and Python, so any help would be greatly appreciated. Thanks in advance.
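The split-the-string approach the tutorials describe can work on this markup: find the unique `id="ltpid"` attribute and cut around it. A minimal sketch run against the HTML from the question (`extract_by_split` is a hypothetical helper name). It assumes the id appears exactly once and that no `>` occurs inside that tag's attribute values; if the markup changes, it raises IndexError or returns the wrong text.

```python
# Sample markup copied from the question; the live page may differ.
html = '''<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span> &nbsp;
<span id="change" class="green">+0.50</span> &nbsp;

<span id="ChangePercent" style="color: rgb(130, 130, 130); font-weight: normal;">+8.12%</span>
</div>'''


def extract_by_split(page):
    """Return the text inside the tag whose id is "ltpid", using plain string splitting."""
    # Keep everything after the id attribute...
    after_id = page.split('id="ltpid"', 1)[1]
    # ...skip past the end of that opening tag...
    after_tag = after_id.split('>', 1)[1]
    # ...and keep everything up to the closing </span>.
    return after_tag.split('</span>', 1)[0]


print(float(extract_by_split(html)))
```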

3 answers

You can do this with just urllib2 and possibly a regular expression, but I would advise you to use better tools, namely requests and Beautiful Soup.

Here is a complete program that fetches the quote for Tata Motors Ltd.:

from bs4 import BeautifulSoup
import requests

html = requests.get('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').content

soup = BeautifulSoup(html, 'html.parser')
quote = float(soup.find(id='ltpid').get_text())

print(quote)
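If installing requests and Beautiful Soup is not an option, Python 3's standard library ships a basic event-driven parser in `html.parser`. A minimal sketch, assuming (as in the question's HTML) that the price is the text of the `<span>` with `id="ltpid"`; `LtpParser` and `parse_quote` are hypothetical names, and the sample string is trimmed from the question:

```python
from html.parser import HTMLParser


class LtpParser(HTMLParser):
    """Collects the text of the <span> whose id attribute is "ltpid"."""

    def __init__(self):
        super().__init__()
        self._inside = False  # True between <span id="ltpid"> and its </span>
        self.value = None     # stays None if the element never appears

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so attribute order and quote style do not matter
        if tag == "span" and ("id", "ltpid") in attrs:
            self._inside = True
            self.value = ""

    def handle_endtag(self, tag):
        # Assumes the target span contains no nested <span>
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.value += data


def parse_quote(html):
    """Return the "ltpid" text as a float, or raise ValueError if it is missing."""
    parser = LtpParser()
    parser.feed(html)
    parser.close()
    if parser.value is None:
        raise ValueError('no <span id="ltpid"> found')
    return float(parser.value.strip())


# Trimmed from the HTML in the question
sample = ('<div class="f16"><span id="ltpid" class="bold">6.66</span> &nbsp; '
          '<span id="change" class="green">+0.50</span></div>')
print(parse_quote(sample))
```

To use it on the live page, pass in the decoded page text, for example `urllib.request.urlopen(url).read().decode('utf-8')`, assuming the page is UTF-8 encoded and still uses that id.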

EDIT

Here is a Python 2 version using urllib2 and re:

import re
import urllib2

html = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()

quote = float(re.search('<span id="ltpid"[^>]*>([^<]*)', html).group(1))

print quote
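On Python 3, urllib2 no longer exists; its `urlopen` lives in `urllib.request`. Also, `read()` returns bytes there, and searching bytes with a str pattern raises TypeError, so the page has to be decoded first. A rough port of the answer above, with hypothetical names `fetch_html`, `extract_quote` and `QUOTE_RE`; falling back to UTF-8 when the server declares no charset is an assumption:

```python
import re
import urllib.request

URL = 'http://money.rediff.com/companies/Tata-Motors-Ltd/10510008'

# Same pattern as the urllib2 answer: the span must start with id="ltpid"
QUOTE_RE = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')


def fetch_html(url, timeout=10):
    """Download a page and decode it to str."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, else assume UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)


def extract_quote(html):
    """Return the first <span id="ltpid"> text as a float."""
    match = QUOTE_RE.search(html)
    if match is None:
        raise ValueError('quote not found; the page layout may have changed')
    return float(match.group(1))
```

Calling `print(extract_quote(fetch_html(URL)))` needs network access and assumes the page still contains that span. Some sites also reject requests that lack a browser-like User-Agent header, which this sketch does not set.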

BeautifulSoup is good for parsing HTML:

from bs4 import BeautifulSoup
import urllib2

## Use your urllib2 code to get the source code of the page
source = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()
soup = BeautifulSoup(source, 'html.parser')
## This assumes the id 'ltpid' is always the one you are looking for
span = soup.find('span', id="ltpid")
print(float(span.text))  # 6.66 for the HTML shown in the question

Use BeautifulSoup instead of regular expressions to parse HTML.
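The usual reason for this advice is that a regular expression matches one exact spelling of the markup, while an HTML parser works from the parsed tags and attributes. A small illustration, reusing the pattern from the urllib2 answer on two made-up snippets that a browser treats as the same element:

```python
import re

# The pattern from the urllib2 answer above
pattern = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')

original = '<span id="ltpid" class="bold">6.66</span>'
# Same element: attributes reordered and single-quoted
reordered = "<span class='bold' id='ltpid'>6.66</span>"

print(pattern.search(original).group(1))  # 6.66
print(pattern.search(reordered))          # None: the regex no longer matches
```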


Source: https://habr.com/ru/post/1652543/

