Read webpage text in python

Question

I know this question, or one like it, has been asked before, but the answers I found did not solve my problem, so I am asking here.

How can I get the text of an HTML page so that I can compare it with other given values?

Suppose I have this web page:

<html>
<head>
<title>This is my page</title>

<center>
<div class="mon_title">Some title here</div>
<table class="mon_list" >
<tr class='list'><th class="list" align="center"></th><th class="list" align="center">Set 1</th><th class="list" align="center">Set 2</th><th class="list" align="center">Set 4</th><th class="list" align="center">Set 5</th><th class="list" align="center">Set 6</th><th class="list" align="center">Set 7</th><th class="list" align="center">Set 8</th><th class="list" align="center">Set 9</th><th class="list" align="center">Set 10</th><th class="list" align="center">Set 11</th><th class="list" align="center">Set 12</th></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
</table>

Sorry for any typos or missing parts; I hope the gist of the page is clear. My program should check whether certain values defined outside the table also appear in the table data, for example: "Is Value 2 anywhere in the table?" and, if so, "Is Value 5 on the same row?"

Is this even possible? How much effort would it take to write such a program?

So far, all I have managed is to load the full HTML of the page with this Python code:

import requests

url = 'http://some.random.site.com/you/ad/here'
print (requests.get(url).text)

This prints the raw HTML, but what I want is the visible text, as if I had pressed CTRL + A on the page and copied it.

+4

python html

Leo Lion Aug 16 '17 at 15:42

3

+2

silvanoe Aug 16 '17 at 15:57

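import requests
from bs4 import BeautifulSoup

url = 'http://some.random.site.com/you/ad/here'
# Parse the downloaded HTML; naming the parser explicitly avoids a bs4 warning.
page = BeautifulSoup(requests.get(url).text, 'html.parser')
# The table in the question carries the class "mon_list".
table = page.find(class_='mon_list')
listy = []
for tr in table.find_all('tr'):
    # The header row holds only <th> cells, so it yields an empty list.
    cols = tr.find_all('td')
    listy.append([elem.get_text() for elem in cols])
print(listy)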

Output:

[[], ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12'], ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12']]
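With the rows as lists, the check from the question (is "Value 2" in some row, and is "Value 5" on that same row?) comes down to a membership test. Here is a minimal sketch, assuming `listy` has the shape printed above. The helper name `rows_containing` and the shortened sample rows are made up for illustration:

```python
def rows_containing(rows, first, second=None):
    """Return the indexes of rows that contain `first` (and `second`, if given)."""
    matches = []
    for index, row in enumerate(rows):
        if first in row and (second is None or second in row):
            matches.append(index)
    return matches

# Shortened sample in the same shape as the output above (the header row is empty).
listy = [
    [],
    ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5'],
    ['Value 1', 'Value 2', 'Value 3'],
]

print(rows_containing(listy, 'Value 2'))             # rows that hold "Value 2"
print(rows_containing(listy, 'Value 2', 'Value 5'))  # rows that hold both values
```

An empty result means no row matched, so the same helper covers both the "is it anywhere?" and the "is it on the same row?" question.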

+1

ᴡʜᴀᴄᴋᴀᴍᴀᴅᴏᴏᴅʟᴇ3000 Aug 16 '17 at 15:51

Ajax1234 · Accepted Answer · 2017-08-16T15:48:05+0000

You can use urllib and re:

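import urllib.request
import re

url = 'http://some.random.site.com/you/ad/here'  # URL from the question
# Decode the response bytes; str() on bytes would give the "b'...'" repr instead.
data = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')

# Raw string so that \d reaches the regex engine unchanged.
values = re.findall(r"Value \d+", data)
print(values)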

Output:

['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12']
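Note that the flat list above no longer says which row a value came from. If that matters, one option is to split the page into `<tr>` blocks first and search each block separately. This is a rough sketch, assuming the markup is as regular as in the question; regular expressions are fragile on real-world HTML. The `sample` string and the name `values_by_row` are illustrative only:

```python
import re

def values_by_row(html):
    """Return one list of "Value N" strings per <tr>...</tr> block."""
    # re.DOTALL lets "." span newlines inside a row.
    rows = re.findall(r"<tr\b.*?</tr>", html, flags=re.DOTALL | re.IGNORECASE)
    return [re.findall(r"Value \d+", row) for row in rows]

# Small stand-in for the page in the question.
sample = """
<table class="mon_list">
<tr class='list'><th class="list">Set 1</th><th class="list">Set 2</th></tr>
<tr class='list even'><td class="list">Value 1</td><td class="list">Value 2</td></tr>
<tr class='list even'><td class="list">Value 3</td><td class="list">Value 5</td></tr>
</table>
"""

for row in values_by_row(sample):
    print(row)
```

Each inner list then corresponds to one table row, so "is Value 5 on the same row as Value 2?" can be answered by checking both values against the same inner list.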
