How to write a Python script to search an HTML site for matching links

I am not very familiar with Python and have to write a script that performs many functions. The piece I still need is a way to check a site's HTML source for links that match a list provided in advance.

+3
3 answers

Match the links against what? Their href attribute? The link's display text? Perhaps something like this:

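# Written for Python 2 with BeautifulSoup 3.
from BeautifulSoup import BeautifulSoup, SoupStrainer
import re
import urllib2

# Download the raw HTML of the page.
doc = urllib2.urlopen("http://somesite.com").read()
# Restrict parsing to <a> tags whose href starts with "test".
links = SoupStrainer('a', href=re.compile(r'^test'))
# Parse only those tags and keep each one as a string.
matches = [str(elm) for elm in BeautifulSoup(doc, parseOnlyThese=links)]
for elm in matches:
    print elm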

This fetches the HTML content of somesite.com and parses it with BeautifulSoup, looking only at links whose href attribute begins with "test". It then builds a list of these links and prints them.
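If the goal is to check a page against a list of links known in advance, the same idea can be sketched in Python 3 with only the standard library. This is a rough illustration rather than a drop-in solution: `LinkCollector`, `find_links`, `sample_html`, and `wanted_links` are names made up for this example, and the comparison is an exact string match on the href value, which may or may not be what you need.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def find_links(html, wanted):
    """Return the links from `wanted` that occur as an href in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    found = set(parser.hrefs)
    # Exact string comparison; relative and absolute URLs are not normalized.
    return [link for link in wanted if link in found]


# Hypothetical page content and link list, used only for illustration.
sample_html = """
<html><body>
  <a href="test/page1.html">First</a>
  <a href="http://example.com/other">Other</a>
  <a name="anchor-without-href">No link</a>
</body></html>
"""
wanted_links = ["test/page1.html", "test/missing.html"]
matched = find_links(sample_html, wanted_links)
print(matched)
```

To run this against a live page instead of a string, the HTML could be downloaded first, for example with `urllib.request.urlopen(url).read()` and decoded to text before being passed to `find_links`.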

For more details, see the BeautifulSoup documentation.

+5

To fetch pages, you can use Python modules such as urllib and urllib2 (htmllib, etc.), or tools like mechanize and curl. To parse the HTML, use BeautifulSoup.

+3

Try Scrapy, the most comprehensive web extraction framework.

http://scrapy.org

0

Source: https://habr.com/ru/post/1735366/