How to scrape href using Python 3.5 and BeautifulSoup

I want to scrape the href of each project from the website https://www.kickstarter.com/discover/advanced?category_id=16&woe_id=23424829&sort=magic&seed=2449064&page=1 using Python 3.5 and BeautifulSoup.

Here is my code:

#Loading Libraries
import urllib
import urllib.request
from bs4 import BeautifulSoup

#define URL for scraping
theurl = "https://www.kickstarter.com/discover/advanced?category_id=16&woe_id=23424829&sort=magic&seed=2449064&page=1"
thepage = urllib.request.urlopen(theurl)

#Cooking the Soup
soup = BeautifulSoup(thepage,"html.parser")


#Scraping "Link" (href)
project_ref = soup.findAll('h6', {'class': 'project-title'})
project_href = [project.findChildren('a')[0].href for project in project_ref if project.findChildren('a')]
print(project_href)

I get [None, None, ... None, None] back. I need a list with all the hrefs from the class.

Any ideas?

1 answer

Try something like this:

import urllib.request
from bs4 import BeautifulSoup

theurl = "https://www.kickstarter.com/discover/advanced?category_id=16&woe_id=23424829&sort=magic&seed=2449064&page=1"
thepage = urllib.request.urlopen(theurl)

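# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(thepage, "html.parser")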

project_href = [i['href'] for i in soup.find_all('a', href=True)]
print(project_href)

This returns every href on the page. Many of those href values are just #, so you can filter the # entries out:

project_href = [i['href'] for i in soup.find_all('a', href=True) if i['href'] != "#"]

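This still returns some unwanted links, such as /discover?ref=nav, so you need to narrow the search to the elements you actually want.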
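
As for why the code in the question printed None: as far as I know, BeautifulSoup treats dot access such as tag.href as a search for a child tag named href, not as an attribute lookup. Attributes are read with tag['href'] or tag.get('href'). Here is a minimal sketch on a made-up HTML snippet (the markup is an assumption for illustration, not Kickstarter's actual page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page will differ.
html = """
<h6 class="project-title"><a href="/projects/1/alpha">Alpha</a></h6>
<h6 class="project-title"><a href="#">Placeholder</a></h6>
<h6 class="project-title"><a href="/projects/2/beta">Beta</a></h6>
"""

soup = BeautifulSoup(html, "html.parser")
titles = soup.find_all("h6", {"class": "project-title"})

# Dot access looks for a child tag called <href>, which does not exist,
# so every entry is None.
dot_access = [t.a.href for t in titles]

# Item access reads the attribute; skip missing hrefs and bare "#" placeholders.
links = [t.a["href"] for t in titles
         if t.a is not None and t.a.get("href") not in (None, "#")]

print(dot_access)
print(links)
```

The first list reproduces the all-None result from the question, and the second contains only the two project paths.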

EDIT:

If you only need the project links, search inside the project cards:

soup = BeautifulSoup(thepage, "html.parser")
for i in soup.find_all('div', attrs={'class' : 'project-card-content'}):
    print(i.a['href'])
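
One caveat with the loop above: if a card contains no link, i.a is None and the subscript raises a TypeError. A slightly more defensive sketch follows, again on made-up markup. BASE_URL and the sample HTML are assumptions for illustration, and urljoin is only there in case the page uses relative links:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Assumed site root, used only to resolve relative links.
BASE_URL = "https://www.kickstarter.com"

# Hypothetical markup for illustration only; the real page will differ.
html = """
<a href="/discover?ref=nav">Discover</a>
<div class="project-card-content"><h6><a href="/projects/1/alpha">Alpha</a></h6></div>
<div class="project-card-content"><p>No link here</p></div>
<div class="project-card-content"><a href="https://www.kickstarter.com/projects/2/beta">Beta</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

project_links = []
for card in soup.find_all("div", attrs={"class": "project-card-content"}):
    link = card.find("a", href=True)  # None when the card has no link with an href
    if link is not None:
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        project_links.append(urljoin(BASE_URL, link["href"]))

print(project_links)
```

The navigation link outside the cards is ignored, the card without a link is skipped, and both remaining links come out as absolute URLs.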

Source: https://habr.com/ru/post/1649021/

