I am trying to scrape the site linked below.
However, the page only loads additional data as you scroll down. I have no idea how to scroll down using BeautifulSoup or Python. Does anyone know how to do this?
The code is a bit of a mess, but here it is.
    import scrapy
    from scrapy.selector import Selector
    from testtest.items import TesttestItem
    import datetime
    from selenium import webdriver
    from bs4 import BeautifulSoup
    from HTMLParser import HTMLParser
    import re
    import time


    class MLStripper(HTMLParser):
        pass  # class body not shown


    class MySpider(scrapy.Spider):
        name = "A1Locker"

        def strip_tags(html):
            s = MLStripper()
            s.feed(html)
            return s.get_data()

        # allowed_domains takes bare domain names, not URLs
        allowed_domains = ['a1lockerrental.com']
        start_urls = ['http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all']

        def parse(self, response):
            # Load the "Small" category in a real browser so the JavaScript runs
            url = 'http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Small'
            driver = webdriver.Firefox()
            driver.get(url)
            html = driver.page_source
            soup = BeautifulSoup(html, 'html.parser')

            # Same again for the "Medium" category
            url2 = 'http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Medium'
            driver2 = webdriver.Firefox()
            driver2.get(url2)
            html2 = driver2.page_source
            soup2 = BeautifulSoup(html2, 'html.parser')
The desired output is the data collected from this web page: http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all
To get all of it, the page has to be scrolled down so that the rest of the data loads. At least, that is how I understand it.
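For reference, here is a minimal sketch of the kind of scrolling meant here, assuming the page loads more rows when the window itself is scrolled. That is an assumption, since the site may use a scrollable container or a "load more" control instead. The helper name `scroll_to_bottom` and its `pause` and `max_rounds` parameters are hypothetical. The only Selenium call it relies on is `driver.execute_script`.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=30):
    """Scroll the window down until the page height stops growing.

    `driver` is any object with Selenium's `execute_script` method.
    Returns the number of scrolls that caused the page to grow.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Crude fixed wait so lazily loaded content has time to appear
        time.sleep(pause)
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded, so assume we reached the end
        last_height = new_height
        rounds += 1
    return rounds
```

In the spider above this would be called between `driver.get(url)` and `driver.page_source`, e.g. `scroll_to_bottom(driver)` followed by `soup = BeautifulSoup(driver.page_source, 'html.parser')`. BeautifulSoup only parses the HTML it is given, so the scrolling has to happen in the browser first.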
Thanks, DM123