Test-driven development (TDD) for web scraping

Question

Summary

I have a Python-based web-scraping pet project to which I am trying to apply TDD, but I quickly ran into a problem: the unit tests need an Internet connection and download HTML text. Although I understand that the actual parsing can be tested against a local file, some methods exist only to redefine the URL and request the website again. This seems to break some TDD best practices (for example, Robert Martin's "Clean Code" says that tests should be able to run in any environment). Although this is a Python project, I ran into a similar problem when using R to scrape Yahoo Finance, so I'm fairly sure this kind of issue is language-agnostic. At the very least, it seems to violate a basic TDD guideline, namely that tests should run quickly.

TL;DR: Are there any best practices for dealing with network connections in TDD?

Reproducible example

AbstractScraper.py

from urllib.request import urlopen
from bs4 import BeautifulSoup


class AbstractScraper:

    def __init__(self, url):
        self.url = url
        self.dataDictionary = None

    def makeDataDictionary(self):
        html = urlopen(self.url)
        text = html.read().decode("utf-8")
        soup = BeautifulSoup(text, "lxml")
        self.dataDictionary = {"html": html, "text": text, "soup": soup}

    def writeSoup(self, path):
        with open(path, "w") as outfile:
            outfile.write(self.dataDictionary["soup"].prettify())

TestAbstractScraper.py

import os
import tempfile
import unittest
from http.client import HTTPResponse
from bs4 import BeautifulSoup
from CrackedScrapeProject.scrape.AbstractScraper import AbstractScraper


class TestAbstractScraperMethods(unittest.TestCase):

    def setUp(self):
        # Note: this downloads the page again before every test (needs network access).
        self.scraper = AbstractScraper("https://docs.python.org/2/library/unittest.html")
        self.scraper.makeDataDictionary()

    def test_dataDictionaryContents(self):
        data = self.scraper.dataDictionary
        self.assertIsInstance(data, dict)
        self.assertIsInstance(data["html"], HTTPResponse)
        self.assertIsInstance(data["text"], str)
        self.assertIsInstance(data["soup"], BeautifulSoup)
        self.assertSetEqual(set(data.keys()), {"text", "soup", "html"})

    def test_writeSoup(self):
        # Use a temporary directory instead of a hard-coded desktop path,
        # so the test works on any machine and cleans up after itself.
        with tempfile.TemporaryDirectory() as tmpDir:
            filePath = os.path.join(tmpDir, "testFile.html")
            self.scraper.writeSoup(filePath)
            with open(filePath, "r") as infile:
                writtenData = infile.read()
        self.assertEqual(writtenData, self.scraper.dataDictionary["soup"].prettify())


if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(TestAbstractScraperMethods)
    unittest.TextTestRunner(verbosity=2).run(suite)

python unit-testing testing web-scraping

Alex thompson Dec 25 '16 at 21:25

1 answer

Dirk Herrmann · Answer 1 · 2017-10-18T21:51:10+0000

As you said, tests executed during TDD must run fast, and there are other aspects too, such as determinism (what if the connection fails?). As mentioned in the comments, this usually means that you should use mocks for these troublesome dependencies.
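A minimal sketch of that advice using the standard library's unittest.mock. It assumes a hypothetical helper, fetch_text, that performs the same urlopen/read/decode steps as makeDataDictionary. Because the helper looks up urlopen on the urllib.request module at call time, patching "urllib.request.urlopen" swaps in a fake response and the test never touches the network.

```python
import io
import unittest
import urllib.request
from unittest import mock


def fetch_text(url):
    """Hypothetical helper: download a page and decode it as UTF-8.

    urlopen is looked up on the urllib.request module at call time,
    which is what lets mock.patch("urllib.request.urlopen") replace it.
    """
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


class TestFetchTextOffline(unittest.TestCase):

    @mock.patch("urllib.request.urlopen")
    def test_decodes_body_without_network(self, fake_urlopen):
        # io.BytesIO stands in for the HTTP response: like HTTPResponse,
        # it has read() and works as a context manager.
        fake_urlopen.return_value = io.BytesIO("<p>café</p>".encode("utf-8"))

        self.assertEqual(fetch_text("https://example.com/"), "<p>café</p>")
        fake_urlopen.assert_called_once_with("https://example.com/")
```

Run it with `python -m unittest`; no network access is needed.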

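There is, however, an underlying assumption here: that the code under test is mostly computation, so that the bugs you want to find are in your own logic. Is that true for your code? Much of it consists of calls to other components, and in code like that the likely bugs are in how those components are used. A mock cannot reveal such bugs, because it is built from the same, possibly mistaken, understanding of the real component.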

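Take makeDataDictionary as an example. It does little more than call other components (urlopen, read, decode, BeautifulSoup). So what bugs could it contain? Are the right functions called, in the right order? With the right arguments, such as the encoding or the parser name? Are the return values used correctly? These questions cannot be answered by unit tests with mocks: if you misunderstand how urlopen behaves, your mock will reproduce the same misunderstanding.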

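What, then, could a unit test of makeDataDictionary still find? Only the small computational part (if it deserves the name): assembling the dictionary (picking the keys and storing the values under them). That part is trivial, and as long as it is mixed with the calls to other components, testing it through makeDataDictionary would still require mocking all of them.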

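If you still want unit tests for it, you can separate the computational part (the dictionary creation) from the interactions. For example, extract a helper _makeDataDictionary(html, text, soup), which simply returns {"html": html, "text": text, "soup": soup}. You then unit-test _makeDataDictionary instead of makeDataDictionary, and makeDataDictionary is left as a thin layer of calls to other components.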
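A sketch of that split, assuming _makeDataDictionary is added as a static method of AbstractScraper (the placement is an assumption; a module-level function would work just as well). The network and parsing imports are moved inside makeDataDictionary here only so that the sketch runs, and the pure part can be tested, without bs4 or lxml installed; in the real project they can stay at the top of the file.

```python
import unittest


class AbstractScraper:

    def __init__(self, url):
        self.url = url
        self.dataDictionary = None

    @staticmethod
    def _makeDataDictionary(html, text, soup):
        # Pure helper: no I/O, no parsing, just assembles the result.
        return {"html": html, "text": text, "soup": soup}

    def makeDataDictionary(self):
        # Thin wrapper around the external calls; not unit-tested.
        # Imports are local only so this sketch runs without bs4/lxml.
        from urllib.request import urlopen
        from bs4 import BeautifulSoup

        html = urlopen(self.url)
        text = html.read().decode("utf-8")
        soup = BeautifulSoup(text, "lxml")
        self.dataDictionary = self._makeDataDictionary(html, text, soup)


class TestMakeDataDictionaryHelper(unittest.TestCase):

    def test_assembles_expected_keys_and_values(self):
        # Plain stand-in objects: the helper does not care about their types.
        html, text, soup = object(), "<p>hello</p>", object()
        result = AbstractScraper._makeDataDictionary(html, text, soup)

        self.assertEqual(set(result), {"html", "text", "soup"})
        self.assertIs(result["html"], html)
        self.assertEqual(result["text"], text)
        self.assertIs(result["soup"], soup)
```

The unit test runs in milliseconds and needs neither the network nor an HTML parser, because it only exercises the part of the code that is actually your own logic.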

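To summarize: _makeDataDictionary can be unit-tested without any network access. For makeDataDictionary, mocks would not help much, because they would only repeat your own assumptions about urlopen and BeautifulSoup. Instead, test makeDataDictionary with integration tests against the real website, because the bugs makeDataDictionary can contain are bugs in its interactions.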
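One common way to keep network-dependent integration tests out of the fast TDD loop, while still running them against the real site, is to make them opt-in. The sketch below assumes a hypothetical RUN_NETWORK_TESTS environment variable and reuses the module path from the question; the project imports happen inside the test method so that, when the test is skipped, neither the project package nor bs4 has to be importable.

```python
import os
import unittest

# Hypothetical opt-in switch: network tests only run when RUN_NETWORK_TESTS=1,
# so the quick edit-test cycle stays offline and deterministic.
RUN_NETWORK_TESTS = os.environ.get("RUN_NETWORK_TESTS") == "1"


@unittest.skipUnless(RUN_NETWORK_TESTS, "set RUN_NETWORK_TESTS=1 to run network tests")
class TestMakeDataDictionaryIntegration(unittest.TestCase):

    def test_fetches_and_parses_real_page(self):
        # Imported here so a skipped run needs neither the project nor bs4.
        from bs4 import BeautifulSoup
        from CrackedScrapeProject.scrape.AbstractScraper import AbstractScraper

        scraper = AbstractScraper("https://docs.python.org/2/library/unittest.html")
        scraper.makeDataDictionary()
        data = scraper.dataDictionary

        self.assertIsInstance(data["text"], str)
        self.assertIsInstance(data["soup"], BeautifulSoup)
        # A real page should at least have a <title> element.
        self.assertIsNotNone(data["soup"].find("title"))
```

A plain `python -m unittest` run reports this class as skipped; `RUN_NETWORK_TESTS=1 python -m unittest` (POSIX shell syntax) runs it against the live site, for example before a commit or on a CI server.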

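A final note on TDD: TDD does not require that every test is a fast unit test. Slower tests (such as integration tests that access the network) can be run less often, for example before a commit, while the fast unit tests run after every small change. That way the network no longer slows down the TDD cycle.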
