The company I work for has several hundred highly dynamic websites. It has decided to build a search engine, and I have been tasked with writing the scraper. Some of the sites run on old hardware and cannot take much load, while others can handle a huge number of concurrent users.
I need to be able to say, for example: use 5 parallel requests for site A, 2 for site B, and 1 for site C.
I know I could do this with threads, mutexes, semaphores, etc., but that would be quite difficult. Are any of the higher-level frameworks, such as the TPL, async/await, or TPL Dataflow, powerful enough to make this easier?
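To make the requirement concrete, a minimal hand-rolled sketch of the per-site limit might look like the following. It is only an illustration under assumptions: the site names, the `Limits` table, the `ThrottledScraper` class, and the `FetchAsync` placeholder (a short delay standing in for a real HTTP request) are all hypothetical. It uses one `SemaphoreSlim` per site and records the peak number of requests in flight so the limit can be checked.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledScraper
{
    // Hypothetical per-site limits; real values would come from configuration.
    public static readonly Dictionary<string, int> Limits = new Dictionary<string, int>
    {
        ["siteA"] = 5,
        ["siteB"] = 2,
        ["siteC"] = 1,
    };

    // Placeholder for the real HTTP request (assumption: just a short delay).
    private static Task FetchAsync(string site, int page) => Task.Delay(20);

    // Issues pagesPerSite fake requests to every site and returns, per site,
    // the highest number of requests that were in flight at the same time.
    public static async Task<Dictionary<string, int>> RunAsync(int pagesPerSite)
    {
        // One semaphore per site, sized to that site's limit.
        var gates = Limits.ToDictionary(kv => kv.Key, kv => new SemaphoreSlim(kv.Value));
        var inFlight = new ConcurrentDictionary<string, int>();
        var peak = new ConcurrentDictionary<string, int>();

        async Task ScrapeAsync(string site, int page)
        {
            // Wait asynchronously until the site has a free slot.
            await gates[site].WaitAsync();
            try
            {
                int now = inFlight.AddOrUpdate(site, 1, (_, n) => n + 1);
                peak.AddOrUpdate(site, now, (_, p) => Math.Max(p, now));
                await FetchAsync(site, page);
            }
            finally
            {
                inFlight.AddOrUpdate(site, 0, (_, n) => n - 1);
                gates[site].Release();
            }
        }

        // Start every request up front; the semaphores do the throttling.
        var tasks = from site in Limits.Keys
                    from page in Enumerable.Range(0, pagesPerSite)
                    select ScrapeAsync(site, page);
        await Task.WhenAll(tasks);

        return peak.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

Calling `await ThrottledScraper.RunAsync(10)` should report a peak no higher than the configured limit for each site. The sketch leaves out retries, error handling, and politeness delays.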