Running 1000+ pages / min in a browser environment

How can I load and execute (e.g. evaluate JavaScript, build the DOM) more than 1000 XHTML documents per minute?

Some constraints / limitations:

  • The URLs to be loaded are on different servers.
  • I need to traverse, and ideally modify, the resulting DOM.
  • I have no interest in rendering graphics.
  • Bandwidth is not a problem.
  • Massive hardware parallelization would be a problem.
  • The production environment is .NET.

I'm not really worried about loading the pages. I believe that executing the pages is the actual bottleneck. .NET has a built-in web browser object, but I have no idea whether it would scale on a single machine. Also, .NET is not an absolute requirement, but it would simplify integration here.

I would be grateful for any comments / pointers regarding:

  • Which browser API is best suited for this?
  • Is a browser the right way at all? Maybe there is a simpler way to execute JavaScript, which is the most important part (... but it would not provide the DOM).
  • Which existing products / services, whether open source or commercial, can fulfill the task?
  • Approximately how many pages per minute can I expect to process on one machine (Chrome advertises 3 ms)?
  • Any pitfalls I might run into ...
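For scale, here is a rough back-of-the-envelope sketch of what the target means per page. It assumes the 3 ms figure mentioned above is the full execution time of one page on one worker, which is not confirmed, and it ignores download latency. The function names are made up for illustration, and Python is used only to keep the arithmetic runnable.

```python
def pages_per_minute(ms_per_page, workers=1):
    """Throughput if every page takes ms_per_page on each of `workers` workers."""
    return workers * 60_000 / ms_per_page


def time_budget_ms(target_pages_per_minute, workers=1):
    """Maximum time each page may take to still reach the target rate."""
    return workers * 60_000 / target_pages_per_minute


if __name__ == "__main__":
    print(pages_per_minute(3))         # one worker at 3 ms per page
    print(time_budget_ms(1000))        # per-page budget, one worker
    print(time_budget_ms(1000, 8))     # per-page budget, eight workers
```

Under these assumptions, 1000 pages per minute leaves 60 ms per page on a single worker, so the target is only at risk if real pages take far longer to execute than the advertised figure.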

Thanks in advance,

/ David

+3
3 answers

Look at one of the headless browsers for .NET. They will be faster than BrowserControl because they do not have to render a graphical display.

[...] HtmlUnit [...]

+4

[...] WinForms [...] ~7800 URLs [...] 5 [...]

[...] 26 [...] 30 [...] TPL (.NET v4.0) [...] 5. Dell T7500, Xeon (3 [...]), 24 [...], 64-bit Windows 7 Ultimate.

[...] WebClient, Stream, StreamReader [...] Parallel.ForEach() [...]

[...] "1000 [...]/[...]" [...]
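The pattern named here, downloading many URLs concurrently (WebClient inside Parallel.ForEach() in .NET), can be sketched roughly as follows. This is not the answerer's code: it is a Python analogue built on a thread pool, and `fetch`, `fetch_all`, the `fetcher` parameter, and the worker count of 32 are all assumptions chosen for illustration. It only downloads the pages and does not execute any JavaScript.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as text (UTF-8 is assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetcher=fetch, max_workers=32):
    """Fetch all URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    that was raised while fetching it.
    """
    def safe(url):
        try:
            return url, fetcher(url)
        except Exception as exc:  # one failing server must not stop the batch
            return url, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe, urls))
```

Calling `fetch_all(list_of_urls)` returns once every download has either finished or failed. Since the URLs are on different servers, the time is dominated by waiting on the network, which is why a pool with more threads than CPU cores is a reasonable starting point.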

...

+1

[...] node.js [...] .NET. [...] DOM.

0

Source: https://habr.com/ru/post/1789258/

