Having written a high-performance crawler in C#, I can say with some authority that explicitly managing tens or hundreds of threads is not the best way to go. It can be done (I did it), but it is extremely painful.
That said...
If your application is written as I think, then each thread does something like this:
    while (!Shutdown)
    {
        // get next url to crawl from somewhere
        // download the data from that url
        // do something with the data
    }
Pausing the threads between downloads is fairly simple. I would suggest creating two ManualResetEvent instances: one for continue and one for shutdown. They are static so that all crawler threads can access them:
    static ManualResetEvent ShutdownEvent = new ManualResetEvent(false); // initially not signaled
    static ManualResetEvent ContinueEvent = new ManualResetEvent(true);  // initially signaled: threads run
Then each thread uses WaitAny in a loop:
    WaitHandle[] handles = new WaitHandle[] { ShutdownEvent, ContinueEvent };
    while (true)
    {
        // Block until shutdown is signaled or crawling is allowed to continue.
        int handle = WaitHandle.WaitAny(handles);
        if (handle == -1 || handle >= handles.Length)
        {
            throw new ApplicationException();
        }
        if (handles[handle] == ShutdownEvent)
            break; // shutdown was signaled
        if (handles[handle] == ContinueEvent)
        {
            // download the next page and do something with the data
        }
    }
Note that when I defined the handles array, I specified ShutdownEvent first. The reason is that if multiple items are signaled, WaitAny returns the lowest index that corresponds to a signaled object. If the array were populated in the other order, ContinueEvent would always win while it is signaled, so you would not be able to shut down without pausing first.
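If you want to convince yourself of the ordering rule, a minimal sketch like the following should do. The class and method names (WaitAnyOrderDemo, IndexWhenBothSignaled) are made up for this illustration; both events are created already signaled so WaitAny has to choose between them.

```csharp
using System;
using System.Threading;

static class WaitAnyOrderDemo
{
    // Returns the index WaitAny reports when BOTH events are already signaled.
    public static int IndexWhenBothSignaled()
    {
        using (var shutdownEvent = new ManualResetEvent(true))   // signaled
        using (var continueEvent = new ManualResetEvent(true))   // signaled
        {
            // Shutdown is listed first, so it takes priority over continue.
            WaitHandle[] handles = { shutdownEvent, continueEvent };
            return WaitHandle.WaitAny(handles);
        }
    }

    static void Main()
    {
        Console.WriteLine(IndexWhenBothSignaled()); // prints 0 (the shutdown slot)
    }
}
```

Because index 0 is reported, a thread that sees both events signaled takes the shutdown branch and never starts another download.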
Now, if you want the threads to shut down, call ShutdownEvent.Set. If you want the threads to pause, call ContinueEvent.Reset. When you want the threads to resume, call ContinueEvent.Set.
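Putting the pieces together, here is a rough, self-contained sketch of how the controlling code might look. It is an illustration under assumptions, not the code from my crawler: the class CrawlerControlDemo, the helper methods, and the pagesProcessed counter are invented for the example, and Thread.Sleep stands in for the actual download.

```csharp
using System;
using System.Threading;

static class CrawlerControlDemo
{
    static readonly ManualResetEvent ShutdownEvent = new ManualResetEvent(false);
    static readonly ManualResetEvent ContinueEvent = new ManualResetEvent(true);
    static int pagesProcessed; // hypothetical counter, only here to observe progress

    public static int PagesProcessed { get { return Volatile.Read(ref pagesProcessed); } }

    static void Worker()
    {
        // Shutdown first, so it wins when both events are signaled.
        WaitHandle[] handles = { ShutdownEvent, ContinueEvent };
        while (true)
        {
            int index = WaitHandle.WaitAny(handles);
            if (index == 0) break;                     // shutdown was signaled
            Thread.Sleep(10);                          // stand-in for one download
            Interlocked.Increment(ref pagesProcessed);
        }
    }

    public static Thread[] StartWorkers(int count)
    {
        var workers = new Thread[count];
        for (int i = 0; i < count; i++)
        {
            workers[i] = new Thread(Worker);
            workers[i].Start();
        }
        return workers;
    }

    public static void Pause()  { ContinueEvent.Reset(); }
    public static void Resume() { ContinueEvent.Set(); }

    public static void Shutdown(Thread[] workers)
    {
        ShutdownEvent.Set();
        foreach (Thread t in workers) t.Join();        // wait for each thread to exit
    }

    static void Main()
    {
        Thread[] workers = StartWorkers(4);
        Thread.Sleep(200);
        Pause();      // threads stop after finishing their current item
        Thread.Sleep(200);
        Resume();     // threads pick up where they left off
        Thread.Sleep(200);
        Shutdown(workers);
        Console.WriteLine("Pages processed: " + PagesProcessed);
    }
}
```

Note that a pause only takes effect between items: a thread that is already in the middle of its (simulated) download finishes it before it blocks in WaitAny.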
Pausing in the middle of a download is a bit more difficult. It can be done, but the problem is that if you pause for too long, the server might time out. Then you have to restart the download from the beginning or, if the server and your code support it, resume the download from the point where you left off. Either option is rather painful, so I would not suggest trying to pause in the middle of a download.
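For what it is worth, "resume from where you left off" usually means an HTTP Range request. The sketch below only builds such a request with the standard System.Net.Http types; the class name, the method name, and the example URL are placeholders of mine, and it assumes your code has kept track of how many bytes it already received. Whether the server honors the header is up to the server: if it replies with 200 and the full body instead of 206 Partial Content, you are back to starting over.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

static class ResumeRequestDemo
{
    // Builds a GET request that asks the server to send the resource
    // starting at byte offset bytesAlreadyReceived (open-ended range).
    public static HttpRequestMessage BuildResumeRequest(string url, long bytesAlreadyReceived)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(bytesAlreadyReceived, null);
        return request;
    }

    static void Main()
    {
        // example.com is a placeholder URL; no request is actually sent here.
        HttpRequestMessage request = BuildResumeRequest("http://example.com/page.html", 1024);
        Console.WriteLine(request.Headers.Range); // prints "bytes=1024-"
    }
}
```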