Multiple threads

I've been reading a lot about multithreading, and the Stack Overflow community has helped me a great deal in understanding it.

All the examples I have seen on the Internet, though, deal with only one thread.

My application is a scraper for an insurance company (a family company ... all for free). In any case, the user can choose how many threads they want to run. So, for example, the user wants the application to scrape 5 sites at a time, and then later the same day bumps it up to 20 threads because the computer is sitting idle and has resources to spare.

Basically, the application builds a list of 1000 sites to scrape. A thread goes off and does this, updating the user interface as it builds the list.

When that is done, another thread is started to kick off the scraping. Depending on the number of threads the user has chosen, that many threads will be created.

What is the best way to create these threads? Should I create 1000 threads in a list and loop through them, so that if the user has set 5 threads to run, it works through them 5 at a time?

I understand threads; it is the application logic that is tripping me up.

Any ideas or resources on the internet that can help me?

+4
9 answers

You can use a thread pool for this:

```csharp
using System;
using System.Threading;

public class Example
{
    public static void Main()
    {
        ThreadPool.SetMaxThreads(100, 10);

        // Queue the task.
        ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));

        Console.WriteLine("Main thread does some work, then sleeps.");
        Thread.Sleep(1000);

        Console.WriteLine("Main thread exits.");
    }

    // This thread procedure performs the task.
    static void ThreadProc(Object stateInfo)
    {
        Console.WriteLine("Hello from the thread pool.");
    }
}
```
+3

Does this scraper use much CPU when it runs?

If it is talking to these 1000 remote sites and downloading their pages, the downloads may well take longer than the actual parsing of the pages.

And how many processor cores does your user have? If they have 2 (which is common these days), then beyond two simultaneous threads doing the parsing they will not see any speedup.

So you probably need to "parallelize" the page downloads. I doubt you need to do the same for the parsing.

Take a look at asynchronous I/O instead of explicit multithreading. It lets you kick off a bunch of downloads in parallel and get a callback when each one completes.
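In today's C# the same idea is usually expressed with tasks rather than raw callbacks. A minimal sketch (the class, method names, and the `fetch` delegate are all mine, purely for illustration; in practice `fetch` would be something like `HttpClient.GetStringAsync`):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDownloader
{
    // Run up to maxConcurrency downloads at once; 'fetch' is whatever actually
    // retrieves a page. No thread is blocked while a download is in flight.
    public static async Task<string[]> FetchAllAsync(
        string[] urls, int maxConcurrency, Func<string, Task<string>> fetch)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();          // wait for a free download slot
                try { return await fetch(url); } // download asynchronously
                finally { gate.Release(); }      // free the slot for the next URL
            }).ToArray();

            return await Task.WhenAll(tasks);    // results come back in input order
        }
    }
}
```

The semaphore plays the role of the user's "5 or 20 at a time" setting without dedicating a thread per site.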

+2

I think this is basically the example you need.

```csharp
public class WebScraper
{
    private readonly int totalThreads;
    private readonly List<System.Threading.Thread> threads;
    private readonly List<Exception> exceptions;
    private readonly object locker = new object();
    private volatile bool stop;

    public WebScraper(int totalThreads)
    {
        this.totalThreads = totalThreads;
        threads = new List<System.Threading.Thread>(totalThreads);
        exceptions = new List<Exception>();

        for (int i = 0; i < totalThreads; i++)
        {
            var thread = new System.Threading.Thread(Execute);
            thread.IsBackground = true;
            threads.Add(thread);
        }
    }

    public void Start()
    {
        foreach (var thread in threads)
        {
            thread.Start();
        }
    }

    public void Stop()
    {
        stop = true;

        foreach (var thread in threads)
        {
            if (thread.IsAlive)
            {
                thread.Join();
            }
        }
    }

    private void Execute()
    {
        try
        {
            while (!stop)
            {
                // Scrape away!
            }
        }
        catch (Exception ex)
        {
            lock (locker)
            {
                // You could have a thread checking this collection and
                // reporting it as you see fit.
                exceptions.Add(ex);
            }
        }
    }
}
```
+1

If you just want a working application, use something that someone else has already spent time developing and refining:

http://arachnode.net/

arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing, and storing Internet content, including email addresses, files, hyperlinks, images, and web pages.

Whether you are interested or involved in screen scraping, data mining, text mining, research, or any other application where a high-performance crawler is key to the success of your efforts, arachnode.net provides the solution you need.

If you want to write one yourself because it is a fun thing to write (I wrote one recently, and yes, it is a lot of fun), then you can refer to this PDF provided by arachnode.net, which explains in detail the theory behind a good web crawler:

http://arachnode.net/media/Default.aspx?Sort=Downloads&PageIndex=1

Download the PDF titled "Crawling the Web" (second link above) and go to section 2.6, "Multithreaded Crawlers." That is what I used to build my crawler, and I must say, I think it works well.

+1

The main logic:

You have one queue into which you put the URLs to scrape. Then you create your threads, each of which has access to that single queue object. Each thread runs this loop:

  • lock the queue
  • check whether there are items in the queue; if not, unlock the queue and end the thread
  • take the first item out of the queue
  • unlock the queue
  • process the item
  • raise an event that updates the user interface (remember to marshal the call onto the UI thread)
  • go back to step 1
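The loop above might look like this in C# (class and member names are mine, just for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class UrlWorkQueue
{
    private readonly Queue<string> queue;
    private readonly object locker = new object();

    // Raised after each URL; marshal to the UI thread in the handler.
    public event Action<string> ItemProcessed;

    public UrlWorkQueue(IEnumerable<string> urls)
    {
        queue = new Queue<string>(urls);
    }

    // Start the requested number of worker threads; each runs the loop above.
    public List<Thread> StartThreads(int count)
    {
        var threads = new List<Thread>();
        for (int i = 0; i < count; i++)
        {
            var t = new Thread(WorkLoop) { IsBackground = true };
            t.Start();
            threads.Add(t);
        }
        return threads;
    }

    private void WorkLoop()
    {
        while (true)
        {
            string url;
            lock (locker)                     // 1. lock the queue
            {
                if (queue.Count == 0) return; // 2. empty -> unlock and end the thread
                url = queue.Dequeue();        // 3. take the first item
            }                                 // 4. unlock
            Process(url);                     // 5. process the item
            ItemProcessed?.Invoke(url);       // 6. notify the UI
        }                                     // 7. back to step 1
    }

    private void Process(string url)
    {
        // The actual scraping would happen here.
    }
}
```

The threads die out naturally once the queue drains, which suits a fixed list of 1000 sites.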

Just let the threads do the "get stuff from the queue" part (pulling tasks) instead of handing them URLs (pushing tasks), so that all you have to say is

YourThreadManager.StartThreads(numberOfThreadsTheUserWants);

and everything else happens automatically. See the other answers for how to create and manage the threads themselves.

0

I solved a similar problem by creating a worker class that uses a callback to signal the main application that a unit of work is done. Then I create a queue of 1000 work items and call a method that launches threads until the number running reaches the limit, tracking the active threads in a dictionary keyed by the thread's ManagedThreadId. As each thread completes, the callback removes that thread from the dictionary and invokes the launcher again.

If a connection is dropped or times out, the callback inserts the item back into the queue. Lock around both the queue and the dictionary. I create threads via the thread pool, because the overhead of creating a thread is insignificant compared to the connection time, and it lets me have more threads in flight. The callback is also a convenient place to update the user interface, and even lets you change the thread limit while the run is in progress. At one point I had more than 50 open connections. Remember to increase the MaxConnections setting in app.config (the default is two).
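A rough sketch of that callback scheme (all names here are mine, not the poster's, and the dictionary tracking is kept only as bookkeeping for the UI):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class ThrottledRunner
{
    private readonly Queue<Action> pending = new Queue<Action>();
    private readonly Dictionary<int, DateTime> active = new Dictionary<int, DateTime>();
    private readonly object locker = new object();
    private int activeCount;

    public int Limit { get; set; } // can be changed while the run is in progress

    public ThrottledRunner(int limit) { Limit = limit; }

    // Add one work item and start it if we are under the limit.
    public void Enqueue(Action work)
    {
        lock (locker) pending.Enqueue(work);
        StartUpToLimit();
    }

    private void StartUpToLimit()
    {
        lock (locker)
        {
            while (activeCount < Limit && pending.Count > 0)
            {
                activeCount++;
                var work = pending.Dequeue();
                ThreadPool.QueueUserWorkItem(_ => Run(work));
            }
        }
    }

    private void Run(Action work)
    {
        int id = Thread.CurrentThread.ManagedThreadId;
        lock (locker) active[id] = DateTime.UtcNow; // track who is in flight
        try
        {
            work(); // on a dropped connection this could re-Enqueue itself
        }
        finally
        {
            // The completion callback: leave the active set and launch the
            // next queued item (also a good place to update the UI).
            lock (locker) { active.Remove(id); activeCount--; }
            StartUpToLimit();
        }
    }
}
```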

0

Consider using the event-based asynchronous pattern (the AsyncOperation and AsyncOperationManager classes).
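A minimal sketch of what that pattern gives you: work runs on a pool thread, and the completion event is posted back to the thread that started the operation (the class and method names here are mine; only AsyncOperation and AsyncOperationManager are from the framework):

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public class ScrapeWorker
{
    // Raised on the context that called ScrapeAsync (e.g. the UI thread).
    public event Action<string> PageScraped;

    public void ScrapeAsync(string url)
    {
        // Capture the current synchronization context.
        AsyncOperation op = AsyncOperationManager.CreateOperation(null);

        ThreadPool.QueueUserWorkItem(_ =>
        {
            string result = Scrape(url); // runs on a pool thread
            // Marshal the callback back to the captured context and
            // mark the operation complete.
            op.PostOperationCompleted(state => PageScraped?.Invoke((string)state), result);
        });
    }

    private string Scrape(string url)
    {
        // The actual download/parse would happen here.
        return "scraped:" + url;
    }
}
```

On a WinForms or WPF thread the event fires on the UI thread with no manual Invoke; in a console app it simply fires on a pool thread.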

0

I would use a queue, a condition variable, and a mutex, and start only the requested number of threads, for example 5 or 20 (not 1000).

Each thread blocks on the condition variable. When it wakes up, it dequeues the first element, unlocks the queue, works on the element, then locks the queue again and checks for more elements. If the queue is empty, it sleeps on the condition variable. If not, it unlocks, works, and repeats.

While the mutex is held, each thread can also check whether the user has reduced the requested thread count. Just check whether count > max_count, and if so, the thread terminates itself.

Any time you have more sites to queue, just lock the mutex, add them to the queue, and signal the condition variable. Any threads that are not currently working will wake up and take the new work.

Any time the user increases the requested thread count, just start the extra threads; they will lock the queue, check for work, and either sleep on the condition variable or get going.

Each thread will continually pull more work off the queue or sleep. You do not need more than the 5 or 20.
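In C# the mutex-plus-condition-variable pair maps onto one lock object with Monitor.Wait/Monitor.Pulse. A sketch of the scheme described above (class and member names are mine):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class SiteQueue
{
    private readonly Queue<string> queue = new Queue<string>();
    private readonly object locker = new object();
    private int maxThreads;
    private int threadCount;

    public event Action<string> Processed;

    public SiteQueue(int maxThreads) { SetMaxThreads(maxThreads); }

    // Producer: lock, enqueue, signal the condition variable.
    public void Add(string url)
    {
        lock (locker)
        {
            queue.Enqueue(url);
            Monitor.Pulse(locker); // wake one sleeping worker
        }
    }

    // The user changed the limit: start extra threads, or let surplus ones exit.
    public void SetMaxThreads(int newMax)
    {
        lock (locker)
        {
            maxThreads = newMax;
            while (threadCount < maxThreads)
            {
                threadCount++;
                new Thread(WorkLoop) { IsBackground = true }.Start();
            }
            Monitor.PulseAll(locker); // surplus sleepers wake up and terminate
        }
    }

    private void WorkLoop()
    {
        while (true)
        {
            string url;
            lock (locker)
            {
                while (queue.Count == 0 || threadCount > maxThreads)
                {
                    // count > max_count: this thread terminates itself.
                    if (threadCount > maxThreads) { threadCount--; return; }
                    Monitor.Wait(locker); // sleep until Add() or SetMaxThreads() pulses
                }
                url = queue.Dequeue();
            }
            Processed?.Invoke(url); // the actual scraping would happen here
        }
    }
}
```

Unlike the drain-and-exit loop in the earlier answer, these workers stay alive waiting for new sites, which matches the "keep feeding the queue" usage here.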

0

You might want to take a look at the ProcessQueue article on CodeProject.

Essentially, you will want to create (and start) the appropriate number of threads; in your case that number comes from the user. Each of these threads should process one site and then find the next site that needs processing. Even if you don't use the class itself (though it sounds like it would suit your purposes pretty well, although I'm obviously biased!), it should give you a good idea of how this is done.

-1

Source: https://habr.com/ru/post/1300650/
