Scheduling system for fitting

I would like to parallelize a serial operation (fitting a complex mathematical function to a data set) across several processors.

Suppose I have 8 cores on my machine and want to fit 1000 datasets. What I expect is some kind of system that takes the 1000 datasets as a queue and dispatches them to the 8 cores, so it starts by handing out the first 8 of the 1000 in FIFO order. The fitting time generally differs from one dataset to the next, so some of the 8 datasets being fitted may take longer than others. What I want from the system is to save the results of the finished fits and keep handing new datasets from the big queue (of 1000 datasets) to each thread that becomes free. This should continue until all 1000 datasets have been processed, after which my program can carry on.

What is the name of such a system? And are there existing implementations of it in C++?

I parallelize with OpenMP and use advanced C++ techniques such as templates and polymorphism.

Thanks for any help.

2 answers

You can use an OpenMP parallel for loop with dynamic scheduling, or OpenMP tasks. Both can be used to parallelize cases where each iteration takes a different amount of time to complete. With dynamic scheduling:

    #pragma omp parallel
    {
        Fitter fitter;
        fitter.init();

        #pragma omp for schedule(dynamic,1)
        for (int i = 0; i < numFits; i++)
            fitter.fit(..., &results[i]);
    }

schedule(dynamic,1) makes each thread execute one iteration at a time, and threads are never left idle as long as there are iterations left to process.
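The snippet above elides the actual fitting code. For reference, here is a self-contained version that compiles and runs as-is (e.g. with g++ -fopenmp); the Fitter class and its toy least-squares fit() are hypothetical stand-ins, not part of the original answer:

    #include <cstdio>
    #include <vector>

    // Hypothetical fitter: least-squares slope of y = a*x for one dataset.
    struct Fitter {
        void init() { /* allocate per-thread scratch space here */ }
        double fit(const std::vector<double>& xs, const std::vector<double>& ys) {
            double num = 0.0, den = 0.0;
            for (std::size_t j = 0; j < xs.size(); j++) {
                num += xs[j] * ys[j];
                den += xs[j] * xs[j];
            }
            return den != 0.0 ? num / den : 0.0;
        }
    };

    int main() {
        const int numFits = 1000;
        // 1000 identical toy datasets sampled from y = 2x.
        std::vector<std::vector<double>> xs(numFits, {1.0, 2.0, 3.0});
        std::vector<std::vector<double>> ys(numFits, {2.0, 4.0, 6.0});
        std::vector<double> results(numFits);

        #pragma omp parallel
        {
            Fitter fitter;   // one private fitter per thread
            fitter.init();
            #pragma omp for schedule(dynamic,1)
            for (int i = 0; i < numFits; i++)
                results[i] = fitter.fit(xs[i], ys[i]);
        }

        std::printf("results[0] = %f\n", results[0]);   // expected: 2.000000
        return 0;
    }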

With tasks:

    #pragma omp parallel
    {
        Fitter fitter;
        fitter.init();

        #pragma omp single
        for (int i = 0; i < numFits; i++)
        {
            #pragma omp task
            fitter.fit(..., &results[i]);
        }

        #pragma omp taskwait
        // ^^^ only necessary if more code follows before the end of the parallel region
    }

Here, one of the threads runs the for loop, which creates 1000 OpenMP tasks. The tasks are queued and processed by idle threads. This works somewhat like the dynamic for loop, but allows greater freedom in the code constructs (for example, with tasks you can parallelize recursive algorithms). The taskwait construct waits for all pending tasks to complete. It is implied at the end of the parallel region, so it is really only necessary if more code follows before the end of the parallel region.
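To illustrate that extra freedom, here is a minimal sketch of a recursive algorithm parallelized with tasks; the toy Fibonacci function is purely illustrative and unrelated to the fitting problem:

    #include <cstdio>

    // Each recursive call may be spawned as a task; the if clause stops
    // creating tasks for tiny subproblems where the overhead would dominate.
    long fib(int n) {
        if (n < 2) return n;
        long a, b;
        #pragma omp task shared(a) if (n > 20)
        a = fib(n - 1);
        b = fib(n - 2);
        #pragma omp taskwait   // wait for the spawned child task
        return a + b;
    }

    int main() {
        long result;
        #pragma omp parallel
        #pragma omp single       // one thread starts the recursion
        result = fib(30);
        std::printf("fib(30) = %ld\n", result);   // expected: 832040
        return 0;
    }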

In both cases, each invocation of fit() will happen in a different thread. You must make sure that fitting one set of parameters does not affect fitting the other sets, e.g. that fit() is a thread-safe method/function. Both cases also require that the runtime of fit() far exceeds the overhead of the OpenMP constructs.
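To make the thread-safety requirement concrete, a hypothetical sketch: a fitter that writes to state shared between threads races when fit() is called concurrently, while one that keeps its scratch state per instance (one instance per thread, as in the examples above) is safe:

    // Hypothetical illustration only.
    struct UnsafeFitter {
        static double scratch;   // shared by all threads: concurrent fit() calls race
        double fit(double x) { scratch = x * x; return scratch; }
    };
    double UnsafeFitter::scratch = 0.0;

    struct SafeFitter {
        double scratch = 0.0;    // per-instance, hence per-thread state
        double fit(double x) { scratch = x * x; return scratch; }
    };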

OpenMP tasks require a compiler that supports OpenMP 3.0. That rules out all versions of MS VC++ (even the one in VS2012), in case you have to develop for Windows.

If you want to have only a single instance of the fitter, initialized only once per thread, then you should take a slightly different approach, e.g. make the fitter instance global and threadprivate:

    #include <omp.h>

    Fitter fitter;
    #pragma omp threadprivate(fitter)

    ...

    int main()
    {
        // Disable dynamic teams
        omp_set_dynamic(0);

        // Initialise all fitters once per thread
        #pragma omp parallel
        {
            fitter.init();
        }

        ...

        #pragma omp parallel
        {
            #pragma omp for schedule(dynamic,1)
            for (int i = 0; i < numFits; i++)
                fitter.fit(..., &results[i]);
        }

        ...

        return 0;
    }

Here fitter is a global instance of the Fitter class. The omp threadprivate directive instructs the compiler to put it in thread-local storage, i.e. to make it a per-thread global variable. These instances persist between the different parallel regions. You can also use omp threadprivate on static local variables. They also persist between the different parallel regions (but only inside the same function):

    #include <omp.h>

    int main()
    {
        // Disable dynamic teams
        omp_set_dynamic(0);

        static Fitter fitter; // must be static
        #pragma omp threadprivate(fitter)

        // Initialise all fitters once per thread
        #pragma omp parallel
        {
            fitter.init();
        }

        ...

        #pragma omp parallel
        {
            #pragma omp for schedule(dynamic,1)
            for (int i = 0; i < numFits; i++)
                fitter.fit(..., &results[i]);
        }

        ...

        return 0;
    }

The call to omp_set_dynamic(0) disables dynamic teams, i.e. each parallel region will always execute with as many threads as specified by the OMP_NUM_THREADS environment variable.
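A minimal sketch of why this matters for threadprivate data: with dynamic teams disabled and a fixed team size, each thread finds its own threadprivate instance intact in the next parallel region (the counter below is a hypothetical stand-in for per-thread fitter state):

    #include <omp.h>
    #include <cstdio>

    int counter = 0;
    #pragma omp threadprivate(counter)

    int main() {
        omp_set_dynamic(0);       // keep the team size fixed...
        omp_set_num_threads(4);   // ...at four threads for every region

        #pragma omp parallel
        counter = omp_get_thread_num();   // "initialisation" region

        #pragma omp parallel
        {
            // Each thread still sees the value it stored above.
            #pragma omp critical
            std::printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);
        }
        return 0;
    }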


Basically, what you want is a pool of workers (or a thread pool) that take a task from a queue, process it, and then move on to the next task. OpenMP offers various approaches to such problems, e.g. barriers (all workers run until a certain point and proceed only once a certain requirement is fulfilled) or reductions, which accumulate values into a global variable after the workers have computed their respective parts.
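For example, a minimal sketch of an OpenMP reduction that accumulates per-worker partial results into one value (the error-summing function here is a hypothetical illustration):

    #include <vector>

    // Sum per-dataset fit errors; each thread accumulates a private
    // copy of sum, and the copies are combined at the end of the region.
    double total_error(const std::vector<double>& errors) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < (int)errors.size(); i++)
            sum += errors[i];
        return sum;
    }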

Your question is very broad, but one more hint I can give you is to look at the MapReduce paradigm. There, a function is mapped over a dataset and the results are sorted into buckets, which are reduced with another function (possibly the same one again). In your case this would mean that each of your processors/cores/nodes maps the given function over its assigned portion of the data and sends its result buckets to another node responsible for combining them. I guess you would have to look into MPI if you want to use MapReduce with C++ without relying on a dedicated MapReduce framework. Since you run the program on a single node, you may be able to achieve something similar with OpenMP, so searching the web for that may help.
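As a rough single-node illustration of the paradigm with OpenMP, here is a hypothetical sketch; map_fn and reduce_fn are placeholders for your fitting and combining functions:

    #include <numeric>
    #include <vector>

    double map_fn(double x) { return x * x; }                // "map": process one dataset
    double reduce_fn(double a, double b) { return a + b; }   // "reduce": combine results

    // Assumes data is non-empty.
    double map_reduce(const std::vector<double>& data) {
        std::vector<double> mapped(data.size());
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < (int)data.size(); i++)
            mapped[i] = map_fn(data[i]);                     // map phase, in parallel
        // Reduce phase: fold the mapped results into a single value.
        return std::accumulate(mapped.begin() + 1, mapped.end(),
                               mapped[0], reduce_fn);
    }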

TL;DR: search for worker pools (thread pools), barriers, and MapReduce.
