std::valarray and parallelization

Maybe this is a stupid question.

On this site I read that

The valarray specification allows libraries to implement it with several performance optimizations, such as parallelizing certain operations.

What is the state of std::valarray parallelization on different platforms and compilers: GCC, VS2010/2013, clang?

Especially given the standard thread support in C++11.

UPD: And if some compilers do not support this feature, what is the best way to apply a function to the elements of a container in multiple threads? Obviously, a naive solution with std::thread would be short and would work well, but perhaps there is a better one?
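For concreteness, the naive std::thread approach mentioned above might look roughly like the sketch below. The name parallel_apply and its num_threads parameter are made up for illustration and are not part of any library; the sketch assumes f is cheap to copy and safe to call concurrently on distinct elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <valarray>
#include <vector>

// Hypothetical helper (not a library function): applies f in place to every
// element of data, giving each thread one contiguous chunk of indices.
template <typename T, typename F>
void parallel_apply(std::valarray<T>& data, F f, unsigned num_threads = 0)
{
    if (num_threads == 0) {
        // hardware_concurrency() may return 0 when the value is unknown.
        num_threads = std::max(1u, std::thread::hardware_concurrency());
    }
    const std::size_t n = data.size();
    if (n == 0) return;
    // Never start more threads than there are elements.
    num_threads = static_cast<unsigned>(std::min<std::size_t>(num_threads, n));
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // rounded up

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        // Chunks are disjoint, so the threads never write to the same element.
        workers.emplace_back([&data, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] = f(data[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

A call such as parallel_apply(v, [](double x) { return x * x; }) would then square every element of v. Depending on the platform, the program may need to be linked with -pthread. The sketch does no error handling: if starting a thread throws, the already running threads are not joined.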

1 answer

Intel seems to have done some work on this.

As for the others: I don’t think so. cppreference says that

Some C++ standard library implementations use expression templates to implement efficient operations on std::valarray (for example, GNU libstdc++ and LLVM libc++). Only rarely are valarrays optimized any further, as in, for example, Intel Parallel Studio.
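To make the quote concrete, here is a small sketch of the kind of whole-array expression that expression templates are meant to evaluate without intermediate temporaries. The function name axpy is only illustrative. Whether a given library fuses the expression into a single loop is an implementation detail and is not guaranteed by the standard.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical helper: computes a*x + y element-wise.
inline std::valarray<double> axpy(double a,
                                  const std::valarray<double>& x,
                                  const std::valarray<double>& y)
{
    assert(x.size() == y.size());
    // The result type is spelled out instead of using auto: the standard
    // allows operators on valarray to return a proxy type that is merely
    // convertible to std::valarray, and such a proxy may refer to its operands.
    std::valarray<double> result(a * x + y);
    return result;
}
```

With x filled with 1.5 and y filled with 0.5, axpy(2.0, x, y) yields an array whose elements are all 3.5.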

I also did not find any documentation saying that libc++ or libstdc++ does anything interesting in this regard, and usually nobody hides interesting features. :)

As for MSVC: I once came across code using std::valarray that compiled but did not link, because Microsoft had forgotten to implement some methods. This is of course no proof, but to me it does not look as if anything sophisticated was done there. I also could not find any documentation on special features.

So what can we do instead?

For one, we can use the parallel mode of libstdc++, which parallelizes the following algorithms with OpenMP where it finds that useful:

    std::accumulate
    std::adjacent_difference
    std::inner_product
    std::partial_sum
    std::adjacent_find
    std::count
    std::count_if
    std::equal
    std::find
    std::find_if
    std::find_first_of
    std::for_each
    std::generate
    std::generate_n
    std::lexicographical_compare
    std::mismatch
    std::search
    std::search_n
    std::transform
    std::replace
    std::replace_if
    std::max_element
    std::merge
    std::min_element
    std::nth_element
    std::partial_sort
    std::partition
    std::random_shuffle
    std::set_union
    std::set_intersection
    std::set_symmetric_difference
    std::set_difference
    std::sort
    std::stable_sort
    std::unique_copy

To do this, simply define _GLIBCXX_PARALLEL at compile time. I feel this covers a good part of what one would like to do with arrays of numbers. Of course, there is a caveat:

Note that defining _GLIBCXX_PARALLEL may change the sizes and behavior of standard class templates such as std::search, and therefore one can only link code compiled with parallel mode and code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage and cannot be confused with normal mode symbols.

(from here.)
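As a rough sketch of how this could be applied to a std::valarray: the code below uses only ordinary standard algorithms, so it also builds without parallel mode. The helper names sqrt_in_place and total are made up for illustration. It assumes GCC with libstdc++ and OpenMP enabled. libstdc++ decides on its own whether a given call is worth parallelizing.

```cpp
// Possible build line (GCC with libstdc++):
//   g++ -std=c++11 -O2 -fopenmp -D_GLIBCXX_PARALLEL example.cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iterator>
#include <numeric>
#include <valarray>

// Hypothetical helper: replaces every element by its square root.
// With _GLIBCXX_PARALLEL defined, std::transform may run in parallel.
inline void sqrt_in_place(std::valarray<double>& v)
{
    std::transform(std::begin(v), std::end(v), std::begin(v),
                   [](double x) { return std::sqrt(x); });
}

// Hypothetical helper: sums all elements.
// With _GLIBCXX_PARALLEL defined, std::accumulate may run in parallel,
// so floating-point results can differ slightly from a sequential sum.
inline double total(const std::valarray<double>& v)
{
    return std::accumulate(std::begin(v), std::end(v), 0.0);
}
```

The std::begin and std::end overloads for std::valarray come from <valarray> and are available since C++11.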

Another tool that can help you parallelize is Intel Advisor. It is more advanced and, I think, can also handle your loops (I have never used it myself), but of course it is proprietary software.

For linear algebra operations, you can also look for a good parallel implementation of LAPACK.


Source: https://habr.com/ru/post/986452/

