Rcpp with the critical OpenMP directive is much slower than compiled C++ code

As the title says, using the #pragma omp critical directive in an R package built with Rcpp makes execution significantly slower than the same C++ code compiled and run as a standalone program, because the available processor power is not fully used.

Consider a simple C++ program (built with cmake):

test.h like:

#ifndef RCPP_TEST_TEST_H
#define RCPP_TEST_TEST_H

#include <limits>
#include <cstdio>
#include <chrono>
#include <iostream>
#include <omp.h>

namespace rcpptest {
    class Test {
    public:
        static unsigned int test();
    };
}

#endif //RCPP_TEST_TEST_H

implementation of test.h in test.cpp:

#include "test.h"

namespace rcpptest {
    unsigned int Test::test() {
        omp_set_num_threads(8);
        unsigned int x = 0;

        std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

#pragma omp parallel for
        for (unsigned int i = 0; i < 100000000; ++i) {

#pragma omp critical
            ++x;
        }
        std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
        std::cout << "finished (ms): " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() <<std::endl;

        return x;
    }
}

and main:

#include "src/test.h"

int main() {
    unsigned int x = rcpptest::Test::test();
    return 0;
}

If I build and run this program in the IDE (CLion), everything works as expected.

Then I created an R package using Rcpp:

library(Rcpp)
Rcpp.package.skeleton('rcppTestLib')

and used the SAME C++ source code for the package, plus an Rcpp file that exports my test function for use from R (rcppTestLib.cpp):

#include <Rcpp.h>
#include "test.h"

// [[Rcpp::export]]
void rcppTest() {
    rcpptest::Test::test();
}

If I then run the test from R using the package

library(rcppTestLib)
rcppTest()

execution is much slower.

Comparing the execution times of the compiled C++ program and the Rcpp package:

   program   | execution time
-----------------------------
compiled C++ | ~7 200 ms
Rcpp package | ~551 000 ms

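With the Rcpp package all 8 threads are in use, but each at only ~1% load, whereas the compiled C++ program uses all 8 threads fully.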

Replacing #pragma omp critical with #pragma omp atomic gives:

   program   | execution time
-----------------------------
compiled C++ | ~2 900 ms
Rcpp package | ~3 300 ms
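
For reference, the atomic variant of the loop can be sketched as below. This is only an illustration of the change described above, not the exact code that was timed: the function name count_atomic and its parameter n are hypothetical, and the loop bound is a parameter so that a small value can be used for a quick check.

```cpp
// Hypothetical helper: counts to n with one atomic increment per iteration.
// If the compiler is run without OpenMP flags, the pragmas are ignored
// and the loop simply runs serially with the same result.
unsigned int count_atomic(unsigned int n) {
    unsigned int x = 0;

#pragma omp parallel for
    for (unsigned int i = 0; i < n; ++i) {
        // atomic protects only this single update of x,
        // whereas critical guards a general code block with a lock
#pragma omp atomic
        ++x;
    }
    return x;
}
```

Calling count_atomic(100000000) corresponds to the loop size used in the question.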

With #pragma omp atomic, the Rcpp package uses all 8 threads.

So my question: why is #pragma omp critical so slow in R/Rcpp while #pragma omp atomic is fine, when in CLion BOTH work well?

?


Two things to try:

  • Enable OpenMP in src/Makevars (unix) and src/Makevars.win (windows)
  • Set num_threads(x) on the parallel loop around the critical section

To enable OpenMP, add the following to src/Makevars and src/Makevars.win:

# The package contains only C++ sources, so use the C++ OpenMP flags
# for both compiling and linking.
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS)
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)

See https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support


As for num_threads(x), changing:

#pragma omp parallel for

to:

#pragma omp parallel for num_threads(4)

gives:

finished (ms): 30822
[1] 1e+08

versus:

finished (ms): 17979
[1] 1e+08

That is a factor of about 1.7.

The thread count can also be set from code with

omp_set_num_threads(x)

or through the environment with

set OMP_NUM_THREADS=x

References:

https://gcc.gnu.org/onlinedocs/libgomp/omp_005fset_005fnum_005fthreads.html

https://software.intel.com/en-us/mkl-linux-developer-guide-setting-the-number-of-threads-using-an-openmp-environment-variable
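
The ways of choosing the thread count mentioned above can be compared with a small sketch. It assumes the usual OpenMP precedence: a num_threads clause overrides omp_set_num_threads, which in turn overrides OMP_NUM_THREADS. The function name threads_in_region is hypothetical. The runtime may grant fewer threads than requested, so the helper reports the team size actually observed.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: opens a parallel region with num_threads(requested)
// and returns the team size the runtime actually granted.
// Without OpenMP support the region does not exist and the result is 1.
int threads_in_region(int requested) {
    int granted = 1;
#ifdef _OPENMP
#pragma omp parallel num_threads(requested)
    {
        // one thread records the team size; the others wait at the
        // implicit barrier at the end of the single construct
#pragma omp single
        granted = omp_get_num_threads();
    }
#else
    (void)requested;  // unused when OpenMP is disabled
#endif
    return granted;
}
```

A result of 1 for threads_in_region(4) on a multi-core machine suggests that the code was built without OpenMP flags or that the runtime limited the team.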


@coatless is right. Without src/Makevars*, the code is compiled without OpenMP support. The compiler output:

ccache g++ -I/usr/share/R/include -DNDEBUG  -I"/usr/local/lib/R/site-library/Rcpp/include"    -fpic  -g -O3 -Wall -pipe   -march=native -c test.cpp -o test.o
test.cpp:10:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for

test.cpp:13:0: warning: ignoring #pragma omp critical [-Wunknown-pragmas]
 #pragma omp critical
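
Whether a given build actually has OpenMP enabled can also be checked from within the code. The sketch below relies on the standard _OPENMP macro, which compilers define only when OpenMP support is switched on. The function names openmp_enabled and openmp_max_threads are hypothetical and not part of the package in the question.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: true only if this translation unit was compiled
// with OpenMP support (for example with -fopenmp on GCC).
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: the maximum team size OpenMP would use,
// or 1 when the pragmas are being ignored.
int openmp_max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Exporting such a function from the package (for example with // [[Rcpp::export]]) and calling it from R would show whether the Makevars settings took effect.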

src/Makevars , . htop , .

- , . . , , , OMP_NUM_THREADS=2 , OMP_NUM_THREADS=3 OMP_NUM_THREADS=4 - , , , .


Source: https://habr.com/ru/post/1694049/