Any experiences with Intel Threading Building Blocks?

The Intel Threading Building Blocks (TBB) open source library looks really interesting. Even though there is even an O'Reilly book on the subject, I don't hear of many people using it. I am interested in using it for some multi-level parallel applications (MPI + threads) in Unix environments (Mac, Linux, etc.). For what it's worth, I'm interested in high-performance computing / numerical methods kinds of applications.

Does anyone have experience with TBB? Does it work well? Is it fairly portable (including GCC and other compilers)? Does the paradigm work well for the programs you have written? Are there other libraries I should look into?

+24
c++ multithreading intel tbb
Sep 20 '08 at 2:52
10 answers

I introduced it into our code base because we needed a better malloc when we moved to a 16-core machine. With 8 cores and fewer this had not been a significant problem. It has worked well for us. Next, we plan to use the fine-grained concurrent containers. Ideally we would make use of the real meat of the product, but that requires rethinking how we build our code. I really like the ideas in TBB, but it is not easy to retrofit onto an existing code base.

You cannot think of TBB as just another threading library. It has a whole new model that sits on top of threads and abstracts them away. You learn to think in terms of tasks, parallel_for-type operations and pipelines. If I were building a new project, I would probably try to model it this way.

We work in Visual Studio and it works fine. It was originally written for Linux/pthreads, so it runs just fine there too.

+12
Sep 20 '08 at 13:34

I do not do numerical computing, but I work in data mining (think clustering and classification), and our workloads are probably similar: all the data is static, and you have it at the start of the program. I briefly investigated Intel TBB and found it overkill for my needs. After starting with pthread-based code, I switched to OpenMP and got the right balance between readability and performance.

+5
Sep 26 '08 at 17:20

ZThread is LGPL; you can use the library via dynamic linking if you are not working on an open source project.

Threading Building Blocks (TBB) in its open source version (there is a new commercial version for $299; I don't know the differences) is under the GNU General Public License version 2 with a so-called "Runtime Exception" (which is specific to use only in creating free software). I have seen other Runtime Exceptions that try to approach the LGPL while allowing commercial use and static linking, but that is not the case here.

I am only writing this because I took the opportunity to examine the libraries' licenses, and these should also be a consideration in the selection, depending on the use you intend to make of the library.

Thanks, Jihn, for pointing out this update ...

+3
Sep 20 '08 at 5:55

I have used TBB briefly and will probably use it more in the future. I liked using it, most importantly because you do not have to deal with macros or extensions to C++; you stay within the language. It is also quite portable. I have used it on both Windows and Linux. One thing: working directly with threads is hard with TBB; you have to think in terms of tasks (which is actually a good thing). Intel TBB does not support your use of bare locks (it makes that tedious). But overall, that is my preliminary experience.

I would also recommend taking a look at OpenMP 3.

+3
Sep 20 '08 at 6:06

I looked into TBB but never used it in a project. I saw no advantages (for my purposes) over ZThread. A brief and somewhat dated overview can be found here.

ZThread is quite full-featured, with several thread dispatch options, all the usual synchronization classes and a very handy exception-based thread interrupt mechanism. It is easily extensible, well written and well documented. I have used it on 20 projects. It also plays nicely with any *NIX that supports POSIX threads, as well as Windows.

Worth a look.

+2
Sep 20 '08 at 5:36

I use TBB in one project. It seemed easier to use than raw threads. There are tasks that can be run in parallel; a task is just a call to your parallel routine. Load balancing is done automatically. That is why I see it as a higher-level parallelization library. I achieved a 2.5x speedup without much work on a 4-core Intel processor. There are examples, questions get answered on the forums, it is maintained, and it's free.
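To make the "tasks plus automatic load balancing" idea concrete, here is a minimal sketch in standard C++ only. It is not TBB's API or its scheduler (TBB uses a work-stealing task scheduler); the helper name `run_tasks` and the shared-counter scheme are assumptions chosen for illustration. Each worker thread claims the next unclaimed task index, so a thread that finishes a cheap task early simply picks up more work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): run task(i) for every i in
// [0, task_count) on worker_count threads.
void run_tasks(std::size_t task_count, unsigned worker_count,
               const std::function<void(std::size_t)>& task) {
    assert(task && "task must be callable");
    if (worker_count == 0) worker_count = 1;  // always make progress

    // Shared cursor: the next task index that nobody has claimed yet.
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= task_count) return;  // no work left
            task(i);                      // a task is just a call to your routine
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < worker_count; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
}
```

A call such as `run_tasks(items.size(), 4, [&](std::size_t i) { process(items[i]); });` (with `process` and `items` being your own names) would spread uneven work over four threads. With TBB you would not write this loop yourself; the library's scheduler plays this role.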

+2
Dec 02 '09 at 21:35

Portability

TBB is portable. It supports Intel and AMD processors (i.e. x86), IBM PowerPC and POWER processors, ARM processors, and possibly others. If you look in the build directory, you can see all the configurations the build system supports, which include a wide range of operating systems (Linux, Windows, Android, MacOS, iOS, FreeBSD, AIX, etc.) and compilers (GCC, Intel, Clang/LLVM, IBM XL, etc.). I have not tried TBB with the PGI C++ compiler, and I know that it does not work with the Cray C++ compiler (as of 2017).

A few years ago, I was part of the effort to port TBB to IBM Blue Gene systems. Static linking was a challenge, but this is now addressed by big_iron.inc. The other issues were supporting relatively old versions of GCC (4.1 and 4.4) and making sure the PowerPC atomics worked. I expect that porting to any currently unsupported architecture would be relatively simple on platforms that provide or are compatible with GCC and POSIX.

Use in community code

I know of at least two HPC application frameworks that use TBB: MOOSE and MADNESS.

I don't know how MOOSE uses TBB, but MADNESS uses TBB for its task queue and memory allocator.

Performance Compared to Other Threading Models

I have personally used TBB in the Parallel Research Kernels project, in which I compared TBB with OpenMP, OpenCL, Kokkos, RAJA, C++17 Parallel STL and other models. See the C++ subdirectory for details.

The following figure shows the relative performance of the above models on the Intel Xeon Phi 7250 processor (the details are not important; all models used the same settings). As you can see, TBB does quite well, except at smaller problem sizes, where the overhead of adaptive scheduling is more relevant. TBB has tuning knobs that will affect these results.

[Figure: PRK stencil]

Full disclosure: I work for Intel in a research/pathfinding capacity.

+2
Aug 09 '17 at 20:44

Have you looked at the Boost library with its thread API?

+1
Sep 20 '08 at 6:08

"Threading Building Blocks (TBB) in its open source version (there is a new commercial version for $299; I don't know the differences) is under the GNU General Public License version 2 with a so-called 'Runtime Exception' (which is specific to use only in creating free software). I have seen other Runtime Exceptions that try to approach the LGPL while allowing commercial use and static linking, but that is not the case here."

According to this question, Threading Building Blocks can be used without copyleft restrictions for commercial use.

+1
Jan 04 '09 at 8:40

It is worth understanding what TBB (Threading Building Blocks) is intended for, compared with the alternatives (e.g. C++11 concurrency). TBB is a portable and scalable library (not a compiler extension) that lets you write your code as lightweight tasks, which TBB schedules to run as quickly as possible on the available CPU resources. It is not intended to support threads for other purposes (for example, pre-emption).

I used TBB to speed up existing image-processing for-loops over image scan lines by turning them into parallel loops (with a minimum of 2-4 scan lines as the "grain"). It was very successful. It does require that your loop body be (re)written to handle an arbitrary index, rather than assuming that each loop iteration is processed sequentially (for example, pointers that are incremented between loop iterations).

This was a fairly trivial case, since there was no shared state to update. Using the more powerful features (such as pipeline) would require significant redesign and/or rewriting of existing code, so they may be better suited to new code.

A powerful advantage is that such TBB-based code remains portable, does not appear to interfere with other code elsewhere in the same process using different threading strategies, and can later be combined with multiprocessing strategies at a higher or lower level (for example, the TBB parallel_for code could be called from a filter in a multiprocessing TBB pipeline).
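As a rough illustration of that rewrite, here is a minimal sketch, assuming a hypothetical 8-bit grayscale `Image` struct and an invert operation (neither is from my real code). The loop body takes an arbitrary range of scan lines and computes every address from the row index instead of carrying a pointer from one iteration to the next. The driver uses plain `std::thread` as a stand-in so the sketch compiles without TBB; the comment shows roughly what the corresponding `tbb::parallel_for` call would look like.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical 8-bit grayscale image, stored row by row.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // width * height bytes
};

// Loop body for an arbitrary range of scan lines [row_begin, row_end).
// Every address is derived from the row index, so ranges can be processed
// in any order and on any thread.
void invert_rows(Image& img, std::size_t row_begin, std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y) {
        std::uint8_t* row = img.pixels.data() + y * img.width;
        for (std::size_t x = 0; x < img.width; ++x) {
            row[x] = static_cast<std::uint8_t>(255 - row[x]);
        }
    }
}

// Stand-in driver: one std::thread per chunk of `grain` scan lines.
// With TBB the call would be roughly:
//   tbb::parallel_for(tbb::blocked_range<std::size_t>(0, img.height, grain),
//       [&](const tbb::blocked_range<std::size_t>& r) {
//           invert_rows(img, r.begin(), r.end());
//       });
// TBB would run the chunks as tasks on its own thread pool rather than
// creating a thread per chunk as this sketch does.
void invert_parallel(Image& img, std::size_t grain) {
    assert(grain > 0);
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < img.height; begin += grain) {
        const std::size_t end = std::min(begin + grain, img.height);
        workers.emplace_back(invert_rows, std::ref(img), begin, end);
    }
    for (std::thread& t : workers) t.join();
}
```

Because no two chunks touch the same row, no locking is needed, which is what made this case easy.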

+1
Dec 03 '17 at 11:36