On modern processors with a large number of cores, an application must be structured to use as many cores as possible in order to reach maximum performance. Explicitly developing a program to do this can take a significant amount of effort. This is why Threading Building Blocks can be extremely helpful in application development.
It is important to understand the science and algorithms behind the application, and then to use whatever programming techniques are available.
Intel Threading Building Blocks (Intel TBB) can help tremendously in the effort to achieve very high performance for the application. Rather than building knowledge of the underlying hardware (which can change) into the program and explicitly assigning work to different threads and cores, the developer can let Intel TBB handle much of that work.
One key feature of Intel TBB is that it lets the developer specify logical parallelism instead of threads. The runtime library then maps that parallelism onto threads, depending on the hardware the application is running on.
Intel TBB is compatible with other threading packages and can coexist with other libraries. Intel TBB emphasizes data-parallel programming, in which multiple threads work on different parts of a collection. Data-parallel programming scales well to larger numbers of processors because the collection can be divided into smaller pieces. With data-parallel programming, program performance can continue to increase as you add processors.
The goal of Intel TBB is to achieve performance through parallelization, with the library supplying the knowledge of the underlying hardware, whether that hardware comes from Intel or from other chip vendors. Intel TBB can be used in a wide range of application domains, including weather prediction, finite element analysis, medical applications, seismic exploration, and weapons research and defense.
Intel TBB is optimized for Intel products that are aimed squarely at the high performance computing market, including the Intel Xeon and Intel Xeon Phi processors. Because these processors vary in the number of cores available as well as in their memory hierarchy, a library that helps the application take advantage of the runtime environment is critical.