For applications to get the most out of the newer many-core processors and coprocessors, they must create many tasks. Various tools allow large numbers of tasks to be created while an application runs. There are a number of methods for scaling code on both Intel Xeon CPUs and the Intel Xeon Phi coprocessor. When an application scales through the use of tasks on the limited number of cores of a main CPU, the same model can be used to scale across the many cores, and the hundreds of hardware threads, of the Intel Xeon Phi coprocessor.
Tasks keep the cores busy. When a core is working rather than waiting for work to be sent to it, the application progresses toward its conclusion. One caveat to remember is that tasks and threads remain on the part of the system where they were created. Tasks that use a shared memory space only work within the shared memory segment that their processing cores can reach. Shared memory on the CPU side of the system is separate from the shared memory on the coprocessor, so a thread started on one side stays on that side.
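The separation between host and coprocessor memory can be illustrated with a minimal sketch. It assumes a compiler that supports the OpenMP 4.0 target construct. The source does not name a specific offload mechanism, so this choice is an assumption, and the function name double_on_device is hypothetical. The map clause makes the data transfer explicit: the array is copied to the device before the loop and copied back afterward, and the threads created inside the target region run only on the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: doubles every element of x[0..n). */
void double_on_device(double *x, size_t n)
{
    assert(x != NULL);

    /* Assumption: OpenMP 4.0 target support. map(tofrom: ...) copies
     * x[0..n) into the device's memory before the region runs and
     * copies it back afterward, because host and device do not share
     * memory. */
    #pragma omp target map(tofrom: x[0:n])
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        x[i] *= 2.0;
}
```

If no device is available, or OpenMP is not enabled at compile time, the loop simply runs on the host and produces the same result.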
Loops are an excellent place to start looking for work that can be split into tasks. Although a thread could be created for every iteration of an inner loop, the cost of creating and scheduling it would be large compared to the work being performed. Creating too many threads in this way leads to overhead that can overwhelm the processor or coprocessor that needs to perform the work.
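One common way to limit this overhead is to give each task a block of iterations instead of a single one. The following is a minimal sketch using OpenMP tasks in C. The function name scale_in_chunks and the chunk parameter are hypothetical, and the best chunk size depends on the workload and the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiplies every element of a[0..n) by factor,
 * creating one task per chunk of iterations rather than one task per
 * iteration. */
void scale_in_chunks(double *a, size_t n, double factor, size_t chunk)
{
    assert(chunk > 0);

    #pragma omp parallel
    #pragma omp single
    {
        /* A single thread creates the tasks; the whole team runs them. */
        for (size_t start = 0; start < n; start += chunk) {
            size_t end = (n - start < chunk) ? n : start + chunk;

            /* One task covers the whole range [start, end). */
            #pragma omp task firstprivate(start, end)
            for (size_t i = start; i < end; ++i)
                a[i] *= factor;
        }
    }
    /* All tasks have finished once the parallel region ends. */
}
```

With a chunk of 1 this degenerates into one task per iteration, which is the high-overhead case described above. Larger chunks mean fewer tasks and more work per task.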
A number of programming environments let the developer create tasks explicitly. OpenMP has the PARALLEL DO directive (parallel for in C and C++), and Fortran has the DO CONCURRENT construct. Since loops are usually well understood, creating tasks at the outermost level of a loop nest is beneficial. Creating tasks away from loop iterations is more difficult, but can be done if the overall application and its algorithms are well understood.
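As a minimal sketch of parallelizing at the outermost loop level, the following C function uses the OpenMP parallel for directive, the C counterpart of PARALLEL DO. The function name row_sums and the row-major matrix layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores the sum of each row of a rows x cols
 * matrix (row-major layout assumed) in out[0..rows). */
void row_sums(const double *m, size_t rows, size_t cols, double *out)
{
    assert(m != NULL && out != NULL);

    /* The directive is placed on the outer loop, so each thread
     * receives whole rows and the inner loop runs serially inside it. */
    #pragma omp parallel for
    for (size_t r = 0; r < rows; ++r) {
        double sum = 0.0;
        for (size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
        out[r] = sum;
    }
}
```

Placing the directive on the inner loop instead would start and end a parallel region once per row, multiplying the scheduling overhead for the same amount of work.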
Source: Intel, USA