While the onward march of hardware performance continues, developers need software that helps them take advantage of high performance computing systems. Compilers and debuggers are well known, but the complexity of using multiple systems with various accelerators and coprocessors requires more advanced tools than were used just a decade ago.
While compilers and directives can help to use many processors in parallel, a well-designed algorithm that is many-core aware from the start will usually result in the highest performance. Compilers can recognize certain regions of the code that can run in parallel, but the more help they get from the developer, the better. Whether the application is written in C or Fortran, modern compilers can analyze loops to determine whether dependencies exist, and alert the developer when restructuring a loop would lead to better results.
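To make the idea of a loop dependency concrete, here is a minimal sketch in C. The function names (fill_dependent, fill_independent) and the arithmetic-sequence task are hypothetical, chosen only for illustration. The first loop carries a dependency from one iteration to the next; the second computes the same values with independent iterations, which is the kind of restructuring a compiler report might prompt.

```c
#include <stddef.h>

/* Loop-carried dependency: iteration i reads the value written by
   iteration i - 1, so the iterations cannot simply run in parallel. */
void fill_dependent(long *a, size_t n, long start, long step)
{
    if (n == 0)
        return;
    a[0] = start;
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + step;
}

/* Restructured version (illustrative): each element is computed from
   its index alone, so no iteration depends on another. */
void fill_independent(long *a, size_t n, long start, long step)
{
    for (size_t i = 0; i < n; i++)
        a[i] = start + (long)i * step;
}
```

Both functions produce the same array, but only the second gives the compiler iterations it can safely vectorize or distribute across cores. Not every dependency can be removed this easily; some require a different algorithm altogether.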
Additional tools may be required to find coding errors, especially memory issues and problems with how threads are created and destroyed. Identifying memory leaks, memory corruption, illegal memory accesses, and race conditions early in the design and test phase can save a significant amount of effort later. Threads can be made safe by removing race conditions and potential deadlocks from the code. Many of these errors do not become apparent until the QA phase, where they end up costing a significant amount of time and effort. Finding these issues early can lead to earlier product releases.
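As an illustration of the kind of threading error such tools are meant to catch, the sketch below uses OpenMP directives, which are an assumption here since the text does not name a threading model. The function names are hypothetical. The first version contains a data race on a shared counter; the second removes it with a reduction clause.

```c
/* Racy when compiled with OpenMP enabled: every thread updates the
   shared variable `count` without synchronization, so increments can
   be lost and the result can vary from run to run. */
long count_even_racy(const int *v, long n)
{
    long count = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        if (v[i] % 2 == 0)
            count++;    /* unsynchronized write to shared data */
    return count;
}

/* Race-free: the reduction clause gives each thread a private copy of
   `count` and sums the copies when the loop finishes. */
long count_even_safe(const int *v, long n)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; i++)
        if (v[i] % 2 == 0)
            count++;
    return count;
}
```

If the compiler is not told to enable OpenMP, the directives are ignored and both functions run serially, which is one reason a race like this can go unnoticed until late in testing.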
Vectorization and threading are critical to using innovative hardware such as the Intel Xeon Phi processor. Using tools early in the design and development process that identify where vectorization can be used or improved will lead to increased performance of the overall application. Modern tools can determine what might be blocking compiler vectorization and estimate the potential gain from the work involved. The result is code that is better designed and easier to maintain. Information about data dependencies and memory access patterns can be extremely valuable to developers who seek maximum performance on systems that combine Intel Xeon processors and Intel Xeon Phi processors.
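One common obstacle to compiler vectorization is possible pointer aliasing: if two arrays might overlap, the compiler has to assume a dependency between iterations. The sketch below is a hypothetical example, not taken from any particular tool's output. It uses the C99 restrict qualifier to promise that the input and output arrays do not overlap.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
   The restrict qualifiers assert that x and y never overlap, which
   removes the assumed dependency and makes the loop a straightforward
   candidate for vectorization. The caller must honor that promise. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Compilers can usually report which loops they vectorized and why others were skipped; GCC, for example, has the -fopt-info-vec-missed option. The unit-stride access pattern here also matters, since contiguous loads and stores generally map well onto vector instructions.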