Most HPC applications and environments are concerned with either compute speed (usually measured in floating point operations) or memory bandwidth. However, in certain application domains, such as the financial services industry, low latency is of critical importance.
With high frequency trading becoming so important, overall system performance, from the acquisition of data from various markets through to the buy or sell decision, relies on low latency between the various parts of the system. The feed handlers, which accept the data in various formats, can be multithreaded and take advantage of coprocessors such as the Intel Xeon Phi. The NIC on a system waits for packets to arrive and can then send the information to a specified core on the Intel Xeon Phi coprocessor for processing.
To gain maximum performance and low latency from these financial applications, a number of optimizations can be used. Using the right API for the job is of critical importance, as bandwidth-sensitive applications benefit from DMA-based data transfers. Also, the coprocessor memory is mapped, so that an application can combine larger amounts of data before transporting it over the PCIe bus. Limiting reads across the PCIe bus also increases performance. Thread affinity is also very important, to avoid kernel scheduling interrupts. Finally, the use of compiler-generated memcpy can result in higher performance.
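As an illustration of the thread affinity point, the following is a minimal sketch of pinning a worker to a single core on a Linux host, using Python's standard `os.sched_setaffinity`. It is not Intel's method or a Xeon Phi specific API; the helper name `pin_to_core` is hypothetical and the choice of core is an assumption made for the example.

```python
import os


def pin_to_core(core_id=None):
    """Pin the caller to one CPU core and return that core's id.

    Hypothetical helper for illustration only. Relies on
    os.sched_getaffinity / os.sched_setaffinity, which are available
    on Linux but not on every platform.
    """
    # Cores the caller is currently allowed to run on (pid 0 = the caller).
    allowed = os.sched_getaffinity(0)

    if core_id is None:
        # Assumption: with no preference given, take the lowest allowed core.
        core_id = min(allowed)
    elif core_id not in allowed:
        raise ValueError(f"core {core_id} is not in the allowed set {sorted(allowed)}")

    # Restrict scheduling to exactly one core so the scheduler
    # does not migrate the work between cores.
    os.sched_setaffinity(0, {core_id})
    return core_id


if __name__ == "__main__":
    core = pin_to_core()
    print(f"pinned to core {core}; affinity is now {sorted(os.sched_getaffinity(0))}")
```

In a real feed handler each latency-critical thread would typically be given its own core, kept separate from the cores that service interrupts; which cores to use depends on the machine's topology.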
Financial services is an example of a domain where HPC technologies can be used, but where pure floating point simulation is not the objective. The movement of data using the most appropriate hardware available can increase the performance tremendously, through reduced latency.
Source: Intel, USA