The Smith-Waterman algorithm is widely used for pairwise DNA sequence alignment. The computation, consisting of looking for pattern in very long strings of the DNA alphabet, is very demanding. Using the Intel Xeon Phi, tremendous performance gains can be obtained, as long as the algorithms have been modified to take advantage of parallelism.
For two DNA sequences an initial score relating to the matching of the four DNA alphabet (A,C,G,T) is desired. A data dependency matrix can be created, which can show the dependencies when searching long strings for comparison. The matrix size will be n x m, where n and m are the lengths of the strings to be compared. This large matrix can then be tiled, based on the number of available threads. Using OpenMP, the comparisons can be spread across available processors.
When utilizing a cluster of processors, MPI can be used to share the work across the coprocessors. The communication between the host and the coprocessors needs to be managed by the offload directives. The resulting performance results can be exceptional, compared to a host implementation. The base number of pairs ranges in the millions and can be obtained from publicly available databases. The length of the base pairs for initial testing ranged from 4.4 million to over 50 million in length.
The resulting application showed a speedup efficiency of over 92 % using 4 of the Intel Xeon Phi coprocessors. The performance of this application was measured in billion of cell updates per second. In most test cases as mentioned above, the speedup was almost linear when scaling from 1 to 4 Intel Xeon Phi coprocessors.
Three levels of parallelism were used to obtain these results. These were OpenMP, SIMD intrinsics and MPI.
Source: Georgia Institute of Technology, USA , Johannes Gutenberg University, Germany
Turn Big Research into Big Results
Accelerate data analysis with Intel® Parallel Studio XE.