Fujitsu Powers Statistical Genetics at Wellcome Trust Centre

Today Fujitsu announced that the Wellcome Trust Centre for Human Genetics (WTCHG) at the University of Oxford is using the company’s HPC systems to support the genetics research of 25 groups and more than 100 researchers. Integrated by OCF in the UK, the Fujitsu high-performance BX900 blade-based cluster uses Mellanox InfiniBand networking and DataDirect Networks storage.

“Advances in detector design and processing algorithms over the past two years have revolutionized electron microscopy, making it the method of choice for studying the structure of complex biological processes such as infection. However, we thought we could not get sufficient access to the necessary compute to exploit these advances fully. The new genetics cluster provided such a fast and cost-effective solution to our problems that we invested in expanding it immediately,” said Professor David Stuart, Oxford University.

The WTCHG houses the second largest next-generation sequencing facility in England, currently producing more than 500 genomes per year. Each processed and compressed genome is about 30GB on disk, and across the Centre roughly 15,000-20,000 human genomes occupy about 0.5PB. Numerous and wide-ranging research projects use this data to study the genetic basis of human diseases through sophisticated statistical genetics analyses. Projects include national and international studies on various cancers, type-2 diabetes, obesity and malaria, as well as analyses of bacterial genomes to trace the spread of infection. The Centre is one of the most highly ranked research institutes in the world, and funds for the cluster were provided by a grant from the Wellcome Trust.
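
As a rough sanity check on those storage figures, the short Python sketch below multiplies the quoted genome counts by the quoted 30GB per genome. It assumes decimal units (1PB = 1,000,000GB); the function name `footprint_pb` is purely illustrative and not something the Centre uses.

```python
# Back-of-envelope check of the storage figures quoted in the article.
# Assumption: decimal units, i.e. 1 PB = 1,000,000 GB.
GB_PER_GENOME = 30  # processed, compressed genome size from the article


def footprint_pb(genomes: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Return the on-disk footprint in petabytes for a number of genomes."""
    return genomes * gb_per_genome / 1_000_000


for n in (15_000, 20_000):
    print(f"{n:,} genomes -> {footprint_pb(n):.2f} PB")
```

The two ends of the quoted range come out at 0.45PB and 0.60PB, which brackets the "about 0.5PB" figure.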

By understanding the characteristics of key genetics software applications and optimizing how they map onto the new cluster’s architecture, the Centre has been able to dramatically improve the efficiency of these analyses. For example, analyses of data sets that took months using the Broad Institute’s Genome Analysis Toolkit (GATK) can now be completed in weeks while using fewer cores.

The new cluster has also proved itself to be perfectly suited to supporting research by the Centre’s Division of Structural Biology (STRUBI) and it has already produced some of the world’s highest-resolution electron microscopy reconstructions – revealing structural details vital to understanding processes such as infection and immunity. The improvement in the performance of electron microscopy codes, particularly Relion, is also very impressive: movie-mode processing requiring more than 2 weeks on eight 16-core nodes of a typical cluster is now completed in 24 hours on just six of the new FDR-enabled, high-memory nodes.
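
To put the Relion figures in perspective, the sketch below compares the two runs by wall-clock time and by node-hours. It treats "more than 2 weeks" as exactly 14 days, so the results are lower bounds, and it ignores the fact that the old and new nodes differ in core count and memory, so node-hours are only a loose measure here. The helper `node_hours` is a hypothetical name used for illustration.

```python
# Rough comparison of the two Relion runs described in the article.
# Assumption: "more than 2 weeks" is taken as exactly 14 days (a lower bound).
def node_hours(nodes: int, hours: float) -> float:
    """Total node-hours consumed by a run."""
    return nodes * hours


old_wall_hours = 14 * 24   # eight 16-core nodes on a typical cluster
new_wall_hours = 24        # six FDR-enabled, high-memory nodes

old_cost = node_hours(8, old_wall_hours)
new_cost = node_hours(6, new_wall_hours)

print(f"Wall-clock speedup: at least {old_wall_hours / new_wall_hours:.0f}x")
print(f"Node-hours: {old_cost:.0f} vs {new_cost:.0f} "
      f"(about {old_cost / new_cost:.1f}x fewer)")
```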

The new cluster’s use of Intel Ivy Bridge CPUs provides a 2.6x performance increase over its predecessor built in 2011. It boasts 1,728 cores of processing power, up from 912, with 16GB 1866MHz memory per core compared to a maximum of 8GB per core on the older cluster.

The new cluster is working alongside a second production cluster; both clusters share a common Mellanox FDR InfiniBand network that links the compute nodes to a DDN GRIDScaler SFA12K storage system whose controllers can read block data at 20GB/s. This speed is essential for keeping the cluster at maximum utilization and consistently fed with genomic data.
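
For a sense of scale, the sketch below estimates how long a read would take at the quoted 20GB/s. It is a best-case calculation that assumes the peak controller rate is sustained end to end, which real workloads will not usually achieve; `read_seconds` is an illustrative name only.

```python
# Best-case read times at the quoted 20 GB/s controller rate.
# Assumption: the peak rate is sustained, with no network or file system overhead.
READ_GB_PER_S = 20


def read_seconds(size_gb: float, rate_gb_per_s: float = READ_GB_PER_S) -> float:
    """Seconds needed to read size_gb at the given rate."""
    return size_gb / rate_gb_per_s


print(f"One 30 GB genome: {read_seconds(30):.1f} s")
print(f"Whole 0.5 PB archive: {read_seconds(500_000) / 3600:.1f} h")
```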

The high-performance cluster and big data storage systems were designed by the WTCHG in partnership with OCF, a leading HPC, data management, big data storage and analytics provider. As the integrator, OCF also provided the WTCHG team with training on the new system.

“Processing data from sequencing machines isn’t that demanding in terms of processing power any more,” said Dr Robert Esnouf, Head of the Research Computing Core at the WTCHG. “What really stresses systems are ‘all-against-all’ analyses of hundreds of genomes, that is, lining up multiple genomes against each other and using sophisticated statistics to compare them and spot differences which might explain the genetic origin of diseases or susceptibility to diseases. That is a large compute and data I/O problem and most of our users want to complete this type of research.”
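
One way to see why such analyses scale badly: if "all-against-all" is read as comparing every pair of genomes (an assumption, since the article does not spell out the method), the number of comparisons grows quadratically with the number of genomes. The sketch below counts the pairs for a few illustrative cohort sizes that are not taken from the article.

```python
import math

# Number of unordered genome pairs in an all-against-all comparison.
# Assumption: "all-against-all" means every pair is compared once.
def pairwise_comparisons(genomes: int) -> int:
    """Return n choose 2, the number of distinct pairs among n genomes."""
    return math.comb(genomes, 2)


for n in (100, 500, 1000):  # illustrative cohort sizes
    print(f"{n} genomes -> {pairwise_comparisons(n):,} pairs")
```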
