Virtual HPC Clusters Power Cancer Research at eMedLab


A partnership of seven leading bioinformatics research and academic institutions called eMedLab is using a new private cloud, HPC environment and big data system to support the efforts of hundreds of researchers studying cancers, cardiovascular and rare diseases. Their research focuses on understanding the causes of these diseases and how a person’s genetics may influence their predisposition to the disease and potential treatment responses.

The new HPC cloud environment combines the Red Hat Enterprise Linux OpenStack Platform with Lenovo Flex System hardware to enable the creation of virtual HPC clusters bespoke to individual researchers’ requirements. The system has been designed, integrated and configured by OCF, an HPC, big data and predictive analytics provider, working closely with its partners Red Hat, Lenovo and Mellanox Technologies, and in collaboration with eMedLab’s research technologists.

The High Performance Computing environment is being hosted at a shared data centre for education and research, offered by digital technologies charity Jisc. The data centre has the capacity, technological capability and flexibility to future-proof and support all of eMedLab’s HPC needs, with its ability to accommodate multiple and varied research projects concurrently in a highly collaborative environment. The ground-breaking facility is focused on the needs of the biomedical community and will revolutionize the way data sets are shared between leading scientific institutions internationally.

The eMedLab partnership was formed in 2014 with funding from the Medical Research Council. Original members University College London, Queen Mary University of London, London School of Hygiene & Tropical Medicine, the Francis Crick Institute, the Wellcome Trust Sanger Institute and the EMBL European Bioinformatics Institute have been joined recently by King’s College London.

“Bioinformatics is a very, very data intensive discipline,” says Jacky Pallas, Director of Research Platforms, University College London. “We want to study a lot of de-identified, anonymous human data. It’s not practical – from data transfer and data storage perspectives – to have scientists replicating the same datasets across their own, separate physical HPC resources, so we’re creating a single store for up to 6 Petabytes of data and a shared HPC environment within which researchers can build their own virtual clusters to support their work.”

The Red Hat Enterprise Linux OpenStack Platform, a highly scalable Infrastructure-as-a-Service (IaaS) solution, enables scientists to create and use virtual clusters bespoke to their needs, allowing them to select compute memory, processors, networking, storage and archiving policies, all orchestrated by a simple web-based user interface. Researchers will be able to access up to 6,000 cores of processing power.

“We generate such large quantities of data that it can take weeks to transfer data from one site to another,” says Tim Cutts, Head of Scientific Computing, the Wellcome Trust Sanger Institute. “Data in eMedLab will stay in one secure place and researchers will be able to dynamically create their own virtual HPC cluster to run their software and algorithms to interrogate the data, choosing the number of cores, operating system and other attributes to create the ideal cluster for their research. The Red Hat Enterprise Linux OpenStack Platform enables our researchers to do this rapidly and using open standards which can be shared with the community.”

Arif Ali, Technical Director of OCF says: “The private cloud HPC environment offers a flexible solution through which virtual clusters can be deployed for specific workloads. The multi-tenancy features of the Red Hat platform enable different institutions and research groups to securely co-exist on the same hardware, and share data when appropriate.”

“This is a tremendous and important win for Red Hat,” says Radhesh Balakrishnan, general manager, OpenStack, Red Hat. “eMedLab’s deployment of Red Hat Enterprise Linux OpenStack Platform into its HPC environment for this data intensive project further highlights our leadership in this space and ability to deliver a fully supported, stable, and reliable production-ready OpenStack solution.

“Red Hat technology allows consortia such as eMedLab to use cutting edge self-service compute, storage, networking, and other new services as these are adopted as core OpenStack technologies, while still offering the world class service and support that Red Hat is renowned for. The use of Red Hat Enterprise Linux OpenStack Platform provides cutting edge technologies along with enterprise-grade support and services, leaving researchers to focus on the research and other medical challenges.”

“Mellanox end-to-end Ethernet solutions enable cloud infrastructures to optimize their performance and to accelerate big data analytics,” said Kevin Deierling, vice president of marketing at Mellanox Technologies. “Intelligent interconnect with offloading technologies, such as RDMA and cloud accelerations, is key for building the most efficient private and cloud environments. The collaboration between the organizations as part of this project demonstrates the power of the eco-systems to drive research and discovery forward.”

The new high-performance and big data environment consists of:

  • Red Hat Enterprise Linux OpenStack Platform
  • Red Hat Satellite
  • Lenovo System x Flex System with 252 hypervisor nodes and a Mellanox 10Gb network with a 40Gb/56Gb core
  • Five tiers of storage, managed by IBM Spectrum Scale (formerly GPFS), for cost effective data storage – scratch, Frequently Accessed Research Data, virtual clusters image storage, medium-term storage and previous versions backup. 
