My friend Gavin sent me a note by email last week about some recent news out of OCF (the UK-based HPC integrator) and King’s College London. The net net is that King’s College researchers have cut the time needed to analyse DNA sequencing data 20-fold, from days to hours. This is the first use at the National Institute for Health Research (NIHR)
“The sequence of the human genome has been known for ten years now so we are using new sequencing technologies to sequence specific regions of the genome in large numbers of people in order to help understand the contributory factors to a variety of common complex disorders and developmental defects,” says Dr. Rebecca Oakey, Reader in Epigenetics, Department of Medical & Molecular Genetics, School of Medicine, King’s College London. “These include skin diseases such as psoriasis, inflammatory bowel disease and the step by step development of vascular disorders, psychiatric disorders, diabetes, infection and immune disease as well as genetic components in cancer development.”
Dr Oakey adds: “To do so we need innovative sequencing technology to generate the data and the processing power to analyse, store and archive the data.”
The system that OCF helped field is an IBM iDataPlex with a 10GbE interconnect.
[UPDATED on 05062010 with details of the system]
The system that King’s College is using has 31 nodes. Thirty of those are dual-socket nodes with quad-core processors, yielding 240 cores in total for your run-of-the-mill distributed-memory cluster jobs. The cluster also includes one fat node with four sockets populated with six-core processors. This fat node serves as a large shared-memory environment for existing codes that either do not scale well on a cluster or would take too much work for KCL to port to a cluster environment economically.
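The core counts are easy to check. Here is a quick Python tally that uses only the node, socket, and core-per-socket figures given above. It counts physical cores only, and the helper name `cores` is purely illustrative. It confirms the 240-core figure for the thin nodes and shows that the fat node adds another 24.

```python
# Tally physical cores for the King's College cluster described above.
# Thin nodes: 30 nodes x 2 sockets x 4 cores (quad-core)
# Fat node:    1 node  x 4 sockets x 6 cores (six-core)

def cores(nodes: int, sockets: int, cores_per_socket: int) -> int:
    """Return the total physical cores for a group of identical nodes."""
    return nodes * sockets * cores_per_socket

thin = cores(30, 2, 4)  # distributed-memory partition
fat = cores(1, 4, 6)    # single large shared-memory node
total = thin + fat

print(f"thin nodes: {thin} cores")
print(f"fat node:   {fat} cores")
print(f"total:      {total} cores")
```

That works out to 264 physical cores across all 31 nodes. Only 24 of them share a single memory space on the fat node.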