Metagenomics and the computing challenges of microbes

Quick stat from the latest post at the CCC blog: there are estimated to be ten times as many microbial cells in and on your body as cells that make up you. Ewww.

The post is about the challenges of studying and identifying, let alone understanding, the many, many different organisms which share the planet with us.

Metagenomics is a relatively new field that seeks to understand the structure and function of the shockingly large number of microorganisms on our planet. New technologies permit us to now sequence samples taken from their environment rather than only those that are cultivated in the lab. For example, Craig Venter’s Global Ocean Sampling Expedition has collected water throughout the world’s oceans, captured organisms, and sequenced their DNA. In the initial pilot study alone, nearly 150 new bacteria were discovered through this process.

The science and computing challenges are huge. A single gram of soil contains approximately one trillion base pairs of DNA…. Sequencing and making sense of these data introduces new computational problems, not merely slight extensions of existing ones.

Whither the computational problem?

Complete DNA sequences of thousands of organisms are piling up in databases because of the efficiency of DNA sequencing technologies. Most of this remains unanalyzed for several reasons. We don’t yet know the right biological questions to ask. We don’t have all the clever programs that would actually ask these questions of the computer. And there is now so much data that many questions totally overwhelm even existing high performance computers.

More in the post.