Saving East African Crops with Supercomputing


In this special guest feature from Scientific Computing World, Dr Laura Boykin describes how the largest supercomputer in the southern hemisphere can help fight famine among East Africa’s farmers.

Despite its tiny size, measuring barely one millimeter, the silverleaf whitefly has had catastrophic effects on East Africa. Infestations of these sap-sucking insects have destroyed entire seasons of crops, and livelihoods with them.

The whitefly, or Bemisia tabaci, is a vector, or carrier, of viruses that attack cassava plants. Also known as tapioca, cassava originated in Brazil but is now a staple crop for around 700 million people worldwide and a crucial food source in East Africa in particular. Farmers rely on cassava because it is a slow-growing crop that can bridge the nutritional gap between growing seasons after other crops such as beans and sweet potatoes have been consumed. The whitefly transmits the viruses behind two diseases: cassava mosaic disease and cassava brown streak disease. Together, these diseases can completely wipe out a year’s produce. For a family, a whitefly infestation suddenly means no food; for the region, it can mean widespread economic hardship and famine.

Bemisia tabaci is a pest that costs agriculture billions of dollars a year worldwide. A team of researchers has come together at the University of Western Australia to combat the pest and, because it is particularly devastating for East Africa, we are focusing our research efforts there. It’s a massive problem. I’m one of 15 principal investigators working on a new project whose mission is to give farmers a cassava plant that’s resistant to the viruses and the whiteflies.

The work is funded by the Bill and Melinda Gates Foundation and, to aid our research, the team has been awarded time on the fastest supercomputer in the southern hemisphere, the Cray XC40 Magnus supercomputer at the Pawsey Supercomputing Centre, in Perth, Western Australia. To support the work, I was also awarded a 2015 TED Fellowship. (In “The future of HPC in Australia,” published in Scientific Computing World’s special supplement HPC 2014-15, Lindsay Botten and Neil Stringfellow describe Australia’s supercomputing facilities and its national HPC strategy.)

For decades, scientists assumed there was only one species of silverleaf whitefly. In reality, there are at least 34 species, but they are morphologically indistinguishable. Only in the last seven years or so have people started to sample the region. The more we sample, the more we realize there are tons more species of whitefly in Africa than we ever thought.

Because the species look identical, the best way to distinguish them is by examining their genetic differences, so we are deploying a mix of genomics, supercomputing, and evolutionary history. This knowledge will help African farmers and scientists distinguish between the harmless species and the invasive ones, develop management strategies, and breed new whitefly-resistant strains of cassava.

The computational challenge for our team is in processing the genomic data the sequencing machines produce. We have the task of trying to make sense out of billions of base pairs — billions of A’s, T’s, G’s and C’s at a time.

The Magnus supercomputer allows us to generate phylogenetic trees of whitefly species from around the world. These phylogenetic trees represent evolutionary relationships among the species. For this project, the genetic datasets involved thousands of base pairs. Even with only 500 whiteflies in a dataset, the number of possible relationships between these flies runs into the octillions (a 1 followed by 27 zeros) and well beyond, a calculation impossible without a supercomputer.
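
To give a feel for how quickly the number of candidate trees grows, the sketch below uses the standard textbook count of unrooted, fully bifurcating tree topologies for n taxa, (2n - 5)!!. It is an illustration under that assumption, not the calculation run on Magnus, and the function name is hypothetical. Under this count, the total already passes an octillion at about 25 taxa, and for 500 taxa the number has more than a thousand digits.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating tree topologies for n_taxa labelled tips.

    Uses the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5).
    Hypothetical helper for illustration only.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n_taxa == 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 25, 500):
        total = unrooted_tree_count(n)
        # Report the number of decimal digits rather than the full value.
        print(f"{n:>3} taxa: a number with {len(str(total))} digits")
```

Python integers have arbitrary precision, so the count for 500 taxa is exact rather than a floating-point approximation. No program enumerates trees on this scale; the point is only that exhaustive search is out of the question, which is why sampling methods are used instead.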

The team is running MrBayes — a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. It uses Markov chain Monte Carlo methods, a class of algorithms for sampling from a probability distribution, to sample the massive evolutionary tree space.
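
As a rough illustration of what Markov chain Monte Carlo sampling means, here is a minimal Metropolis sampler for a one-dimensional toy target. It is a sketch, not MrBayes code: the target (a standard normal), the step size, and the function names are all assumptions chosen for illustration, and a real phylogenetic sampler proposes changes to tree topologies, branch lengths, and model parameters rather than to a single number.

```python
import math
import random


def log_target(x: float) -> float:
    """Log of an unnormalised standard normal density (toy stand-in for a posterior)."""
    return -0.5 * x * x


def metropolis(n_samples: int, step: float = 1.0, seed: int = 0) -> list[float]:
    """Random-walk Metropolis sampler; returns the chain of visited states."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        # Propose a nearby state with a symmetric Gaussian step.
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        log_accept = log_target(proposal) - log_target(x)
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


samples = metropolis(20_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

The sampler only ever compares the target at two points, so it never needs the normalising constant of the distribution. That property is what makes the approach workable when the space being explored is far too large to sum over.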

Given the large size of the genetic datasets and the sophisticated computing techniques involved, the project is highly computationally intensive. Magnus has allowed us to conduct these complex tasks in a practical amount of time thanks to its multiple processor technologies, a high-performance network, a distributed operating system and a productive programming environment.

So far, we have analyzed an entire genetic region for all the global samples. We have done benchmarking against our other systems and Magnus outperforms them. Between 16 and 80 Markov chains traversed the tree space on Magnus in just under 96 hours. The next-best performing system handled about half as many chains in 110 hours. We are making meaningful progress toward distinguishing the damaging whiteflies from others and thus providing scientists with the information they need to develop management strategies. Magnus is helping us make a difference to agricultural development.

Laura Boykin is assistant professor at the University of Western Australia and leads the whitefly research team there.

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
