Over at the Journal of Supercomputing Frontiers and Innovations, Franck Cappello, Al Geist, William Gropp, Sanjay Kale, Bill Kramer, and Marc Snir have published a new peer-reviewed paper entitled Toward Exascale Resilience – 2014 Update.
“The past five years have seen extraordinary technical progress in many domains related to exascale resilience. Several technical options, initially considered inapplicable or unrealistic in the HPC context, have demonstrated surprising successes. Despite this progress, the exascale resilience problem is not solved, and the community is still facing the difficult challenge of ensuring that exascale applications complete and generate correct results while running on unstable systems. Since 2009, many workshops, studies, and reports have improved the definition of the resilience problem and provided refined recommendations. Some projections made during the previous decades and some priorities established from these projections need to be revised. This paper surveys what the community has learned in the past five years and summarizes the research problems still considered critical by the HPC community.”
Published by Jack Dongarra from the University of Tennessee and Vladimir Voevodin from Moscow State University, the Journal of Supercomputing Frontiers and Innovations (JSFI) is a new peer-reviewed publication that addresses the urgent need for wider dissemination of research and development results at the leading edge of high performance computing systems, highly parallel methods, and extreme-scale applications. This open-access online international journal will facilitate rapid distribution of high-quality papers, letters, and reviews representing recent advances and views to drive further progress in the important and rapidly advancing field of supercomputing.