SpaRC: Scalable Sequence Clustering using Apache Spark

Zhong Wang from the Genome Institute at LBNL gave this talk at the Stanford HPC Conference. “Whole genome shotgun based next generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) sequence data derived from tens of thousands of different genes or microbial species. Here we describe an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC) that partitions reads based on their molecule of origin to enable downstream assembly optimization.”

Interview: John Gustafson on the Evolution of Supercomputing

In this MacObserver podcast, John Martellaro discusses the evolution of HPC with John Gustafson, Visiting Professor at NUS in Singapore. “Known by many as the father of Gustafson’s Law, Dr. John Gustafson is a professor of computer science, now at The National University of Singapore. Listen in as John describes his career arc and offers some great advice for young scientists just getting started.”

Video: Tianhe-1A Supercomputer at Work

In this video, researchers describe how the Tianhe-1A supercomputer supports scientific research. “Currently #43 on the TOP500, the 2.56 Petaflop Tianhe-1A carries out 1,400 computing tasks per day. It is mainly used to serve universities, research institutions, small and medium-sized enterprises, and provide scientific computing services.”

Dispelling the Myth “OpenMP Does Not Scale”

Ruud van der Pas from Oracle presented this talk at OpenMPcon. “Unfortunately it is a very widespread myth that OpenMP Does Not Scale – a myth we intend to dispel in this talk. Every parallel system has its strengths and weaknesses. This is true for clustered systems, but also for shared memory parallel computers. While nobody in their right mind would consider sending one zillion single byte messages to a single node in a cluster, people do the equivalent in OpenMP and then blame the programming model. Also, shared memory parallel systems have some specific features that one needs to be aware of. Few do though. In this talk we use real-life case studies based on actual applications to show why an application did not scale and what was done to change this. More often than not, a relatively simple modification, or even a system level setting, makes all the difference.”

Supercomputing Technologies for Big Data Challenges

In this special guest feature, Ferhat Hatay from Fujitsu writes that supercomputing technologies developed for data-intensive scientific computing can be a powerful tool for taking on the challenges of Big Data. We all feel it: data use and growth are explosive. Individuals and businesses are consuming, and generating, more data every day. The […]