The convergence of HPC and BigData: What does it mean for HPC sysadmins?


Damien Francois

In this video from FOSDEM’19, Damien Francois from the Université catholique de Louvain presents: The convergence of HPC and BigData: What does it mean for HPC sysadmins?

There are mainly two types of people in the scientific computing world: those who produce data and those who consume it. Those who have models and generate data from those models, a process known as 'simulation', and those who have data and infer models from it ('analytics'). The former often come from disciplines such as engineering, physics, or climatology, while the latter are most often active in remote sensing, bioinformatics, sociology, or management.

Simulations often require large amounts of computation, so they are typically run on generic High-Performance Computing (HPC) infrastructures built on a cluster of powerful high-end machines linked together with high-bandwidth, low-latency networks. The cluster is often augmented with hardware accelerators (co-processors such as GPUs or FPGAs) and a large, fast parallel filesystem, all set up and tuned by systems administrators. By contrast, in analytics, the focus is on the storage and access of the data, so analytics is often performed on a BigData infrastructure suited to the problem at hand. Those infrastructures offer specific data stores and are often installed in a more or less self-service way on a public or private 'Cloud', typically built on top of 'commodity' hardware.

Those two worlds, the world of HPC and the world of BigData, are slowly but surely converging. The HPC world realizes that there is more to data storage than just files and that 'self-service' ideas are tempting. In the meantime, the BigData world realizes that co-processors and fast networks can really speed up analytics. And indeed, all major public Cloud services now have an HPC offering, and many academic HPC centres have started to offer Cloud infrastructures and BigData-related tools.

This talk will focus on the latter point of view and review the tools originating from the BigData world and the ideas from the Cloud that can be implemented in an HPC context to broaden the offering for scientific computing in universities and research centres.

Damien François is an HPC sysadmin with a background in Machine Learning. Passionate about numbers, science, and computers. Curious about entrepreneurship and the fine art of communication.

Check out our insideHPC Events Calendar