Interview: European cHiPSet Event focuses on High-Performance Modeling and Simulation for Big Data Applications

While Modeling and Simulation offer suitable abstractions to manage the complexity of analyzing Big Data in various scientific and engineering domains, Big Data problems in other disciplines are not always easily amenable to these methods. Enter the European cHiPSet Project, a collaboration designed to bridge the gap.

The cHiPSet Annual Plenary Meeting takes place in France next month. To learn more, we caught up with the Vice-Chair for the project, Dr. Horacio González-Vélez, Associate Professor and Head of the Cloud Competency Centre at the National College of Ireland.

insideHPC: What is the cHIPSet project and who is it designed to help?

Horacio González-Vélez: cHiPSet is a COST Action funded by the European Commission's Horizon 2020 programme. COST Actions are bottom-up science and technology networks, open to academic and industry researchers and practitioners. Specifically, cHiPSet is establishing collaborative connections around High-Performance Modeling and Simulation for Big Data Applications.

Modeling and Simulation (MS) are widely considered essential tools in many areas of science and engineering. Modeling has traditionally addressed complexity by raising the level of abstraction to enable the properties of a system to be studied by simulating its behavior. HPC is arguably required to deal with the behavior and complexity of such abstractions of large-scale, Big Data systems. cHiPSet covers scientific and technological research activities to bridge the gap between High-Performance Computing and data-intensive MS. On the one hand, domain experts need HPC for simulation, modeling and data analysis but are often unaware of performance and parallelism exploitation pitfalls in their designs. On the other hand, designers of HPC development tools and systems primarily focus on absolute performance measures, by definition the raison d'être for HPC.

insideHPC: cHiPSet is said to be one of the largest Big Data/HPC projects in Europe. Can you give us an idea as to the scope of this thing?

Dr. Horacio González-Vélez is Vice-Chair of cHiPSet and Associate Professor and Head of the Cloud Competency Centre at the National College of Ireland.

Horacio González-Vélez: cHiPSet is linking engineers, graduate students, academics, practitioners, and consultants across 37 countries in Europe, Asia, the Americas, and Australia. cHiPSet has four thematic areas:

  1. Modeling of Big Data enabling infrastructures and middleware
  2. Parallel programming models for Big Data problems
  3. HPC-enabled modeling for Life Sciences
  4. HPC-enabled modeling for Socio-economic and Physical models

HPC architects and grad students are developing programming models and architectures tailored to specific data-intensive MS problems. We are currently working on a comprehensive set of case studies that will cover diverse areas such as drug discovery, human cell simulation, social media, healthcare, electromagnetics, blockchain applications, smart tourism, and telecommunications.

insideHPC: You have your annual plenary meeting coming up in March. As Vice-Chair of the cHiPSet project, what are your objectives for this event?

Horacio González-Vélez: That is correct: we are holding our annual plenary meeting in Fontainebleau, a town close to Paris, on March 19th and 20th, where we will continue fostering cross-fertilisation between HPC and MS researchers from Europe and beyond. More details about the event are available on the cHiPSet website at:

http://chipset-cost.eu/index.php/agenda-france/

The plenary meeting will feature a workshop entitled “Accelerating Modeling and Simulation in the Data Deluge Era”. We are expecting keynote presentations and panel discussions on how the forthcoming exascale systems will influence the analysis and interpretation of data, including the simulation of models to match observation to theory.

The design and optimization of Big Data HPC-enabled experiments and large-scale HPC systems require the realistic description and modeling of data-access patterns, of the data flow across local and wide area networks, and of the scheduling and workload presented by hundreds of jobs running concurrently and exchanging very large amounts of data. Exascale systems will arguably increase the challenge by orders of magnitude.

For example, several MS approaches to exascale characterisation are based on discrete-event frameworks. MS has helped to address problems such as scheduling in distributed heterogeneous environments, economy-driven resource allocation, Big Data access in distributed environments, and more generic concurrent, distributed, and cloud HPC architectures. Going forward, stochastic data-traffic MS and hardware/middleware/application co-design should help reduce exascale complexity, but we need more precise measures of uncertainty and the associated errors, e.g. via statistical inference. Simulations in this context have already been run on million-core machines, and recent trends aim to empower programmers to estimate the performance implications of running on millions of cores.
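To make the discrete-event idea concrete, here is a minimal, self-contained sketch in Python (illustrative only, not cHiPSet code) of how such a framework models jobs queuing for cores on a cluster with stochastic compute and data-transfer times; the core count, arrival rate, and service-time parameters are purely hypothetical assumptions.

    # Illustrative sketch only (not cHiPSet code): a toy discrete-event
    # simulation of jobs queuing for cores on a cluster, with stochastic
    # compute and data-transfer times. All parameters are hypothetical.
    import heapq
    import random

    NUM_CORES = 64      # assumed cluster size
    NUM_JOBS = 200      # assumed workload size
    random.seed(42)

    def simulate():
        events = []                       # (time, kind, job_id) min-heap
        t = 0.0
        for j in range(NUM_JOBS):         # Poisson-like arrival stream
            t += random.expovariate(1.0)
            heapq.heappush(events, (t, "arrive", j))
        free_cores, waiting, waits, now = NUM_CORES, [], {}, 0.0
        while events:
            now, kind, j = heapq.heappop(events)
            if kind == "arrive":
                waiting.append((now, j))
            else:                         # "finish": a core frees up
                free_cores += 1
            while waiting and free_cores > 0:   # start queued jobs
                arrived, k = waiting.pop(0)
                free_cores -= 1
                waits[k] = now - arrived
                # Service time = compute plus stochastic data transfer.
                service = random.expovariate(1 / 5.0) + random.expovariate(1 / 2.0)
                heapq.heappush(events, (now + service, "finish", k))
        mean_wait = sum(waits.values()) / len(waits)
        print(f"makespan ~ {now:.1f}, mean queue wait ~ {mean_wait:.1f}")

    simulate()

In a real exascale study, the toy arrival and service distributions above would be replaced with measured data-access patterns and network traffic traces, and the uncertainty of the resulting estimates would be quantified as discussed.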

insideHPC: How do you go about bridging the gap between the Modeling and Simulation folks and traditional HPC experts?

Horacio González-Vélez: In the current HPC-supported MS scenario, isolated success stories co-exist with next-step developments that are still to come. The capability to turn Big Data into usable knowledge has not yet developed into a fully systematic technological framework supported by a comprehensive theory. A lot of the software used in the Big Data context is still eminently sequential, or a naïve parallelization of it. Even strong results are often based on ad hoc solutions consisting of collections of scripts managing a 300-core cluster. More integrated approaches have been attempted with the external help of HPC experts, but after several architectural difficulties modelers have had to revert to an improved version of the script-based approach. HPC architects, who might well contribute to delivering efficient MS solutions, do not always have either the incentive or the resources to develop full-scale application software. Clearly, turning MS experts into skilled HPC architects is not a viable approach; instead, a stronger interaction between the two fields, and the coordinated development of theories and tools supporting the work of MS experts on efficient HPC frameworks, is fundamental.

Recognizing the strong European tradition in commercial modeling and simulation applications, the European Commission itself has clearly pointed out the need and the priority to adapt “existing modeling and simulation techniques, and to develop new ones, so that they scale to massive degrees of parallelism”, stressing the industrial relevance of a closer collaboration between MS and HPC. It recognizes the need to make it easier and less risky for companies to invest in long-term R&D into new modeling and simulation techniques, expanding their user base into new application areas.

Moreover, this appears to be a global challenge: “In this time of crisis, the U.S. has the technological tools to maintain our competitive edge and global leadership in manufacturing, but we risk our manufacturing leadership position if we fail to utilize the game-changing tool of high performance computing (HPC) for modeling, simulation, and analysis”. Europe therefore has to play a strategic role here: the race for leadership in HPC systems is already driven by the need to address scientific grand challenges more effectively, and societal grand challenges are expected to become driving forces too. It is widely accepted that high-end computing has an important role in making Europe more competitive. Particularly for SMEs, access to HPC, modeling, simulation, and product prototyping services is one of the most important steps towards true competitiveness.

The Action has constituted a forum for the creation of future stable synergies between MS and HPC experts, from both academia and industry, in this strategic sector, allowing them to exchange ideas on HPC-enabled MS for Big Data problems and to engage in projects aimed at devising unified approaches, solutions, methodologies, and tools.

insideHPC: Besides attending the event, how can our readers engage with cHiPSet?

MINES ParisTech at Fontainebleau

Horacio González-Vélez: Indeed, there is life beyond the Fontainebleau event. cHiPSet membership is open (and free) to new members, and we are always on the lookout for interesting new people to join: the value of this network resides substantially in the number of its members. We have a specific industry collaboration team led by Dave Feenan at IBEC, the group that represents Irish business both domestically and internationally.

Any individual, whether in industry or in academia, can reach out to Dave and his team to join cHiPSet. Once included in the Action, we can sponsor her or him to visit different organisations with common interests; we have had in excess of 50 individual research residencies. Additionally, we sponsor attendance at major research gatherings such as the forthcoming event in Fontainebleau, Summer Schools, and collaborative visits.

Registration is now open for the cHiPSet Annual Meeting, which takes place March 19-20 in Fontainebleau, France.
