The first Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS’16) has issued its Call for Papers. As a one-day event held in conjunction with SC16, the Nov. 14 workshop will combine two overlapping communities to address some of the most critical challenges for scientific data storage, management, devices, and processing infrastructure. To learn more, we caught up with workshop co-chairs Dean Hildebrand (IBM) and Shane Canon (LBNL).
insideHPC: This is the first time you’ve done PDSW and DISCS together. What prompted you to combine these two workshops?
Dean Hildebrand and Shane Canon: The 10-year PDSW workshop series focused on the data storage and management problems and emerging solutions found in peta- and exascale scientific computing environments. It paid special attention to issues in which community collaboration can be crucial for problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools, given the growing and unprecedented demands for storage capacity, performance, concurrency, reliability, availability, and manageability from peta- and exascale computing infrastructures.
The DISCS workshop series has been held each of the last four years (since 2012) in conjunction with SC and facilitated dialogue about research aimed at the intersection of data intensive computing and traditional high performance computing. DISCS workshops provided a venue for researchers to discuss recent results and the future challenges of running data intensive applications on both traditional HPC systems and the latest data-centric computer systems.
There is a growing trend toward, and need for, convergence between these two worlds. The objective of the PDSW-DISCS joint workshop is to combine two overlapping communities and to better promote and stimulate researchers’ interactions to address some of the most critical challenges for scientific data storage, management, devices, and processing infrastructure for both traditional compute-intensive simulations and data-intensive high performance computing solutions. Additionally, the merger is intended to simplify attendees’ schedules, increase the competitiveness of publications at the workshop, and ensure complete community participation.
insideHPC: How are the worlds of HPC and Data-intensive computing coming together?
Dean Hildebrand and Shane Canon: Many scientific problem domains continue to be extremely data intensive. Traditional high performance computing (HPC) systems and the programming models for using them, such as MPI, were designed from a compute-centric perspective with an emphasis on achieving high floating point computation rates. But processing, memory, and storage technologies have not kept pace, and there is a widening performance gap between computation and the data management infrastructure. Hence data management has become the performance bottleneck for a significant number of applications targeting HPC systems. Concurrently, there are increasing challenges in meeting the growing demand for analyzing experimental and observational data. In many cases, this is leading new communities to look towards HPC platforms. In addition, the broader computing space has seen a revolution in new tools and frameworks to support Big Data analysis and machine learning.
insideHPC: You’ve issued your Call for Papers. What types of papers are you looking for?
Dean Hildebrand and Shane Canon: This being our first combined workshop, we’re giving special attention to issues in which community collaboration can be crucial for problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools.
We are looking for a set of submissions that address a wide spectrum of storage and data-intensive computing issues currently facing the HPC community: scalable storage architectures, metadata and complex data management, software-defined storage, productivity tools for data mining and knowledge discovery, etc. The detailed list of topics of interest can be found on the workshop website.
insideHPC: Can you tell me more about the Work In Progress sessions and what kinds of projects might fit that segment of your program?
Dean Hildebrand and Shane Canon: The Work-In-Progress session is designed to be a fun and quick preview of the emerging storage research ideas in HPC. We will have a series of 5-minute presentations that highlight emerging problems or give a brief introduction to potential novel solutions to existing problems. This work should be well thought out, but not yet mature or complete enough for paper submission. This is a great opportunity for researchers to get early feedback on their ideas. Given that this is the first joint workshop, it would be especially interesting to see submissions that address both simulation workloads and data-intensive applications.
insideHPC: How does this workshop tie in with the Exascale program in the USA?
Dean Hildebrand and Shane Canon: The convergence of HPC and Big Data has been specifically recognized by the Exascale program: the U.S. Congressional Office of Management and Budget has informed the U.S. Department of Energy that new machines beyond the first exascale machines must address both the traditional simulation workloads and data-intensive applications. So, we believe the goals of this workshop are well aligned with the vision of the U.S. Exascale program, and the U.S. program is not alone in emphasizing this convergence.
Registration is now open for SC16 (including the workshop). The event takes place Nov. 13-18 in Salt Lake City.