Today, Brookhaven National Laboratory announced that it has expanded its Computational Science Initiative (CSI). The programs within this initiative leverage computational science, computer science, and mathematics expertise and investments across multiple research areas at the Laboratory, including the flagship facilities that attract thousands of scientific users each year, further establishing Brookhaven as a leader in tackling the “big data” challenges at experimental facilities and expanding the frontiers of scientific discovery.
“The Computational Science Initiative (CSI) brings together under one umbrella and extends the expertise that has driven this success,” said CSI Director Kleese van Dam. “Our mission is to foster cross-disciplinary collaborations to address the next generation of scientific data challenges posed by facilities such as NSLS’s successor, the new National Synchrotron Light Source II (NSLS-II).”
Key partners in this endeavor include nearby universities such as Columbia, Cornell, New York University, Stony Brook, and Yale, and companies including IBM Research. In addition to support from the DOE Office of Science and Brookhaven Lab internal investments, the initiative will receive substantial funding from New York State over the next five years. This combined funding will enable the Initiative to pursue its aggressive growth strategy, both in terms of staffing and in extending its operational and research computing infrastructure. The initiative is led by Kerstin Kleese van Dam (Director), Michael Ernst (Deputy Director), and Robert Harrison (Chief Scientist).
Advances in computational science, data management, and analysis have been a key factor in the success of Brookhaven Lab’s scientific programs at the Relativistic Heavy Ion Collider (RHIC), the National Synchrotron Light Source (NSLS), and the Center for Functional Nanomaterials (CFN), all DOE Office of Science User Facilities, and in biological, atmospheric, and energy systems science. Computation also plays a major role in the Lab’s collaborative participation in international research endeavors, such as the ATLAS experiment at Europe’s Large Hadron Collider.
A particular focus of CSI’s work will be the research, development, and deployment of novel methods and algorithms for the timely analysis and interpretation of high-volume, high-velocity, heterogeneous scientific data created by experimental, observational, and computational facilities, with the aim of accelerating and advancing scientific discovery. “CSI is taking an integrated approach, engaging in leading-edge research, building the research and operational computing facility infrastructure required, and creating multi-disciplinary teams that deliver operational data analysis capabilities to the scientific user communities,” said Kleese van Dam.
Core to the initiative is the new Computer Science and Mathematics effort led by Barbara Chapman, a recent joint appointee at Brookhaven Lab and Stony Brook University. Her team will focus on fundamental research into novel methods and algorithms in support of hypothesis-driven streaming data analysis in high-data-volume and high-data-velocity experimental and computing environments. Further efforts will research new solutions for multi-source streaming data analysis and interpretation, as well as long-term data curation and active reuse.
“Reliability, high performance, and energy efficiency are key drivers for CSI’s user communities, so the team’s research will address all relevant aspects of streaming data processing from hardware architectures to the application layers,” Chapman said.
CSI’s Computational Science Laboratory (CSL) is a new collaborative institute for novel algorithm development and optimization. Bringing together expertise in high-performance computing (HPC), math, and domain science, it will specifically address the challenge of developing novel algorithms to deliver on the promise of exascale science (the ability to compute at a rate of 10^18 floating point operations per second, or one exaFLOPS). CSL will support the development of advanced simulation codes in classic domains such as materials science, chemistry, quantum chromodynamics, fusion, and large eddy simulations. In addition, CSL will provide training and advice to Brookhaven Lab science programs and facilities, enabling them to utilize emerging computing technologies to their full extent. CSL is led by Nicholas D’Imperio.
A centerpiece of the initiative will be the new Center for Data-Driven Discovery (C3D), which will serve as the external focal point for CSI’s data-centric computing activities. Within the Laboratory, it will drive the integration of domain, computational, and data science expertise across Brookhaven’s science programs and facilities, with the goal of accelerating and expanding scientific discovery by developing, deploying, and operating novel data-management, analysis, and interpretation tools and capabilities. A key focus area will be developing and deploying streaming data analysis services for experimental facilities. Outside the Laboratory, C3D will serve as a focal point for recruiting, collaboration, and communication. Kerstin Kleese van Dam is serving as its interim leader until a permanent lead for C3D is identified.
The people and capabilities of C3D are integral to the success of Brookhaven’s key DOE Office of Science User Facilities, including NSLS-II, RHIC, CFN, and a possible future electron ion collider. Hundreds of scientists from Brookhaven and thousands of facility users from universities, industry, and other laboratories across the country and around the world will benefit from the capabilities developed by C3D personnel to better understand the enormous volumes of data produced at these state-of-the-art research facilities.
Underpinning the work of the CSI is the creation of a new, integrated scientific data, computing, and networking infrastructure across the Brookhaven Lab site. This new Scientific Data and Computing Center will be led by Michael Ernst. Brookhaven Lab has a strong history of advances in the successful operation of large-scale computational science, data management, and analysis infrastructure, and the management of large-scale scientific data. One example of Brookhaven’s computing expertise is the RHIC & ATLAS Computing Facility (RACF). Formed in 1997 to support experiments at RHIC, Brookhaven’s flagship particle collider for nuclear physics research, the RACF is now at the center of a global computing network connecting more than 2,500 researchers around the world with data produced by RHIC and the ATLAS experiment at the Large Hadron Collider.
This world-class center houses an ever-expanding farm of computing cores (50,000 as of 2015). It receives data from the thousands of particle collisions that take place each second at RHIC, along with petabytes of data generated by the LHC’s ATLAS experiment, and it stores, processes, and distributes that data while running analysis jobs for collaborators around the nation and the world.
The success of this distributed approach to data-intensive computing, combined with new approaches for handling data-rich simulations, has helped establish the U.S. as a leader in high-capacity computing, thereby enhancing international competitiveness. RACF will serve as a model for computing and data investigations under the new initiative, and as such will form the core of the new Brookhaven Lab Scientific Data and Computing Center. The new center will also house the Lab’s new institutional computing system, new NY State-funded operational data-intensive computing systems, a series of novel architecture research systems, as well as computing and data services operated for other third-party clients.
The CSI, in conjunction with CSL and C3D, will also host a series of workshops, conferences, and training sessions in high-performance and data-centric computing, including the New York Scientific Data Summit (NYSDS). These events will explore topics at the frontier of data-centric, high-performance computing, such as the combination of efficient methodologies and innovative computer systems and concepts to manage and analyze scientific data generated at high volumes and rates.