In this video, Malcolm Muggeridge from Seagate and other SAGE partners describe how the project is redefining data storage for the era of extreme data and Exascale computing.
Exascale is characterized not just by exaflop computational capability but also by massive volumes of data, generated both by simulations running on such systems and, increasingly, by large scientific experiments, crowdsourcing, and expanding sensor networks. Such data must be analyzed to derive insights that drive innovation and understanding across a wide range of domains, such as physics, computational biology, neuroscience, pharmaceutics, energy, and industrial manufacturing.
“The SAGE project, which incorporates research and innovation in hardware and enabling software, will significantly improve the performance of data I/O and enable computation and analysis to be performed more locally to data wherever it resides in the architecture, drastically minimizing data movements between compute and data storage infrastructures. With a seamless view of data throughout the platform, incorporating multiple tiers of storage from memory to disk to long-term archive, it will enable APIs and programming models to easily use such a platform to efficiently utilize the most appropriate data analytics techniques suited to the problem space.”
SAGE is a European Horizon 2020 funded research project with 10 highly respected partners, led by Seagate. A multi-disciplinary, collaborative approach is essential to understand and address the needs of storage systems for the data-intensive applications and use cases of the future. The SAGE project aims to redefine data storage for the next decade, with the depth of capabilities needed for the Exascale HPC era alongside a breadth of future ‘Big Data’ applications.
The SAGE project tasks itself with building a data-centric infrastructure for handling extreme data in the Exascale/Exabyte era, centered on a storage-oriented solution, ‘Percipient Storage’.
Percipient Storage is an advanced object-based storage solution with a flexible new API that enables applications to achieve Exascale I/O loads by exploiting deep I/O hierarchies. The solution will be able to run computations on data from any tier, with a homogeneous view of data throughout the stack. The SAGE project incorporates co-design with Extreme Computing I/O and Data Intensive Science use cases in a variety of domains, including physics, space sciences, meteorology, genetics, and biology. The ultimate ambition of the SAGE platform is to lead to more efficient science and HPDA (High Performance Data Analytics) for industry and commercial applications. SAGE recently published a substantive White Paper, Data Storage for Extreme Scale, which outlines technical details of progress to date and architectural plans moving forward.
The SAGE system is built on multiple tiers of storage device hardware technology. SAGE does not require a specific type of storage device technology, but typically it would include at least one NVRAM tier (Intel 3D XPoint technology is a strong contender at the moment), at least one flash tier, and at least one disk tier. These tiers are housed in standard form-factor enclosures and provide their own compute capability, enabled by standard x86 embedded processing components. Moving up the system stack, compute capability increases for faster, lower-latency devices.
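The idea of running computation on whichever tier holds the data, behind a single view of all tiers, can be illustrated with a toy model. The following Python sketch is purely illustrative: the class names (Tier, TieredStore), the method names, and the object identifiers are hypothetical and are not the SAGE or Percipient Storage API. It assumes only what the text describes: several tiers ordered from fastest to slowest, one namespace across them, and a function shipped to the data rather than the data shipped to the function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tier:
    """One storage tier with its own embedded compute (hypothetical model)."""
    name: str
    level: int  # 0 = fastest, lowest-latency tier; larger values = slower tiers
    objects: Dict[str, List[int]] = field(default_factory=dict)

    def run(self, object_id: str, fn: Callable[[List[int]], int]) -> int:
        # Execute the function where the data lives; only the result leaves the tier.
        return fn(self.objects[object_id])


class TieredStore:
    """A single namespace over several tiers (illustrative, not the SAGE API)."""

    def __init__(self, tiers: List[Tier]) -> None:
        # Keep tiers ordered from fastest to slowest.
        self.tiers = sorted(tiers, key=lambda t: t.level)

    def _tier(self, name: str) -> Tier:
        for tier in self.tiers:
            if tier.name == name:
                return tier
        raise ValueError(f"unknown tier: {name}")

    def put(self, object_id: str, data: List[int], tier_name: str) -> None:
        # Place an object on a named tier.
        self._tier(tier_name).objects[object_id] = data

    def locate(self, object_id: str) -> Tier:
        # Callers address objects by ID only; the store finds the tier.
        for tier in self.tiers:
            if object_id in tier.objects:
                return tier
        raise KeyError(object_id)

    def compute(self, object_id: str, fn: Callable[[List[int]], int]) -> int:
        # Ship the function to the data instead of moving the data.
        return self.locate(object_id).run(object_id, fn)


store = TieredStore([Tier("disk", 2), Tier("nvram", 0), Tier("flash", 1)])
store.put("sensor-run-1", [3, 1, 4, 1, 5], "disk")
store.put("checkpoint-7", [2, 7, 1, 8], "nvram")

total = store.compute("sensor-run-1", sum)  # runs on the disk tier
peak = store.compute("checkpoint-7", max)   # runs on the NVRAM tier
```

In this sketch the caller never names a tier when computing; it only names the object, which loosely mirrors the uniform view of data across the stack described above. A real system would also need data placement policies, movement between tiers, and failure handling, none of which are modeled here.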