“If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” These “data lake” systems will hold massive amounts of data and be accessible through file and web interfaces. Data protection for data lakes will consist of replicas and will not require backup since the data is not updated. Erasure coding will be used to protect large data sets and enable fast recovery. Open source will be used to reduce licensing costs and compute systems will be optimized for map reduce analytics. Automated tiering will be employed for performance and long-term retention requirements. Cold storage, storage that will not require power for long-term retention, will be introduced in the form of tape or optical media.
“The human microbiome plays a role in processes as diverse as metabolism, immune function, and mental health. Yet despite the importance of this system, scientists are just beginning to uncover which microorganisms reside in and on our bodies and determine what functions they perform. The development of innovative technology and analytical methods has enabled researchers like Dr. Pollard to decode the complex interactions between our human cells and microbial brethren, and infer meaning from the staggering amounts of data 10 trillion organisms create.”
The 32nd International Conference on Massive Storage Systems and Technology (MSST 2016) has issued its Call for Participation & Papers. The event takes place April 30 – May 6 in Santa Clara, CA. “The Program Committee requests presentation proposals on issues in designing, building, maintaining, and migrating large-scale systems that implement databases and other kinds of large, typically persistent, web-scale stores (HSM, NoSQL, key-value stores, etc.), and archives at scales of tens of petabytes to exabytes and beyond.”
Today FlyElephant announced new tools, a series of webinars, and the formation of a community around the platform. FlyElephant is a platform that provides scientists with computing infrastructure for calculations and automates routine tasks so they can focus on the core issues of their research.
In this video from the Dell booth at SC15, Addison Snell from Intersect360 Research discusses why HPC is now important to a broader group of use cases and digs deep into overviews of HPC for research, life sciences, and manufacturing. Participants learned more about why HPC, Big Data, and Cloud are converging.
OCF in the U.K. recently deployed a new Fujitsu HPC cluster at the University of East Anglia. As the University’s second new HPC system in four years, the cluster can be easily scaled and expanded in the coming months through a framework agreement to match rapidly increasing demand for compute power.
Astronomers are using iRODS data technology to study the evolution of galaxies and the nature of dark matter. “By partnering with data specialists at UNC-Chapel Hill’s Renaissance Computing Institute (RENCI) who develop the integrated Rule-Oriented Data System (iRODS) researchers and students now have online databases for two large astronomical data sets: the REsolved Spectroscopy of a Local VolumE (RESOLVE) survey and the Environmental COntext (ECO) catalog.”
In this video, Dr. Michael Karasick from IBM moderates a panel discussion on Machine Learning. “The success of cognitive computing will not be measured by Turing tests or a computer’s ability to mimic humans. It will be measured in more practical ways, like return on investment, new market opportunities, diseases cured and lives saved.”
In this video from SC15, Dr. Eng Lim Goh from SGI describes how the company is embracing new HPC technology trends such as new memory hierarchies. With the convergence of HPC and Big Data as a growing trend, SGI envisions a “Zero Copy Architecture” that would bring together a traditional supercomputer with a Big Data analytics machine in a way that would not require users to move their data between systems.
“Analytics applied over complex, many-to-many data relationships hit the ‘Graph Cache Thrash’ bottleneck and grind to a halt, failing to deliver good performance or to operate at scale,” said Brad Bebee, SYSTAP CEO. “GPU hardware provides a compelling performance increase for data-intensive, predictive analytic applications. With Blazegraph and our new GPU products, users can harness the computing power comparable to what was only available from supercomputers, such as a Cray, at a fraction of the cost.”