In this video from Moabcon 2013, Troy Baer presents: NICS, Adaptive Computing, and Intel: Leadership in HPC.
An Appro Xtreme-X Supercomputer named Beacon, deployed by the National Institute for Computational Sciences (NICS) of the University of Tennessee, tops the current Green500 list, which ranks the world’s fastest supercomputers based on their power efficiency. To earn its number-one ranking, the supercomputer employed Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors to produce 112.2 trillion calculations per second using only 44.89 kW of power, resulting in world-record efficiency of 2.499 billion floating point operations per second per watt.
Hierarchical storage management is not new to the HPC crowd, but the idea of optimizing NAS may just be a new concept to many. This week Avere Systems announced that the company’s new FXT 3800 hybrid storage appliance can now automatically tier data across four media types: RAM, SSD, SAS and SATA HDDs, delivering maximum performance for the hottest files. At the same time, the device moves “cold” data out of the performance tier and onto SATA to minimize costs and shrink the data storage footprint.
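Tiering logic of this kind can be sketched in a few lines. The policy below is a hypothetical illustration only (not Avere's actual algorithm): files are promoted toward faster media as their access counts rise and demoted toward SATA as they cool, mirroring the FXT 3800's four-tier hierarchy.

```python
# Media tiers from fastest/most expensive to slowest/cheapest,
# mirroring the RAM / SSD / SAS / SATA hierarchy described above.
TIERS = ["RAM", "SSD", "SAS", "SATA"]

# Hypothetical promotion thresholds: a file needs at least this many
# recent accesses ("heat") to live in the corresponding tier.
THRESHOLDS = {"RAM": 100, "SSD": 10, "SAS": 2, "SATA": 0}

def place(access_count: int) -> str:
    """Pick the fastest tier whose threshold the file's heat satisfies."""
    for tier in TIERS:
        if access_count >= THRESHOLDS[tier]:
            return tier
    return "SATA"  # everything cold ends up on cheap, dense SATA
```

A hot file (`place(500)`) lands in RAM, a lukewarm one (`place(5)`) on SAS, and a cold one (`place(0)`) on SATA, which is exactly the hot-data-up, cold-data-down behavior the appliance automates.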
“The performance gains and cost benefits associated with our latest FXT Edge filer demonstrate the massive advantages of a hybrid approach that can precisely match the storage media to the data being accessed,” said Ron Bianchini, President and CEO of Avere Systems. “And when deployed as part of our edge-core architecture, it also delivers the flexibility businesses need to locate storage where it makes most sense for the business.”
Over at Enterprise Storage Forum, Henry Newman writes that, while the industry has addressed storage complexity with NAS, SANs, and appliances, storage admins will require a whole new set of skills to meet the future challenges of application storage.
So if you are a skilled, highly talented administrator, what should be your plan to ensure that your salary does not take a nose dive? I think the answer is appliances for data analysis. (I am likely not talking about Hadoop, as many of the architectural designs for products in this area are completed.) Data analysis appliances are in their infancy today and will require significant care and feeding. The types of data analysis are going to be very complex. For example, you might de-pixelize an image and create a database of geolocations, normalizing for the resolution of the image, which might change over time based on improvements in technology. Then you might correlate the pixels to look for weather, climate or some other change like deforestation. This will be far different than taking business data and trying to correlate prices to sales to maximize profits.
Our friends at Avere are offering a free copy of NAS Optimization for Dummies.
Big NAS performance comes from your ability to scale, eliminate sources of latency, and gain the advantages of the cloud. Get started with Avere Systems’ Special Edition of NAS Optimization for Dummies by Allen G. Taylor.
In this book, you’ll find:
How to configure NAS storage for optimal performance
Ways to reduce the cost of upgrades as your storage needs grow
How to minimize the impact of multiple users hitting the storage systems at the same time
Over at The Register, Timothy Prickett Morgan writes that a GE presentation at the recent GPU Technology Conference discussed the benefits of Remote Direct Memory Access (RDMA) for InfiniBand and its companion GPUDirect method of linking GPU memories to each other across InfiniBand networks.
On plain old CPUs, RDMA allows CPUs running in one node to reach out through an InfiniBand network and directly read data from another node’s main memory, or push data to that node’s memory without having to go through the operating system kernel and the CPU memory controller. If you prefer 10 Gigabit Ethernet links instead, there is an RDMA over Converged Ethernet, or RoCE, wrapper that lets RDMA run on top of Ethernet – as the name suggests. With GPUDirect, which is something that InfiniBand server adapter and switch maker Mellanox Technologies has been crafting with Nvidia for many years, the idea is much the same. Rather than having a GPU go back to the CPU and out over the network to get data that has been chewed on by another GPU, just let the GPUs talk directly to each other over InfiniBand (or Ethernet with RoCE) and get the CPU out of the loop.
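The payoff is easy to see as a count of buffer hops. The toy model below is illustrative only (not real verbs or GPUDirect API calls): it tallies the copies in a conventional GPU-to-GPU transfer staged through host memory versus a direct GPUDirect-style path where the NIC reads and writes GPU memory itself.

```python
def staged_transfer() -> list:
    """GPU-to-GPU transfer bounced through host memory on both nodes."""
    return [
        "GPU A memory -> host A memory",   # device-to-host copy
        "host A memory -> NIC A",          # kernel/driver send path
        "NIC A -> NIC B",                  # wire transfer
        "NIC B -> host B memory",          # receive into host buffer
        "host B memory -> GPU B memory",   # host-to-device copy
    ]

def gpudirect_transfer() -> list:
    """Direct path: the NIC DMAs GPU memory, CPU stays out of the loop."""
    return [
        "GPU A memory -> NIC A",  # NIC reads straight from GPU memory
        "NIC A -> NIC B",         # wire transfer
        "NIC B -> GPU B memory",  # NIC writes straight into GPU memory
    ]
```

Five hops shrink to three, and the two eliminated hops are exactly the host-memory staging copies that burn CPU cycles and memory bandwidth.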
GE's IPN251 hybrid computing card marries a Core i7, a Xilinx FPGA, and an Nvidia Kepler GPU with a PCIe switch
In this video, Nebula CEO Chris Kemp discusses his new product called the Nebula One and the future of cloud computing with Cory Johnson on Bloomberg Television. Kemp was formerly the CTO for IT at NASA.
Nebula One brings the cloud to you, under your control, behind your firewall. It is an integrated hardware and software appliance providing distributed compute, storage, and network services in a unified system.
The Nebula One has to be cool — they’ve got Patrick Stewart and Andy Bechtolsheim in their launch video!
Over at GigaOm, GigaStacey writes that the solution for better and faster storage may lie in DSSD, a stealthy chip startup backed by Andy Bechtolsheim. Founded in 2010 by Sun alums Jeff Bonwick and Bill Moore, DSSD is trying to build a chip that would improve the performance and reliability of flash memory for high performance computing, newer data analytics, and networking.
My sources tell me the startup is building a new type of chip — they said it’s really a module, not a chip — that combines a small amount of processing power with a lot of densely packed memory. The module runs a pared-down version of Linux designed for storing information on flash memory, and is aimed at big data and other workloads where reading and writing information to disk bogs down the application. This fits with the expertise of the team, but it is a problem that others are trying to solve as well with faster and cheaper SSDs and targeted software to optimize the flow of bits to a database. The proposal here, though, appears to be about designing an operating system that exploits the differences between flash memory and hard drives to boost I/O.
Today Xyratex announced that the company is now a strategic supplier for AMD and its SeaMicro solutions for Big Data.
AMD will use the Xyratex OneStor Modular Enclosure as one of the building blocks for its big data and storage-intensive solutions, and has optimized the SeaMicro SM15000 server to provide more than five petabytes of storage capacity in two racks for big data applications such as Hadoop and object storage.
“The SeaMicro SM15000 server with the Freedom Fabric Storage solution is known in the market for its superior computing efficiency and storage density, as well as the lowest total cost of ownership,” said Dhiraj Mallick, Corporate Vice President and General Manager of Data Center Server Solutions at AMD. “With the combination of the SM15000 and the Xyratex OneStor data storage product, we have a winning solution that is unmatched in storage density and capacity.”
The combination of Xyratex and AMD products delivers an ultra-dense, high performance platform that eliminates excess hardware costs and cabling while simplifying installation and minimizing footprint requirements.
Over at the Xcelerit Blog, Jörg Lotze writes that a recent case study shows how financial services firms can improve the efficiency of their existing compute grids with multithreading.
It becomes clear that a multi-threaded parallel application is far superior to the grid approach traditionally used in banks. The multi-threaded application is 2.2x faster than the same application running in individual processes. These performance enhancements are achieved with existing hardware, but without the right tools they require a major redesign of the software. Even greater speedups can be achieved with hardware accelerators such as GPUs.
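The structural difference can be sketched briefly. A grid launches one process per pricing task and pays startup and serialization costs for every task; a multithreaded design runs the tasks as threads sharing one address space, so results come back with no inter-process marshalling. This is a sketch only (the `price_option` workload is a hypothetical stand-in, and in CPython the GIL limits CPU-bound threading gains, but the structure carries over directly to native threads in C++ or Java):

```python
from concurrent.futures import ThreadPoolExecutor

def price_option(seed: int) -> float:
    """Stand-in for one pricing task (hypothetical toy workload)."""
    x = seed
    total = 0.0
    for _ in range(1000):
        x = (1103515245 * x + 12345) % (2**31)  # tiny LCG "simulation"
        total += x / 2**31                      # each draw lies in [0, 1)
    return total / 1000

def run_multithreaded(tasks: int) -> list:
    """All tasks share one process: no per-task startup or serialization,
    unlike a grid that spawns a fresh process per task."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(price_option, range(tasks)))
```

The grid equivalent would replace the thread pool with one OS process per task, plus the cost of shipping inputs and results between processes — which is where the measured 2.2x gap comes from.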
This week Mellanox announced that its end-to-end FDR InfiniBand technology is powering the Stampede supercomputer at the TACC. As the most powerful supercomputing system in the NSF XSEDE program, the 10 Petaflop Stampede system integrates thousands of Dell servers and Intel Xeon Phi coprocessors with Mellanox FDR 56Gb/s InfiniBand SwitchX based switches and ConnectX-3 adapter cards.
“The InfiniBand network was easy to deploy and delivers incredible application performance on a consistent basis,” said Tommy Minyard, director of Advanced Computing Systems, TACC. “Utilizing Mellanox FDR 56Gb/s InfiniBand provides us with extremely scalable, high performance — a critical element as Stampede is designed to support hundreds of computationally- and data-intensive science applications from around the United States and the world.”
Stampede supports national scientific research into weather forecasting, climate modeling, drug discovery and energy exploration and production. Read the Full Story.
Over at the MPI Blog, Jeff Squyres writes that the distance-from-home analogy is a good way to help explain application latency.
So when you send a message to a peer (e.g., MPI_SEND to another MPI process), consider with whom you're communicating: are they next door, in the next subdivision, or in the next city? That gives you an idea of the magnitude of the cost of communicating with them. But let’s add another dimension here: caches and RAM. Data locality is a major factor in performance, and is frequently under-appreciated.
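Putting rough numbers on the analogy makes the magnitudes concrete. The figures below are commonly cited ballpark latencies (approximate and hardware-dependent, not measurements), with a mapping onto the distance analogy that extends the blog post's metaphor:

```python
# Approximate access latencies in nanoseconds (ballpark figures only;
# actual values vary widely by hardware and interconnect generation).
LATENCY_NS = {
    "L1 cache":           1,        # your own desk
    "L2 cache":           4,        # next door
    "main memory":        100,      # the next subdivision
    "InfiniBand message": 1_000,    # the next city
    "TCP/IP round trip":  500_000,  # another state
}

def relative_cost(level: str) -> float:
    """How many L1 cache hits fit into one access at the given level."""
    return LATENCY_NS[level] / LATENCY_NS["L1 cache"]
```

The spread is the point: a single off-node message costs on the order of a thousand cache hits, which is why both data locality within a node and message frequency across nodes dominate application latency.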
There are now something like 30 petascale supercomputers in the world that we know about, but the one that started it all has now retired. Used to help steward the U.S. nuclear stockpile, the Roadrunner supercomputer at Los Alamos was shut down this week after six years of service.
“Roadrunner exemplified stockpile stewardship: an excellent team integrating complex codes with advanced computing architectures to ensure a safe, secure and effective deterrent,” said Chris Deeney, NNSA Assistant Deputy Administrator for Stockpile Stewardship. “Roadrunner and its successes have positioned us well to weather the technology changes on the HPC horizon as we implement stockpile modernization without recourse to underground testing.”
IBM built Roadrunner for the DOE National Nuclear Security Administration using a hybrid design with 12,960 IBM PowerXCell 8i processors and 6,480 dual-core AMD Opteron processors connected by InfiniBand. Read the Full Story.
The internet, sensors and high performance computing are some of the top Big Data producers. Recently, there has been increased focus on extracting more value out of these generated data. Analysis of Big Data sets may be simplified as “looking for a needle in a haystack” on one end of a spectrum to “looking for relationships between hay in a stack” on the other. We will discuss the architectural platforms and tools suitable for different parts of this spectrum.