In this video from the 2013 HPC User Forum, Scott Schultz from Mellanox presents an overview of Mellanox and HPC.
Download the slides (PDF) or check out the HPC User Forum Video Gallery.
Today Mellanox announced plans to acquire photonics leader Kotura, Inc. for approximately $82 million. The acquisition is expected to expand Mellanox’s ability to deliver cost-effective, high-speed networks with next generation optical connectivity, allowing data center customers to meet the growing demands of high-performance, Web 2.0, cloud, data center, database, financial services and storage applications. Mellanox believes that the Kotura acquisition will enhance its ability to provide leading technologies for high speed, scalable and efficient end-to-end interconnect solutions.
“Operating networks at 100 Gigabit per second rates and higher requires careful integration between all parts of the network. We believe that silicon photonics is an important component in the development of 100 Gigabit InfiniBand and Ethernet solutions, and that owning and controlling the technology will allow us to develop the best, most reliable solution for our customers,” said Eyal Waldman, president, CEO and chairman of Mellanox Technologies. “We expect that the proposed acquisition of Kotura’s technology and the additional development team will better position us to produce 100Gb/s and faster interconnect solutions with higher-density optical connectivity at a lower cost. We welcome the great talent from Kotura and look forward to their contribution to Mellanox’s continued growth.”
Read the Full Story.
Slides from the HPC User Forum are now available for download.
This week the University of Florida unveiled HiPerGator, the state’s most powerful supercomputer with 157 Teraflops of peak performance.
UF worked with Dell, Terascala, Mellanox and AMD to build a machine that makes supercomputing power available to all UF faculty and their collaborators, spreading HiPerGator’s computing power over multiple simultaneous jobs instead of focusing it on a single task at warp speed. HiPerGator features the latest in high-performance computing technology from Dell and AMD with 16,384 processing cores; a Dell Terascala HPC Storage Solution (DT-HSS 4.5) with the industry’s fastest open-source parallel file system; and Mellanox’s FDR 56Gb/s InfiniBand interconnects that provide the highest bandwidth and lowest latency. Together these features provide UF researchers unprecedented computation and faster access to data to quickly further their research.
Read the Full Story or check out the Fact Sheet on HiPerGator.
In this video from the 2013 Open Fabrics Developer Workshop, Paul Grun from Cray leads a panel discussion on Scaling with PGAS Languages.
Panelists:
You can check out more OFA videos at our Open Fabrics Workshop Video Gallery.
We try to keep this page up to date, but you can always find our latest works on our RichReport YouTube Channel.
SC12 Videos (Alphabetical by Vendor Name):
Adaptive Computing
Adaptive Computing SC12 Booth Theater
Aeon
Allinea
Altair
AMD
Asetek
Bull
CAPS-Enterprise
Colfax International
Cycle Computing
DDN
Dell
Gnodal
HPC Advisory Council
IBM
IDC
Inktank
Intel
Intersect360 Research
NAG
Nirvana
Numascale
Nvidia
Mellanox
OpenSFS & EOFS
Panasas
Penguin Computing
Rogue Wave Software
Samplify
SC12 Committee
Scalable Informatics
Seneca
SGI
Solarflare
Spectra Logic
Supermicro
Sugon
Texas Instruments
The Portland Group
VMware
Xyratex
In this video from the 2013 Open Fabrics Developer Workshop, Sreev Doddabalapur from Mellanox presents: Accelerating Big Data over RDMA.
You can check out more OFA videos at our Open Fabrics Workshop Video Gallery.
In this video from the 2013 Open Fabrics Developer Workshop, Ali Ayoub from Mellanox presents: Ethernet over InfiniBand (EoIB).
Download the slides (PDF). You can check out more OFA videos at our Open Fabrics Workshop Video Gallery.
Over at The Register, Timothy Prickett Morgan writes that a GE presentation at the recent GPU Technology Conference discussed the benefits of Remote Direct Memory Access (RDMA) for InfiniBand and its companion GPUDirect method of linking GPU memories to each other across InfiniBand networks.
On plain old CPUs, RDMA allows CPUs running in one node to reach out through an InfiniBand network and directly read data from another node’s main memory, or push data to that node’s memory without having to go through the operating system kernel and the CPU memory controller. If you prefer 10 Gigabit Ethernet links instead, there is an RDMA over Converged Ethernet, or RoCE, wrapper that lets RDMA run on top of Ethernet – as the name suggests. With GPUDirect, which is something that InfiniBand server adapter and switch maker Mellanox Technologies has been crafting with Nvidia for many years, the idea is much the same. Rather than having a GPU go back to the CPU and out over the network to get data that has been chewed on by another GPU, just let the GPUs talk directly to each other over InfiniBand (or Ethernet with RoCE) and get the CPU out of the loop.

GE's IPN251 hybrid computing card marries a Core i7, a Xilinx FPGA, and an Nvidia Kepler GPU with a PCI switch
Read the Full Story.
This week Mellanox announced that its end-to-end FDR InfiniBand technology is powering the Stampede supercomputer at the TACC. As the most powerful supercomputing system in the NSF XSEDE program, the 10 Petaflop Stampede system integrates thousands of Dell servers and Intel Xeon Phi coprocessors with Mellanox FDR 56Gb/s InfiniBand SwitchX based switches and ConnectX-3 adapter cards.
“The InfiniBand network was easy to deploy and delivers incredible application performance on a consistent basis,” said Tommy Minyard, director of Advanced Computing Systems, TACC. “Utilizing Mellanox FDR 56Gb/s InfiniBand provides us with extremely scalable, high performance, a critical element as Stampede is designed to support hundreds of computationally- and data-intensive science applications from around the United States and the world.”
Stampede supports national scientific research into weather forecasting, climate modeling, drug discovery and energy exploration and production. Read the Full Story.
In this video from the HPC Advisory Council Switzerland Conference, Dan Waxman from Mellanox provides a hands-on training for InfiniBand entitled: Hands-on Training: Know Your Cluster Bottlenecks and Maximize Performance.
Depending on the application of the user’s system, it may be necessary to modify the default configuration of the network adapters and the system/chipset configuration. This slide deck describes common tuning parameters, settings & procedures that can improve performance of the network adapter. Different Server & NIC vendors may have different recommendations for the values to be set – but the general tuning approach should be similar. For the hands-on demo we will utilize Mellanox ConnectX adapters – thus we will implement the recommended settings issued by Mellanox.
In this special guest feature, Dan Olds from Gabriel Consulting writes that the Barcelona Supercomputer Center is making a big bet on ARM processing for HPC.
Over the last few years, we’ve seen a steadily growing buzz surrounding the use of ARM chips in PCs, servers, and supercomputers. Here at GTC 2013, that buzz is even more pronounced due to NVIDIA’s upcoming Project Denver, and advances in their GPU technology that result in even less dependency on having a fast and powerful (read: Xeon) processor feeding the GPU number-crunching beasts. Our pal Rik Myslewski penned a great article on GTC 2013 ARM chatter here.
While most everyone has been debating and speculating about what it would be like to combine ARM processors and GPU accelerators, one organization has put together some hardware in order to separate the theoretical from the real. The Barcelona Supercomputer Center (from the Barcelona in Spain, not the other one) is building clusters to explore the potential advantages that might arise from combining low power ARM processors with fast number-crunching GPUs.
Their first attempt, the Tibadabo, was a proof of concept to determine whether it’s possible to build an all-ARM-based cluster. Could they really put together a cluster based on cell phone processors? And, if they could build it, could they find or adapt enough software for it to do useful work?
They were able to construct a two-rack cluster containing 32 blades, 256 nodes, and a total of 512 Tegra 2 ARM cores. They were able to port 11 scientific apps over to ARM with little difficulty, although they did need to fiddle around with the memory hierarchy to optimize some of the apps.
The performance wasn’t all that great. The total system turned out 512 GFLOPs while consuming 3.4 kW, yielding about 0.15 GFLOPs/watt. For context, the best systems on the most recent Green500 list come in around 2.4 or 2.5 GFLOPs/watt, while the systems at the end of the list are rated at 0.033 GFLOPs/watt.
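The efficiency figure is simply throughput divided by power draw. Here is a minimal Python sketch of that arithmetic, using the 512 GFLOPs and 3.4 kW quoted above; the function name is ours, purely for illustration, and is not taken from the BSC presentation.

```python
def gflops_per_watt(gflops: float, kilowatts: float) -> float:
    """Energy efficiency: throughput in GFLOPs divided by power in watts."""
    if kilowatts <= 0:
        raise ValueError("power draw must be positive")
    # Convert kilowatts to watts before dividing.
    return gflops / (kilowatts * 1000.0)


# Figures quoted in the article for the ARM-only cluster.
arm_cluster = gflops_per_watt(512, 3.4)
print(f"{arm_cluster:.3f} GFLOPs/watt")
```

The same helper can be pointed at any throughput and power pair to compare a system against the Green500 figures mentioned here.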
So they went back to the drawing board and, using NVIDIA’s CARMA development box, clustered 16 of them together as a learning experience they called Pedraforca v1. This system did much better than the ARM-only Tibadabo on energy efficiency, yielding 0.78 GFLOPs/watt on DGEMM and 5.04 on SGEMM (double- and single-precision matrix multiply), so they were making progress.
Limitations in the platform (PCIe max of 400 MB/s plus inability to overlap computation and data transfers) meant it couldn’t be scaled up very well. However, it did lead them to a new breakthrough in their thinking for their next system, which they’ve dubbed Pedraforca v2.
They’ve decided the key to building a highly efficient system isn’t to build an accelerated cluster but to build a cluster of accelerators. While there isn’t much difference in the words, there’s a world of difference between the meanings. With Pedraforca v2, they will be decoupling the CPUs from the GPUs, meaning that the CPU-to-GPU ratio can be changed to fit the workloads. They will also be using direct GPU-to-GPU data transfers via Mellanox’s ConnectX-3 InfiniBand interconnects.
This will take a huge amount of latency out of the system and, accordingly, reduce the amount of work the CPU needs to do to orchestrate GPU communications. The prototype system will have 64 nodes which will utilize a quad-core Tegra 3 CPU that will slide into a 4x PCIe slot on a Mini-ITX carrier. In this configuration, the CPU will only be managing boot and MPI communications, plus minimal traffic cop duty for the GPUs. The point is that you don’t need a hugely fast and powerful processor to fulfill these requirements.
However, Pedraforca v2 will have some processing power in the form of Kepler-based NVIDIA K20 GPUs that can deliver 1,170 GFLOP/s through a PCIe Gen3 slot. The GPUs will be able to communicate with each other at 40 Gb/s via the aforementioned Mellanox-fueled InfiniBand interconnect.
Both presenters pointed out that this isn’t a general purpose HPC system – it is intended as a host for apps that are GPU-optimized. While they didn’t discuss any FLOPs/watt estimates or performance predictions, it’s safe to say that this system should be an eye opener when it comes to energy efficiency and even cost per FLOP. It’s definitely a project worth watching.
Today ClusterVision announced the installation of a 200 Teraflop supercomputer at the University of Paderborn. With 614 compute nodes and 10,000 cores, the hybrid system will run a wide range of commercial and open source HPC applications in technology and science. As a hybrid system, the supercomputer also includes 32 NVIDIA K20 GPUs and 8 Intel Xeon Phi coprocessors, providing an additional 40 Teraflops of compute power.
“This system is a powerful compute resource for all researchers in the region of East Westphalia and Lippe, and our partners in Germany and Europe,” said Prof. Dr. Holger Karl, head of the PC2 board.
With a system interconnect powered by Mellanox QDR InfiniBand, the Paderborn cluster uses Dell PowerVault MD3200 storage components running the FraunhoferFS (FhGFS) parallel file system. Read the Full Story.
In this video from the HPC Advisory Council Switzerland Conference, Colin Bridger presents: Mellanox: The Foundation for Scalable Computing.
insideHPC.com is a production of insideHPC, LLC. © 2006-2013