Entries filed under “GPUs”

News related to the use of general-purpose graphics processing units (GPGPUs) in HPC gear.

PGI Demo at GTC 2013: OpenACC Binary Runs on Nvidia GPUs and AMD Firepro

In this video from the 2013 GPU Technology Conference, Michael Wolfe from The Portland Group demos how OpenACC enables programmers to generate a single binary that runs on Nvidia GPUs and AMD Firepro accelerators.

Also posted in Accelerators, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, HPC Software, OpenACC

Video: Amazing DigiCortex Engine Maps the Brain with GPUs

In this video from the 2013 GPU Technology Conference, Ivan Dimkovic and Ana Balevic describe the ground-breaking DigiCortex Engine. Recently ported to CUDA, the application has seen huge speedups with GPU computing.

DigiCortex is my hobby project implementing large-scale simulation and visualization of biologically realistic cortical neurons, synaptic receptor kinetics, axonal action potential propagation delays as well as long-term and short-term synaptic plasticity. Current version of DigiCortex is heavily optimized for Intel CPUs (including Sandy Bridge AVX instruction set). The first CUDA-enabled version with GPU acceleration (CUDA optimizations done by Ana Balevic) is available as of v0.95.

The simulation footage in this video is really gorgeous, so be sure to watch it in HD mode. Read the Full Story.

Also posted in Computing Research, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

Video: Penguin Computing Showcases High Density Relion 2808GT GPU Server

In this video from the GPU Technology Conference, David Ingersol from Penguin Computing describes the company’s new Relion 2808GT server, which packs 8 GPUs into a 2U server chassis for High Performance Computing.

Also posted in Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

GTC 2013: ARM + GPU = GPU’riffic, says Barcelona SCC

In this special guest feature, Dan Olds from Gabriel Consulting writes that the Barcelona Supercomputing Center is making a big bet on ARM processing for HPC.

Over the last few years, we’ve seen a steadily growing buzz surrounding the use of ARM chips in PCs, servers, and supercomputers. Here at GTC 2013, that buzz is even more pronounced due to NVIDIA’s upcoming Project Denver, and advances in their GPU technology that result in even less dependency on having a fast and powerful (read: Xeon) processor feeding the GPU number-crunching beasts. Our pal Rik Myslewski penned a great article on GTC 2013 ARM chatter here.

While most everyone has been debating and speculating about what it would be like to combine ARM processors and GPU accelerators, one organization has put together some hardware in order to separate the theoretical from the real. The Barcelona Supercomputing Center (from the Barcelona in Spain, not the other one) is building clusters to explore the potential advantages that might arise from combining low-power ARM processors with fast number-crunching GPUs.

Their first attempt, the Tibidabo, was a proof of concept to determine whether it’s possible to build an all-ARM-based cluster. Could they really put together a cluster based on cell phone processors? And, if they could build it, could they find or adapt enough software for it to do useful work?

They were able to construct a two-rack cluster containing 32 blades, 256 nodes, and a total of 512 Tegra 2 ARM cores. They were able to port 11 scientific apps over to ARM with little difficulty, although they did need to fiddle around with the memory hierarchy to optimize some of the apps.

The performance wasn’t all that great. The total system turned out 512 GFLOPs while consuming 3.4 kW, yielding about 0.15 GFLOPs/watt. For context, the best systems on the most recent Green500 list come in around 2.4 or 2.5 GFLOPs/watt, while the systems at the end of the list are rated at 0.033 GFLOPs/watt.
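As a rough sanity check, the efficiency figure can be recomputed from the quoted totals. The sketch below uses only the numbers given for the Tegra 2 cluster (512 GFLOPs, 3.4 kW, 512 cores, 256 nodes, 32 blades). The function name `gflops_per_watt` is an illustrative assumption, and the article does not say whether the 512 GFLOPs figure is peak or sustained.

```python
# Energy efficiency implied by the cluster figures quoted in the article.
# The helper name and layout are illustrative assumptions.

def gflops_per_watt(gflops: float, kilowatts: float) -> float:
    """Divide a GFLOPs figure by power draw, converting kW to W first."""
    return gflops / (kilowatts * 1000.0)

# Figures as quoted for the Tegra 2 cluster.
TOTAL_GFLOPS = 512.0
POWER_KW = 3.4
TOTAL_CORES = 512
NODES = 256
BLADES = 32

efficiency = gflops_per_watt(TOTAL_GFLOPS, POWER_KW)
gflops_per_core = TOTAL_GFLOPS / TOTAL_CORES
cores_per_node = TOTAL_CORES // NODES
nodes_per_blade = NODES // BLADES

print(f"GFLOPs per watt: {efficiency:.3f}")
print(f"GFLOPs per core: {gflops_per_core:.1f}")
print(f"Cores per node:  {cores_per_node}")
print(f"Nodes per blade: {nodes_per_blade}")
```

The division works out to roughly 0.15 GFLOPs/watt, with one GFLOP per core, two cores per node, and eight nodes per blade.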

So they went back to the drawing board and clustered 16 of NVIDIA’s CARMA development boxes together as a learning experience they called Pedraforca v1. This system did much better than the ARM-only Tibidabo on energy efficiency, yielding 0.78 GFLOPs/watt on DGEMM and 5.04 on SGEMM (double- and single-precision matrix multiply), so they were making progress.

Limitations in the platform (a PCIe maximum of 400 MB/s plus the inability to overlap computation and data transfers) meant it couldn’t be scaled up very well. However, it did lead them to a new breakthrough in their thinking for their next system, which they’ve dubbed Pedraforca v2.

They’ve decided the key to building a highly efficient system isn’t to build an accelerated cluster but to build a cluster of accelerators. While there isn’t much difference in the words, there’s a world of difference between the meanings. With Pedraforca v2, they will be decoupling the CPUs from the GPUs, meaning that the CPU-to-GPU ratio can be changed to fit the workloads. They will also be using direct GPU-to-GPU data transfers via Mellanox’s ConnectX-3 InfiniBand interconnects.

This will take a huge amount of latency out of the system and, accordingly, reduce the amount of work the CPU needs to do to orchestrate GPU communications. The prototype system will have 64 nodes which will utilize a quad-core Tegra 3 CPU that will slide into a 4x PCIe slot on a Mini-ITX carrier. In this configuration, the CPU will only be managing boot and MPI communications, plus minimal traffic cop duty for the GPUs. The point is that you don’t need a hugely fast and powerful processor to fulfill these requirements.

However, Pedraforca v2 will have some processing power in the form of Kepler-based NVIDIA K20 GPUs that can deliver 1,170 GFLOP/s through a PCIe Gen3 slot. The GPUs will be able to communicate with each other at 40 Gb/s via the aforementioned Mellanox-fueled InfiniBand interconnect.
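To put the two quoted link rates side by side, here is a back-of-the-envelope sketch comparing the 400 MB/s PCIe ceiling of Pedraforca v1 with the 40 Gb/s InfiniBand link planned for v2. It assumes decimal units and treats 40 Gb/s as a raw line rate, ignoring encoding and protocol overhead, so the real gap will be somewhat smaller. The variable and function names are illustrative.

```python
# Compare the link rates quoted in the article.
# Assumptions: decimal units (1 Gb/s = 1000 Mb/s), 8 bits per byte,
# and no allowance for encoding or protocol overhead.

PCIE_V1_MB_PER_S = 400.0       # Pedraforca v1 PCIe ceiling, as quoted
INFINIBAND_GBIT_PER_S = 40.0   # Pedraforca v2 GPU-to-GPU link, as quoted
K20_GFLOPS = 1170.0            # per-GPU figure, as quoted

def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert gigabits per second to megabytes per second."""
    return gbit_per_s * 1000.0 / 8.0

ib_mb_per_s = gbit_to_mbyte_per_s(INFINIBAND_GBIT_PER_S)
speedup = ib_mb_per_s / PCIE_V1_MB_PER_S

# Bytes the link could deliver per floating-point operation
# if the GPU ran at the quoted rate.
bytes_per_flop = (ib_mb_per_s * 1e6) / (K20_GFLOPS * 1e9)

print(f"InfiniBand raw rate: {ib_mb_per_s:.0f} MB/s")
print(f"Ratio to v1 PCIe:    {speedup:.1f}x")
print(f"Bytes per FLOP:      {bytes_per_flop:.4f}")
```

Even at the raw rate, the link supplies well under one hundredth of a byte per floating-point operation, which fits the presenters' point that the system suits GPU-optimized codes that keep data on the accelerator.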

Both presenters pointed out that this isn’t a general purpose HPC system – it is intended as a host for apps that are GPU-optimized. While they didn’t discuss any FLOPs/watt estimates or performance predictions, it’s safe to say that this system should be an eye opener when it comes to energy efficiency and even cost per FLOP. It’s definitely a project worth watching.

Also posted in Compute, Green HPC, HPC, HPC Hardware

How Harley-Davidson Builds Bikes Faster with GPU Computing

One of my favorite talks this week from the GPU Technology Conference was a presentation from Matthew Gueller from Harley-Davidson. Over at the Nvidia Blog, Ken Brown writes that Harley is using GPUs for 3D modeling that cuts months off its design cycle.

Harley-Davidson has been designing and manufacturing motorcycles for over 110 years. While the motorcycle designs remain true to the heritage, the process has evolved to incorporate many new tools into the conceptual design process to reduce the time required to develop new products, improve styling intent and to allow for greater conceptual exploration. By leveraging tools from Bunkspeed, Keyshot, Autodesk, Dassault and others, we have added flexibility to our process for delivering high quality designs earlier. This presentation will go through some of the conceptual design workflows and show how Harley-Davidson uses visualization tools to bring it all together. Feedback on GPU vs CPU performance benchmarking done at Harley-Davidson and how these tools are leveraged will be provided.

Read the Full Story.

Also posted in Digital Manufacturing, Events, GTC - GPU Technology Conference, HPC, HPC Hardware

Rob Farber on the Far-reaching HPC Implications from GTC 2013

In this video, CUDA book author Rob Farber discusses the recent Nvidia keynote at the 2013 GPU Technology Conference. As a technologist, Rob thinks some of the things that weren’t said by Nvidia CEO Jen-Hsun Huang during the talk are very significant in terms of high performance computing and the business of accelerated computing.

Also posted in Business of HPC, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video | 2 Comments

Seneca NexLink A24 Sports Up to 4 Intel Xeon Phi Coprocessors or 4 Nvidia GPUs

In this video from the GPU Technology Conference, Brett Stouder from Seneca describes the company’s new NexLink A24 Series server, which sports up to 4 Intel Xeon Phi coprocessors or Nvidia Tesla GPUs in a 2U chassis.

Also posted in Co-processors, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

Day 3 Keynote from GTC: Behind the Science in Automotive Design



At insideHPC, we are very pleased to bring you live streaming keynotes from the GPU Technology Conference this week in San Jose.

Tune in right here on Thursday, March 21 at 11:00am PT for the next keynote as Ralph Gilles from Chrysler Group LLC presents: Behind the Science in Automotive Design.

Ralph Gilles, senior vice president – Product Design and president and CEO – SRT (Street and Racing Technology) Brand and Motorsports at Chrysler Group LLC and the mind behind some of the company’s most innovative products, will provide a behind-the-scenes look at the auto industry. Gilles will review how GPUs are used to advance every step of the automobile development process – from the initial conceptual designs and engineering phases through product assembly and marketing. He will also discuss how Chrysler Group utilizes GPUs and the latest technologies to build better, safer cars and reduce time to market.


Also posted in Digital Manufacturing, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

Nvidia to Stack DRAM on Future ‘Volta’ GPUs

Over at The Register, Timothy Prickett Morgan writes that Nvidia has announced plans to stack up DRAM on future ‘Volta’ GPUs to deliver over 1TB/sec of memory bandwidth. Due sometime around 2016, Volta’s memory technology will bring memory closer to the GPU, increasing bandwidth while reducing latency.

“Volta is going to solve one of the biggest issues with GPUs today, which is access to memory bandwidth,” explained Huang. “The memory bandwidth on a GPU is already several times that of a CPU, but we never seem to have enough.” So with Volta, Nvidia is going to get the memory closer to the GPU so signals do not have to come out of the GPU, onto a circuit board, and into the GDDR memory. This current approach takes more power (you have to pump up the signal to make it travel over the board), introduces latencies, and decreases bandwidth.

In related projects, Micron, Intel, and IBM are partnering on an effort to stack up DRAM, with hopes to commercialize something in the next few years. Read the Full Story.

Also posted in Computing Research, Events, GTC - GPU Technology Conference, HPC, HPC Hardware

Video: Accelerated Computing Goes Beyond HPC to Tackle Big Data

In this video, Sumit Gupta from Nvidia presents: Accelerated Computing Goes Beyond HPC. A wide array of companies are now using GPUs to accelerate Big Data analytics, and Gupta describes how these efforts are delivering competitive advantage.

Download the Slides (PDF).

Also posted in Events, GTC - GPU Technology Conference, HPC, HPC Hardware, Video

New Penguin Relion 2808GT Delivers Leading Compute Density for HPC Apps

Today Penguin Computing announced the availability of the Relion 2808GT, a high-density server platform that supports eight GPUs or coprocessors in only 2U. Designed for scientific and engineering applications, the Relion 2808GT is tailor-made for popular codes such as Matlab, Amber and Abaqus.

“Penguin has been delivering integrated GPU computing clusters since the version 1.0 of this technology,” said CEO Charles Wuischpard. “The new Relion 2808GT platform in conjunction with the latest GPU and coprocessor technology delivers unprecedented levels of performance. The Relion 2808GT enables our HPC customers to further accelerate their research by shortening the time to result for their simulations.”

In terms of computational density, a fully configured server with eight Nvidia K20 GPUs can achieve 28 TFLOPs of single-precision floating-point performance.
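The density claim can be broken down with simple division. The sketch below uses only the quoted figures (28 TFLOPs, eight GPUs, 2U). The 42U rack at the end is a hypothetical assumption added for illustration, and it ignores power, cooling, and space for switches.

```python
# Break down the quoted single-precision figure for a fully configured server.

SERVER_TFLOPS = 28.0   # as quoted for eight K20 GPUs
GPUS_PER_SERVER = 8    # as quoted
SERVER_HEIGHT_U = 2    # as quoted

tflops_per_gpu = SERVER_TFLOPS / GPUS_PER_SERVER
tflops_per_rack_unit = SERVER_TFLOPS / SERVER_HEIGHT_U

# Hypothetical: a 42U rack filled entirely with these servers.
# This is an assumption, not a configuration described in the article.
HYPOTHETICAL_RACK_U = 42
servers_per_rack = HYPOTHETICAL_RACK_U // SERVER_HEIGHT_U
rack_tflops = servers_per_rack * SERVER_TFLOPS

print(f"TFLOPs per GPU:       {tflops_per_gpu:.1f}")
print(f"TFLOPs per rack unit: {tflops_per_rack_unit:.1f}")
print(f"Servers per 42U rack: {servers_per_rack}")
print(f"TFLOPs per 42U rack:  {rack_tflops:.0f}")
```

The quoted total implies about 3.5 TFLOPs per GPU and 14 TFLOPs per rack unit.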

The Relion 2808GT will be displayed at the Nvidia GPU Technology Conference from March 18–21 in San Jose, California. Read the Full Story.

Also posted in Co-processors, Events, GTC - GPU Technology Conference, HPC, HPC Hardware

Tuesday Keynote from GPU Technology Conference



In this video, Nvidia’s CEO Jen-Hsun Huang kicks off the GTC Conference with a talk on What’s Next in GPU Technology.

Short on time? In this video, we’ve grabbed the HPC section of the keynote for your viewing pleasure.

At insideHPC, we are very pleased to bring you live streaming keynotes from the GPU Technology Conference all this week from San Jose. Tune in right here on Wednesday, March 20 at 11:00am PT for the next keynote from Erez Lieberman Aiden from the Baylor College of Medicine.

Also posted in Cuda, Events, GTC - GPU Technology Conference, HPC, HPC Hardware, HPC Software, Video

Video: KernelGen — Next-generation Compiler Platform for Accelerating GPUs

In this video from the HPC Advisory Council Switzerland Conference, Dmitry Mikushin from the University of Lugano presents KernelGen — Next-generation Compiler Platform for Accelerating GPUs. Download the Slides (PDF).

Also posted in Events, HPC, HPC Advisory Council Workshop, HPC Hardware, HPC Software, Video

Video: GPU Computing With Nvidia’s Kepler Architecture

In this video from the HPC Advisory Council Switzerland Conference, Axel Koehler from Nvidia presents: GPU Computing With Nvidia’s Kepler Architecture.

Download the Slides (PDF). You can also see Koehler’s other talk from last week on Management of Large-scale GPU Clusters.

In related news, be sure to tune in to insideHPC tomorrow for an exclusive, live-streamed keynote from the GPU Technology Conference, starting at 9:00am PT, Tuesday, March 19. Nvidia CEO Jen-Hsun Huang will present What’s Next in GPU Technology.

Also posted in Events, HPC, HPC Advisory Council Workshop, HPC Hardware, Video

Direct MPI from NVIDIA Tesla and Intel Xeon Phi Accelerator Memories on an IB Cluster

In this video from the HPC Advisory Council Switzerland Conference, Sadaf Alam from the Swiss Supercomputing Center presents: Direct MPI from NVIDIA Tesla and Intel Xeon Phi Accelerator Memories on an InfiniBand Cluster.

Download the Slides (PDF).

Also posted in Co-processors, Events, HPC, HPC Advisory Council Workshop, HPC Hardware, MPI, Video

insideHPC.com is a production of insideHPC, LLC. © 2006-2013