“Introducing the Kayla Platform for computing on the ARM architecture – where supercomputing meets mobile computing. The Kayla platform is powered by an NVIDIA Tegra Quad-core ARM processor and a Kepler GPU to deliver the highest performance and highest efficiency for the next generation of CUDA and OpenGL applications. Pre-installed with CUDA 5 and supporting OpenGL 4.3, it provides ARM application development across the widest range of application types. The Kayla platform will be available Spring 2013.”
Short on time? In this video, we’ve grabbed the HPC section of the keynote for your viewing pleasure.
At insideHPC, we are very pleased to bring you live streaming keynotes from the GPU Technology Conference all this week from San Jose. Tune in right here on Wednesday, March 20 at 11:00am PT for the next keynote from Erez Lieberman Aiden from the Baylor College of Medicine.
Today Nvidia announced that the growing ranks of Python users can now take full advantage of GPU acceleration for HPC and Big Data analytics applications by using the CUDA parallel programming model. As a popular, easy-to-use language, Python enables users to write high-level software code that captures their algorithmic ideas without delving deep into programming details. Python’s extensive libraries and advanced features make it ideal for a broad range of HPC science, engineering and big data analytics applications.
“Our research group typically prototypes and iterates new ideas and algorithms in Python and then rewrites the algorithm in C or C++ once the algorithm is proven effective,” said Vijay Pande, professor of Chemistry and of Structural Biology and Computer Science at Stanford University. “CUDA support in Python enables us to write performance code while maintaining the productivity offered by Python.”
Support for CUDA parallel programming comes from NumbaPro, a Python compiler in the new Anaconda Accelerate product from Continuum Analytics. This support was made possible by Nvidia’s contribution of the CUDA compiler source code into the core and parallel thread execution backend of LLVM, a widely used open source compiler infrastructure. Read the Full Story.
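As a rough illustration of the workflow Pande describes, prototyping the algorithmic core in high-level Python first, here is a minimal pure-Python SAXPY kernel of the kind a compiler like NumbaPro can target to the GPU. The function names are ours, and the decorator shown in the comment is only a sketch of the Numba-family API, not a verified signature; consult the Numba documentation for the real interface.

```python
# Pure-Python prototype of SAXPY (y = a*x + y), the kind of scalar
# kernel a Python-to-GPU compiler such as NumbaPro can accelerate.
# Under NumbaPro one would decorate the scalar function, roughly:
#   @vectorize([...], target='gpu')
# (shown for flavor only; see the Numba docs for the actual API).

def saxpy_scalar(a, x, y):
    """The algorithmic core: one element of y = a*x + y."""
    return a * x + y

def saxpy(a, xs, ys):
    """Prototype driver: apply the scalar kernel across whole arrays."""
    return [saxpy_scalar(a, x, y) for x, y in zip(xs, ys)]

if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```

The point of the workflow is that the scalar function above stays unchanged; only the compilation target moves from the interpreter to the GPU.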
Over at the Nvidia Blog, Roy Kim writes that the new Kepler-based GTX TITAN is the ultimate CUDA development GPU.
1.3 Teraflops for Under $1,000
For the first time, GTX TITAN gives developers access to over a teraflop of double-precision performance in a commercially available GPU, transforming their PCs into personal supercomputers. That’s big news: for scientists, access to computing resources is one of the biggest hurdles in advancing research. Many have to wait weeks, even months, for access to a supercomputer or a campus-wide cluster.
Over at the Parallel Forall blog, Mark Harris writes that shared memory is a powerful feature for writing well-optimized CUDA code. Access to shared memory is much faster than global memory access because shared memory is located on-chip.
Because shared memory is shared by threads in a thread block, it provides a mechanism for threads to cooperate. One way to use shared memory that leverages such thread cooperation is to enable global memory coalescing, as demonstrated by the array reversal in this post. By reversing the array using shared memory we are able to have all global memory reads and writes performed with unit stride, achieving full coalescing on any CUDA GPU.
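The access pattern Harris describes can be sketched on the CPU: model one thread block in which each “thread” t loads element t of global memory into a shared buffer, synchronizes, then stores element t back from the mirrored shared index. Both the global read and the global write use index t (unit stride, hence fully coalesced on the GPU); the reversal happens entirely through the shared-memory index. This is a plain-Python model of the pattern with names of our choosing, not CUDA code.

```python
def block_reverse(d, n):
    """CPU model of one CUDA thread block reversing an n-element array.

    Phase 1 stands in for the parallel loads, the gap between the two
    loops stands in for __syncthreads(), and phase 2 for the stores.
    """
    s = [None] * n  # the block's shared-memory buffer

    # Phase 1: thread t reads d[t] -- global READ at index t (unit stride).
    for t in range(n):
        s[t] = d[t]

    # A __syncthreads() barrier would sit here: every load must finish
    # before any thread stores, or a thread could read a stale element.

    # Phase 2: thread t writes d[t] -- global WRITE at index t (unit
    # stride); the reversal comes from the shared index n - t - 1.
    for t in range(n):
        d[t] = s[n - t - 1]

data = list(range(8))
block_reverse(data, len(data))
print(data)  # [7, 6, 5, 4, 3, 2, 1, 0]
```

A naive reversal would instead write d[n - t - 1] directly, making the global stores strided; routing the reversal through the shared buffer keeps every global access at index t.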
Over at the Nvidia Developer Zone, Calisa Cole interviews Bob Zigon of Beckman Coulter, a company that develops, manufactures and markets products that simplify and automate complex biomedical testing. Zigon is working on a prototype of a new CUDA-based application that will calculate the molar mass, gross shape and size distribution of protein samples by way of analytical ultracentrifugation (AUC). The application is currently 120 times faster than existing software.
CUDA and Tesla are disruptive technologies. When they are applied to our problems we are capable of returning answers to clinicians and researchers in a fraction of a second. This causes people to change the way they interact with the data. I’ve seen this behavioral change repeatedly over the last three years. Instead of looking at the data from 100,000 white blood cells, researchers can now manipulate five million cells.