The Tightly Coupled Accelerators (TCA) architecture is designed to reduce the communication latency between accelerators on different nodes.
When used correctly, atomic operations can help implement a wide range of generic data structures and algorithms in the massively threaded GPU programming environment.
In this video, Nvidia’s Mark Ebersole demonstrates CUDA development on an ARM-based platform. With CUDA 5.5, it is now possible to compile and run CUDA applications on ARM-based systems such as the Kayla development platform. In addition to native compilation on an ARM-based CPU system, it is also possible to cross-compile for ARM systems, allowing […]
The Extreme Scaling Workshop has released the program agenda for its upcoming event in Boulder on August 15-16. The annual workshop will address large-scale heterogeneous computing using GPUs and many-core processors. Systems such as Blue Waters, Stampede, and Titan take a major step from modest-scale heterogeneous test beds and prototypes to world-class, […]
Over at Tom’s Hardware, Kevin Parrish writes that the newly released CUDA 5.5 programming model now supports ARM platforms. It also features a number of advanced performance and productivity features including enhanced Hyper-Q support, MPI workload prioritization, guided performance analysis, and fast cross-compile on x86. Since developers started using CUDA in 2006, successive generations of […]
Over at OLCF, Katie Elyce Jones writes that Oak Ridge’s move to a hybrid architecture with the Titan supercomputer required years of code planning that is now paying off with new levels of application performance. Porting the same models and algorithms meant for CPUs to a GPU will not yield gains as large as […]
Today Nvidia announced the GeoInt Accelerator, a GPU-accelerated geospatial intelligence platform designed to help security analysts find actionable insights faster and more accurately than ever before in vast quantities of raw data, images and video. Today’s intelligence analyst needs information based on imagery, video, signals intelligence, human intelligence and other sources, in a geospatial context […]
Over at the Nvidia Developer Zone, Daniel Armyr and Dan Doherty from MathWorks describe how you can use MATLAB to support your development of CUDA C and C++ kernels. While algorithms written for the GPU are often much faster, the process of building a framework for developing and testing them can be time-consuming. Many programmers […]
Silicon Mechanics, a manufacturer of rackmount servers, storage, and computing hardware, now offers a remote test drive of Nvidia Tesla K20 GPU accelerators. The company says that customers who sign up for the free test can see how they can use parallel processing to accelerate applications by up to 10 times compared to multi-core x86 […]
In this video from the HPC Advisory Council Europe Conference, Hari Subramoni and Sreeram Potluri from Ohio State University present: MVAPICH2 and GPUDirect RDMA.