Entries filed under “Tools”

News about compilers, debuggers, communications libraries, and the tools of HPC development.

Warewulf Cluster Manager is “Howlingly Great”

Over at HPC Admin, Dell’s Jeff Layton has posted an in-depth look at the open source Warewulf Cluster Manager.

In this article, I want to discuss the one I have been using for a long time: Warewulf. It pioneered many of the stateless methods that other tools use today and is considered the standard stateless open source toolkit for clustering. It is primarily a stateless cluster provisioning and management tool that can also be installed as a stateful tool (i.e., installed onto disks in the compute nodes). It is simple, automates the process, and is very scalable. In this four-part series on using Warewulf in production clusters, I’ll start by discussing how to install Warewulf on a master node and statelessly boot compute nodes.

Read the Full Story.

Also posted in HPC, HPC Software, System Management | Leave a comment

Slidecast: Allinea Software – Meeting the Quest to Run Applications Faster

In this slidecast, Patrick Wohlschlegel from Allinea Software describes the company’s advanced parallel debugging capabilities.

“Our mission is to make it easier for software developers and scientists to make their software scale up to take full advantage of current and emerging parallel computer systems. We do this by developing innovative tools that ensure correctness and optimization of parallel codes, and we are recognized as a leader and innovator in our market. We created the world’s first petascale debugger - allowing users for the first time to debug at any scale. We also developed the first hybrid GPU debugger - enabling simultaneous debugging across multiple architectures in the same tool.”

As announced recently, the Allinea DDT software is being used by the NCSA Blue Waters team to fix their bugs at full scale and exploit the maximum computational power.

Read the Full Story * Download the MP3 * Subscribe on iTunes * If Dropbox is blocked, download from this Google page.

Also posted in HPC, HPC Software, Podcast, Video | Leave a comment

New Debugging Solutions and Test Suite for OpenACC

This week the OpenACC standards group announced growing development-tool support for OpenACC, along with initial results from programmers who have been using the recently released OpenACC compilers to accelerate research. Designed to enable scientists to take advantage of heterogeneous CPU/GPU computing systems, the OpenACC programming standard is now available in compiler products from the OpenACC founding members Cray, The Portland Group (PGI) and CAPS entreprise. It is also gaining support in other programming tools, including recently released solutions from Allinea and Rogue Wave, which provide visual debugging of OpenACC directives on Cray XK6 systems.

“Using PGI’s OpenACC compiler, we ported a computational fluid dynamics (CFD) application benchmark to a general purpose GPU-based system,” reported NASA researchers in an upcoming research paper. “OpenACC is a much easier way to accelerate applications than other programming approaches, and we saw an immediate speed up of the benchmark on multiple tests, up to 10X faster compared with a single CPU core-based system.”

Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

OpenACC comes to CAPS HMPP Workbench

This week HPC compiler-maker CAPS entreprise announced support for OpenACC in its HMPP Workbench 3.1, a move designed to make many-core programming easier.

“The GPU computing breakthrough has allowed many users to propose new massively parallel codes to advance many scientific fields. With OpenACC we are simplifying the use of accelerators and leveraging legacy applications. We are very confident that this will help to further broaden the community taking advantage of many-core technologies,” said François Bodin, CAPS CTO.

Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

Intel Advisor XE Tool to Turn Your Serial Code into Parallel

The SoftTalk Blog has posted a preview of Intel Advisor XE, a parallel programming tool waiting in the wings. Intel Advisor XE adds Linux, Fortran and C# .NET support to a tool (Intel Parallel Advisor) that was previously only available for C/C++.

Intel Advisor XE is a design tool that helps you to transform serial code to run well on multicore hardware by forecasting what might happen if the code executes in parallel. It helps to identify where parallelisation gives the biggest returns, predicts scalability and overheads, and also helps predict data races. As with many of the Intel parallel programming tools, it uses highly visual graphs to help you identify hotspots and assess the potential performance of your parallel annotations.
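To make the idea of forecasting parallel scalability concrete, here is a minimal sketch in C based on Amdahl's law. This is a textbook model swapped in for illustration, not the model Intel Advisor XE uses, and the function name `predicted_speedup` and its parameters are hypothetical.

```c
#include <assert.h>

/* Amdahl's law: upper bound on speedup when a fraction of the run time
 * (parallel_fraction, between 0 and 1) is spread evenly over `cores`
 * and the rest stays serial. Overheads such as locking and scheduling
 * are deliberately ignored in this sketch. */
double predicted_speedup(double parallel_fraction, int cores)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);

    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / (double)cores);
}
```

Even this simple bound shows why it pays to find the regions where parallelisation gives the biggest returns: if only half of the run time can be parallelised, the speedup can never reach 2x, however many cores are added.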

Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

Intel Guide for Developing Multithreaded Applications

Intel’s Clay Breshears writes that the company has just updated the Intel Guide for Developing Multithreaded Applications.

Many of the articles featured in the Guide remain as relevant to parallel programmers today as they were when the Guide was first put together. Good parallel programming practice will always be good parallel programming practice. Even so, technology tools are changing at a steady pace, and we know that the Guide needs to keep up with those changes.

Read the Full Story or Download the PDF.

Also posted in HPC, HPC Software | 2 Comments

TotalView Adds Reverse Debugging and CUDA Support

This week Rogue Wave Software announced it is enhancing the standard TotalView debugger with advanced features once sold as add-ons. TotalView now comes standard with both reverse debugging (ReplayEngine) and CUDA debugging support.

“Having the ability to debug CUDA in TotalView is a great benefit,” said David Montoya, HPC Infrastructure Team Lead at Los Alamos National Laboratory. “We also anticipate being able to spend less time debugging, and more time on research with the ability to find non-deterministic bugs due to the ReplayEngine functionality. Providing these capabilities to all our users will be a huge advantage to them.”

TotalView customers can contact their Rogue Wave account manager for details on enabling the ReplayEngine and CUDA features. Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

PGI Adds Support for OpenACC & Native CUDA C/C++ for Multi-core x86

Today The Portland Group announced the 2012 release of its high-performance parallelizing compilers and development tools. PGI 2012 adds support for the OpenACC directive-based programming model for NVIDIA CUDA-enabled Graphics Processing Units (GPUs). This release is also the first to include the fully feature-enabled PGI CUDA C/C++ compiler for multi-core x64 CPUs from Intel and AMD. In addition, PGI 2012 includes a number of performance and feature enhancements for multi-core x64 processor-based HPC systems.

“GPU Accelerators are now a mainstay in HPC with NVIDIA’s CUDA achieving the widest adoption so far,” said Douglas Miles, director of The Portland Group. “With Release 2012, PGI continues to supplement and refine its GPU programming tools so developers wishing to access the huge potential performance of GPUs can do so in a productive and portable way.”

Read the Release Notes (PDF).

Also posted in HPC, HPC Software | Leave a comment

Altair Aims to Ease Simulation With PBS Pro 11.2

This week Altair announced the release of its new Compute Manager and PBS Desktop applications. Designed to streamline engineering workflow within an enterprise, the new software allows engineers to submit jobs through a Web-based interface, manage workloads, and immediately review and download the results.

“The release of Compute Manager and PBS Desktop marks the beginning of the next level of efficiency and ease for engineers engaged in high-performance computing for everything from crash analysis to animation and weather prediction,” said Mahalingam. “Simulations originate on many types of devices these days, and Altair’s high-performance computing tools focus on helping engineers use the resources at their fingertips in a very user-centric way. We are making the process of managing simulation projects more intuitive, more natural, and more efficient.”

With this new release, users can use the enhanced graphical interface in PBS Pro 11.2 to submit jobs on large clusters and obtain maximum value from their computing infrastructure. Read the Full Story.

Also posted in HPC, HPC Software, System Management | Leave a comment

Allinea Adds Sparklines, Cuda 4.1 Toolkit Support to DDT 3.1 Parallel Debugger

This week Allinea rolled out its DDT 3.1 parallel debugger with a number of enhancements including Sparklines and support for the Cuda 4.1 Toolkit.

“Our vision is to provide tools for software developers to take advantage of the parallelism present in today’s systems, from desktop GPU and multi-core machines through to the largest systems in the world,” said Dr. David Lecomber, CTO of Allinea Software. “This latest release of Allinea DDT adds some truly innovative features – such as sparklines for viewing data across processes, instantly, which builds on our existing smart highlighting of data values. Adding static analysis into the debugger is also a leap forward for users – static analysis hints at parts of the source code that are incorrect and DDT will highlight this whilst you debug.”

Read the Full Story.

Also posted in HPC Software | Leave a comment

Interview: Nvidia Updates Cuda Platform to 4.1

This week Nvidia announced the latest update to their Cuda platform for parallel computing. To learn more, I caught up with Will Ramey, Nvidia’s Sr. Product Manager for GPU Computing.

insideHPC: When we talk about a new Cuda platform, are we talking about the Cuda Toolkit plus SDK? Does this new update have a version number?

Will Ramey: Yes, this release is a new version of the CUDA Toolkit and SDK code samples, as well as updated drivers. The version number for this release is 4.1.

insideHPC: Specifically, what components comprise the platform?

Will Ramey: There are 3 key components to this release (version 4.1):

  1. The CUDA Toolkit is a comprehensive development environment for C and C++ developers building GPU-accelerated applications.  Version 4.1 of CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing application performance.  You’ll also find programming guides, user manuals, API reference, and other documentation to help programmers add GPU acceleration to their applications quickly.  More info at: http://developer.nvidia.com/cuda-toolkit
  2. The CUDA Driver provides a system-level interface for CUDA applications to communicate with the GPUs, and is included in the NVIDIA drivers installer.
  3. NVIDIA also provides an SDK with over 100 GPU Computing SDK code samples, as well as white papers to help developers quickly add GPU acceleration to their applications.  More info at: http://developer.nvidia.com/gpu-computing-sdk

Developers need to install the CUDA Toolkit to build CUDA applications, and the latest NVIDIA drivers so their applications can communicate with the GPUs in their system.  Developers can also choose to install the SDK code samples to learn from the large collection of examples.

To run CUDA applications, end-users only need to install the latest NVIDIA drivers.

insideHPC: What is new within the updated platform?

Will Ramey: In addition to the new LLVM-based compiler that delivers up to 10 percent faster performance, there are a number of significant new features in this release:

  • New & Improved “drop-in” acceleration with GPU-Accelerated Libraries
    • Over 1000 new image processing functions in the NPP library
    • New cuSPARSE tri-diagonal solver up to 10x faster than MKL on a 6 core CPU
    • New support in cuRAND for MRG32k3a and Mersenne Twister (MTGP11213) RNG algorithms
    • Bessel functions now supported in the CUDA standard Math library
    • Up to 2x faster sparse matrix vector multiply using ELL hybrid format
  • Enhanced & Redesigned Developer Tools
    • Redesigned Visual Profiler with automated performance analysis and expert guidance system
    • CUDA-GDB support for multi-context debugging and assert() in device code
    • CUDA-MEMCHECK now detects out of bounds access for memory allocated in device code
    • Parallel Nsight 2.1 CUDA warp watch visualizes variables and expressions across an entire CUDA warp
    • Parallel Nsight 2.1 CUDA profiler now analyzes kernel memory activities, execution stalls and instruction throughput

  • Advanced Programming Features
    • Access to 3D surfaces and cube maps from device code
    • Enhanced no-copy pinning of system memory, cudaHostRegister() alignment and size restrictions removed
    • Peer-to-peer communication between processes
    • Support for resetting a GPU without rebooting the system in nvidia-smi
  • New & Improved SDK Code Samples
    • simpleP2P sample now supports peer-to-peer communication with any Fermi GPU
    • New grabcutNPP sample demonstrates interactive foreground extraction using iterated graph cuts
    • New samples showing how to implement the Horn-Schunck Method for optical flow, perform volume filtering, and read cube map texture

insideHPC: How do the new components ease code development?

Will Ramey: The new LLVM-based compiler compiles code faster than the old compiler, increasing developer productivity.  As you might expect, the compile-time saved varies by application, but we’ve seen some large applications compile more than 60 minutes faster than with the old compiler.

The NVIDIA Visual Profiler has been completely re-designed to streamline developers’ performance analysis workflow.  The new automated performance analysis feature quickly identifies bottlenecks and opportunities to improve application performance, and is integrated with the “Best Practices” documentation guiding developers through the process of optimizing their applications.  Developers can now achieve the full potential of GPU acceleration in their application with significantly less effort.

The new image & signal processing functions in NPP make it easier for more developers to accelerate more of their algorithms on the GPU.

The new tri-diagonal solver in cuSPARSE allows developers to just call the pre-optimized version in the library instead of having to write their own.
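For readers wondering what “writing their own” involves, here is a minimal CPU reference sketch in C of the classic Thomas algorithm for tridiagonal systems. It is not the cuSPARSE API and not necessarily the algorithm cuSPARSE uses; the function name `solve_tridiagonal` and its argument layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Solve a tridiagonal system with the Thomas algorithm (no pivoting, so it
 * is intended for well-conditioned, e.g. diagonally dominant, systems).
 *   lower[i] multiplies x[i-1] in row i (lower[0] is unused)
 *   diag[i]  multiplies x[i]
 *   upper[i] multiplies x[i+1] in row i (upper[n-1] is unused)
 * rhs holds the right-hand side on entry and the solution on return.
 * scratch must have room for n doubles.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int solve_tridiagonal(size_t n, const double *lower, const double *diag,
                      const double *upper, double *rhs, double *scratch)
{
    assert(n > 0);

    /* Forward sweep: eliminate the sub-diagonal. */
    double pivot = diag[0];
    if (pivot == 0.0) return -1;
    if (n > 1) scratch[0] = upper[0] / pivot;
    rhs[0] /= pivot;

    for (size_t i = 1; i < n; i++) {
        pivot = diag[i] - lower[i] * scratch[i - 1];
        if (pivot == 0.0) return -1;
        if (i + 1 < n) scratch[i] = upper[i] / pivot;
        rhs[i] = (rhs[i] - lower[i] * rhs[i - 1]) / pivot;
    }

    /* Back substitution, from row n-2 down to row 0. */
    for (size_t i = n - 1; i-- > 0; ) {
        rhs[i] -= scratch[i] * rhs[i + 1];
    }
    return 0;
}
```

The forward sweep carries a dependency from each row to the next, so this serial form does not map directly onto hundreds of GPU cores. That is part of why a pre-optimized library routine is attractive.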

insideHPC: How do the new components help speed developer code?

Will Ramey: The new LLVM-based compiler includes several new optimization techniques that allow the compiler to generate more efficient code.  This is another case where the performance improvement will vary depending on the application, but we’re seeing up to 10 percent performance improvement across a variety of applications.

The new RNGs in cuRAND, the image & signal processing functions in NPP, the tri-diagonal solver in cuSPARSE, and similar additions all help developers quickly take advantage of pre-optimized routines that make full use of the hundreds of cores on the GPU.

insideHPC: If I had the most current version of Cuda yesterday, what’s new that I can download today?

Will Ramey: Today you can download the new CUDA Toolkit, SDK code samples, and drivers, which are available for Linux, Mac OS and Windows.

 

Also posted in GPUs, HPC, HPC Software | Leave a comment

Podcast: Turning Up Performance Profiling with Intel VTune Amplifier XE

In this Intel Chip Chat podcast, Allyson Klein and Ramesh Peri discuss developments and benefits of Intel VTune Amplifier XE, a performance analysis tool for checking app performance on Intel processors. Download the MP3.

Also posted in HPC, HPC Software, Podcast | Leave a comment

Video: Intel Parallel Studio XE Array Building Blocks Demo

In this video, Dr. Mike McCool demos Intel Parallel Studio XE Array Building Blocks.

Also posted in HPC, HPC Software, Video | Leave a comment

Video: Break Your Multicore Program Repeatedly to Bust Bugs

In this video, Roni Simonian from Kloobok presents: Break Your Multicore Program Repeatedly to Bust Bugs.

Maze is a novel testing and debugging environment that removes thread execution uncertainty. Maze stress-tests your concurrent program by taking over process scheduling functions of the operating system, and running your program repeatedly along different execution paths. Maze does this by simulating random context switches in a controllable and reproducible way. When unexpected program behavior has been detected, Maze knows the exact execution sequence that precedes it.
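The idea of reproducible random context switches can be illustrated with a toy model in C. This is not how Maze is implemented; it only simulates two “threads” performing a non-atomic increment, with a seeded pseudo-random generator deciding which one runs next. All names (`run_schedule`, `find_failing_seed`, `SimThread`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32): the schedule depends only on the seed. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* A simulated thread increments a shared counter in three separate steps:
 * step 0 loads the counter, step 1 adds one, step 2 stores the result. */
typedef struct {
    int step; /* next step to execute; 3 means finished */
    int reg;  /* thread-private copy of the counter */
} SimThread;

/* Run two simulated threads under a schedule derived from `seed`.
 * Returns the final counter: 2 is correct, 1 means an update was lost. */
int run_schedule(uint32_t seed)
{
    assert(seed != 0); /* xorshift state must be non-zero */

    uint32_t rng = seed;
    int shared = 0;
    SimThread threads[2] = { {0, 0}, {0, 0} };

    while (threads[0].step < 3 || threads[1].step < 3) {
        /* Simulated context switch: pick the next thread pseudo-randomly. */
        int who = (int)((next_random(&rng) >> 16) & 1u);
        if (threads[who].step >= 3) who = 1 - who; /* chosen thread is done */

        SimThread *cur = &threads[who];
        switch (cur->step) {
        case 0: cur->reg = shared; break;
        case 1: cur->reg += 1;     break;
        case 2: shared = cur->reg; break;
        }
        cur->step++;
    }
    return shared;
}

/* Try seeds 1..max_seed; return the first seed whose schedule loses an
 * update, or 0 if none does. */
uint32_t find_failing_seed(uint32_t max_seed)
{
    for (uint32_t seed = 1; seed <= max_seed; seed++) {
        if (run_schedule(seed) != 2) return seed;
    }
    return 0;
}
```

Because the schedule is a pure function of the seed, calling `run_schedule` again with a seed returned by `find_failing_seed` replays exactly the same interleaving, which is what makes an otherwise intermittent bug debuggable.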

Recorded at the HPC Advisory Council Stanford Workshop on Dec. 7, 2011. Download the Slides (PDF) or take a peek at the Maze User Manual.

Also posted in Events, HPC, HPC Advisory Council Workshop, HPC Software, Video | Leave a comment

Video: The Portland Group Showcases the PGI Accelerator at SC11

In this video, Doug Miles from The Portland Group discusses the PGI Accelerator, which is designed to help programmers make their code go faster on x64+GPU platforms. Recorded at SC11.

Using PGI Accelerator compilers, programmers can accelerate Linux, Mac OS X and Windows applications on x64+GPU platforms by adding OpenMP-like compiler directives to existing high-level standard-compliant Fortran and C programs and then recompiling with appropriate compiler options.
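As a rough illustration of the directive-based approach, the sketch below annotates an ordinary C loop with an OpenACC-style directive. It uses OpenACC syntax rather than the PGI Accelerator model's own directives, so treat the exact pragma as an assumption; the function name `saxpy` is likewise chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* The directive asks an OpenACC-capable compiler to offload the loop,
     * copying x to the accelerator and y both to and from it. A compiler
     * without OpenACC support ignores the pragma and runs the loop serially. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged standard C, which is the appeal of the approach: the same source builds with or without accelerator support, and only the compiler options differ.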

Read the Full Story.

Also posted in Events, HPC, HPC Software, SC11, Video | Leave a comment


insideHPC.com is a production of insideHPC, LLC. © 2006-2011