Entries filed under “Tools”

News about compilers, debuggers, communications libraries, and the tools of HPC development.

Altair Aims to Ease Simulation With PBS Pro 11.2

This week Altair released new Compute Manager and PBS Desktop applications. Designed to streamline engineering workflow within an enterprise, the new software allows engineers to submit jobs through a Web-based interface, manage workloads, and immediately review and download the results.

“The release of Compute Manager and PBS Desktop marks the beginning of the next level of efficiency and ease for engineers engaged in high-performance computing for everything from crash analysis to animation and weather prediction,” said Mahalingam. “Simulations originate on many types of devices these days, and Altair’s high-performance computing tools focus on helping engineers use the resources at their fingertips in a very user-centric way. We are making the process of managing simulation projects more intuitive, more natural, and more efficient.”

With this new release, users can use the enhanced graphical interface in PBS Pro 11.2 to submit jobs on large clusters and obtain maximum value from their computing infrastructure. Read the Full Story.

Also posted in HPC, HPC Software, Rock Stars of HPC, System Management | Leave a comment

Allinea Adds Sparklines, Cuda 4.1 Toolkit Support to DDT 3.1 Parallel Debugger

This week Allinea rolled out its DDT 3.1 parallel debugger with a number of enhancements including Sparklines and support for the Cuda 4.1 Toolkit.

“Our vision is to provide tools for software developers to take advantage of the parallelism present in today’s systems, from desktop GPU and multi-core machines through to the largest systems in the world,” said Dr. David Lecomber, CTO of Allinea Software. “This latest release of Allinea DDT adds some truly innovative features – such as sparklines for viewing data across processes, instantly, which builds on our existing smart highlighting of data values. Adding static analysis into the debugger is also a leap forward for users – static analysis hints at parts of the source code that are incorrect and DDT will highlight this whilst you debug.”
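For readers who have not met sparklines before, here is a minimal sketch of the idea: one small bar per process, scaled to the range of the data, so an outlier rank jumps out at a glance. This is only an illustration. The function name, the glyph set, and the sample values are assumptions, and none of it reflects how DDT itself draws its sparklines.

```python
# Hypothetical sketch: render one value per process as a text sparkline.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value to one of eight bar glyphs, scaled to the min/max of the data."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All processes agree, so draw a flat line.
        return BARS[0] * len(values)
    span = hi - lo
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

# Made-up example: the same variable sampled on eight ranks; rank 5 stands out.
per_rank = [10, 11, 10, 12, 11, 95, 10, 11]
print(sparkline(per_rank))
```

The appeal of the format is density: a row of glyphs takes the space of a single word, yet the one rank holding an unexpected value is immediately visible.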

Read the Full Story.

Also posted in HPC Software | Leave a comment

Interview: Nvidia Updates Cuda Platform to 4.1

This week Nvidia announced the latest update to their Cuda platform for parallel computing. To learn more, I caught up with Will Ramey, Nvidia’s Sr. Product Manager for GPU Computing.

insideHPC: When we talk about a new Cuda platform, are we talking about the Cuda Toolkit plus SDK? Does this new update have a version number?

Will Ramey: Yes, this release is a new version of the CUDA Toolkit and SDK code samples, as well as updated drivers. The version number for this release is 4.1.

insideHPC: Specifically, what components comprise the platform?

Will Ramey: There are 3 key components to this release (version 4.1):

  1. The CUDA Toolkit is a comprehensive development environment for C and C++ developers building GPU-accelerated applications.  Version 4.1 of CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing application performance.  You’ll also find programming guides, user manuals, API reference, and other documentation to help programmers add GPU acceleration to their applications quickly.  More info at: http://developer.nvidia.com/cuda-toolkit
  2. The CUDA Driver provides a system-level interface for CUDA applications to communicate with the GPUs, and is included in the NVIDIA drivers installer.
  3. NVIDIA also provides an SDK with over 100 GPU Computing SDK code samples, as well as white papers to help developers quickly add GPU acceleration to their applications.  More info at: http://developer.nvidia.com/gpu-computing-sdk

Developers need to install the CUDA Toolkit to build CUDA applications, and the latest NVIDIA drivers so their applications can communicate with the GPUs in their system.  Developers can also choose to install the SDK code samples to learn from the large collection of examples.

To run CUDA applications, end-users only need to install the latest NVIDIA drivers.

insideHPC: What is new within the updated platform?

Will Ramey: In addition to the new LLVM-based compiler that delivers up to 10 percent faster performance, there are a number of significant new features in this release:

  • New & Improved “drop-in” acceleration with GPU-Accelerated Libraries
    • Over 1000 new image processing functions in the NPP library
    • New cuSPARSE tri-diagonal solver up to 10x faster than MKL on a 6 core CPU
    • New support in cuRAND for MRG32k3a and Mersenne Twister (MTGP11213) RNG algorithms
    • Bessel functions now supported in the CUDA standard Math library
    • Up to 2x faster sparse matrix vector multiply using ELL hybrid format
  • Enhanced & Redesigned Developer Tools
    • Redesigned Visual Profiler with automated performance analysis and expert guidance system
    • CUDA-GDB support for multi-context debugging and assert() in device code
    • CUDA-MEMCHECK now detects out of bounds access for memory allocated in device code
    • Parallel Nsight 2.1 CUDA warp watch visualizes variables and expressions across an entire CUDA warp
    • Parallel Nsight 2.1 CUDA profiler now analyzes kernel memory activities, execution stalls and instruction throughput

  • Advanced Programming Features
    • Access to 3D surfaces and cube maps from device code
    • Enhanced no-copy pinning of system memory, cudaHostRegister() alignment and size restrictions removed
    • Peer-to-peer communication between processes
    • Support for resetting a GPU without rebooting the system in nvidia-smi
  • New & Improved SDK Code Samples
    • simpleP2P sample now supports peer-to-peer communication with any Fermi GPU
    • New grabcutNPP sample demonstrates interactive foreground extraction using iterated graph cuts
    • New samples showing how to implement the Horn-Schunck Method for optical flow, perform volume filtering, and read cube map texture
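As background on the ELL format mentioned in the library list above, here is a rough CPU-side sketch of ELL (ELLPACK) storage and the matching sparse matrix-vector multiply. Every row is padded to the same number of stored entries, which gives each row a fixed amount of work. Hybrid schemes typically keep the regular part of the matrix in ELL and spill unusually long rows into a second format. The function names are hypothetical and this is plain Python, not the cuSPARSE API.

```python
def to_ell(dense):
    """Convert a dense matrix (list of rows) to ELL: every row padded to the same width."""
    width = max(sum(1 for x in row if x != 0) for row in dense)
    values, cols = [], []
    for row in dense:
        nz = [(j, x) for j, x in enumerate(row) if x != 0]
        pad = width - len(nz)
        cols.append([j for j, _ in nz] + [0] * pad)      # padded column index is arbitrary
        values.append([x for _, x in nz] + [0.0] * pad)  # padded value is zero, so it adds nothing
    return values, cols

def ell_spmv(values, cols, x):
    """Compute y = A * x; each row performs exactly `width` multiply-adds."""
    return [sum(v * x[j] for v, j in zip(vrow, crow))
            for vrow, crow in zip(values, cols)]

A = [[4, 0, 0, 1],
     [0, 3, 0, 0],
     [2, 0, 5, 0],
     [0, 0, 0, 6]]
values, cols = to_ell(A)
y = ell_spmv(values, cols, [1, 2, 3, 4])
print(y)
```

The uniform row width is what makes the layout attractive on a GPU: if one thread handles one row, all threads run the same number of iterations and neighbouring threads read neighbouring memory.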

insideHPC: How do the new components ease code development?

Will Ramey: The new LLVM-based compiler compiles code faster than the old compiler, increasing developer productivity.  As you might expect, the compile-time saved varies by application, but we’ve seen some large applications compile more than 60 minutes faster than with the old compiler.

The NVIDIA Visual Profiler has been completely re-designed to streamline developers’ performance analysis workflow.  The new automated performance analysis feature quickly identifies bottlenecks and opportunities to improve application performance, and is integrated with the “Best Practices” documentation guiding developers through the process of optimizing their applications.  Developers can now achieve the full potential of GPU acceleration in their application with significantly less effort.

The new image & signal processing functions in NPP make it easier for more developers to accelerate more of their algorithms on the GPU.

The new tri-diagonal solver in cuSPARSE allows developers to just call the pre-optimized version in the library instead of having to write their own.
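To give a sense of what "writing their own" involves, below is a minimal serial sketch of a tri-diagonal solve using the classic Thomas algorithm. It is a CPU reference only: a GPU library would likely use a more parallel-friendly method, and the function name and argument layout here are assumptions rather than the cuSPARSE interface.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tri-diagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies x[i+1].
    lower[0] and upper[-1] are ignored. No pivoting is done, so this assumes a
    well-behaved (for example, diagonally dominant) matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the lower diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# A small (-1, 2, -1) system whose exact solution is [1, 2, 3, 4].
lower = [0.0, -1.0, -1.0, -1.0]
diag = [2.0, 2.0, 2.0, 2.0]
upper = [-1.0, -1.0, -1.0, 0.0]
rhs = [0.0, 0.0, 0.0, 5.0]
x = solve_tridiagonal(lower, diag, upper, rhs)
print(x)
```

Both sweeps carry a dependency from one row to the next, which is why a straightforward port of this loop does not map well onto thousands of GPU threads and why a pre-optimized library routine is attractive.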

insideHPC: How do the new components help speed developer code?

Will Ramey: The new LLVM-based compiler includes several new optimization techniques that allow the compiler to generate more efficient code.  This is another case where the performance improvement will vary depending on the application, but we’re seeing up to 10 percent performance improvement across a variety of applications.

The new RNGs in cuRAND, the image & signal processing functions in NPP, the tri-diagonal solver in cuSPARSE, and so on all help developers quickly adopt pre-optimized routines that take full advantage of hundreds of cores on the GPU.

insideHPC: If I had the most current version of Cuda yesterday, what’s new that I can download today?

Will Ramey: Today you can download the new CUDA Toolkit, SDK code samples, and drivers.  Available for Linux, MacOS and Windows.

 

Also posted in GPUs, HPC, HPC Software | Leave a comment

Podcast: Turning Up Performance Profiling with Intel VTune Amplifier XE

In this Intel Chip Chat podcast, Allyson Klein and Ramesh Peri discuss developments and benefits of Intel VTune Amplifier XE, a performance analysis tool for checking app performance on Intel processors. Download the MP3.

Also posted in HPC, HPC Software, Podcast | Leave a comment

Video: Intel Parallel Studio XE Array Building Blocks Demo

In this video, Dr. Mike McCool demos Intel Parallel Studio XE Array Building Blocks.

Also posted in HPC, HPC Software, Video | Leave a comment

Video: Break Your Multicore Program Repeatedly to Bust Bugs

In this video, Roni Simonian from Kloobok presents: Break Your Multicore Program Repeatedly to Bust Bugs.

Maze is a novel testing and debugging environment that removes thread execution uncertainty. Maze stress-tests your concurrent program by taking over process scheduling functions of the operating system, and running your program repeatedly along different execution paths. Maze does this by simulating random context switches in a controllable and reproducible way. When unexpected program behavior has been detected, Maze knows the exact execution sequence that precedes it.
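The general idea of controllable, reproducible context switches can be sketched in a few lines. In the toy below, each "thread" is a Python generator that yields wherever a switch could happen, and a scheduler driven by a seeded random number generator decides who runs next. Because the schedule depends only on the seed, a failing interleaving can be replayed exactly. All names are hypothetical and this is not how Maze is implemented; it only illustrates the principle described above.

```python
import random

def worker(shared, increments):
    """A 'thread' doing a non-atomic increment; each yield is a possible context switch."""
    for _ in range(increments):
        tmp = shared["counter"]       # read
        yield
        shared["counter"] = tmp + 1   # write (may overwrite another worker's update)
        yield

def run_schedule(seed, n_workers=2, increments=3):
    """Run all workers to completion under a schedule fully determined by `seed`."""
    rng = random.Random(seed)
    shared = {"counter": 0}
    runnable = {i: worker(shared, increments) for i in range(n_workers)}
    trace = []  # the exact order in which workers were stepped
    while runnable:
        tid = rng.choice(sorted(runnable))
        trace.append(tid)
        try:
            next(runnable[tid])
        except StopIteration:
            del runnable[tid]
    return shared["counter"], trace

expected = 2 * 3  # two workers, three increments each

# Stress test: try different schedules until one loses an update.
failing_seed = next(s for s in range(1000) if run_schedule(s)[0] != expected)
result, trace = run_schedule(failing_seed)
print("seed", failing_seed, "gave", result, "instead of", expected)
print("schedule:", trace)

# Replaying the same seed reproduces the same bug and the same schedule.
replayed = run_schedule(failing_seed)
print("reproduced:", replayed == (result, trace))
```

The recorded trace plays the role of the "exact execution sequence" in the description: once a bad interleaving is found, it is no longer a one-in-a-million event but a test case that fails every time.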

Recorded at the HPC Advisory Council Stanford Workshop on Dec. 7, 2011. Download the Slides (PDF) or take a peek at the Maze User Manual.

Also posted in Events, HPC, HPC Advisory Council Workshop, HPC Software, Video | Leave a comment

Video: The Portland Group Showcases the PGI Accelerator at SC11

In this video, Doug Miles from The Portland Group discusses the PGI Accelerator, which is designed to help programmers make their code go faster on x64+GPU platforms. Recorded at SC11.

Using PGI Accelerator compilers, programmers can accelerate Linux, Mac OS X and Windows applications on x64+GPU platforms by adding OpenMP-like compiler directives to existing high-level standard-compliant Fortran and C programs and then recompiling with appropriate compiler options.

Read the Full Story.

Also posted in Events, HPC, HPC Software, SC11, Video | Leave a comment

Video: Knights Corner vs. ASCI Red – First to 1 Teraflop

In this video, Intel’s James Reinders talks about how far technology has progressed since 1997 when Intel’s ASCI Red supercomputer broke the Teraflop barrier. The company recently announced that their Knights Corner chip achieved the same performance on a single chip.

Also posted in Accelerators, Compute, Events, HPC Hardware, SC11 | Leave a comment

New Course: Programming GPUs using PGI Accelerator

I heard some good things this week about the PGI Accelerator, which is designed to help mere mortals make their code go faster on x64+GPU platforms. To help get you started, The Portland Group is offering a new 2-day training course on programming GPUs using the PGI Accelerator programming model.

“This course will provide attendees with the insights and skills necessary to have them up and running quickly porting their applications to GPUs,” said Douglas Miles, Director of The Portland Group. “nCore brings tremendous expertise, along with a solid track record for providing quality training and professional service.”

The two-day course, “NCT-500 PGI Accelerator Programming,” is available from nCore and is priced at $1,895.00 per student. For more information, contact info@ncoredesign.com, or visit ncoredesign.com/pgi/ for booking.

Read the Full Story.

Also posted in HPC Education and Training | Leave a comment

Video: Parallel Studio XE 2011 at IDF

In this video, Brandon Hewitt of Intel gives a demonstration of Intel Parallel Studio XE 2011 at the Intel Developer Forum. Brandon walks through VTune Amplifier and Composer XE.

Also posted in Events, HPC, HPC Software, Video | Leave a comment

Slidecast: Intel Amps Up HPC Development Tools with Parallel Studio XE 2011 Service Pack 1

In this slidecast, Intel’s James Reinders describes how the company is increasing performance, forward scaling, and adherence to standards with the release of Intel Parallel Studio XE 2011 Service Pack 1.

Download the MP3 * Subscribe on iTunes * Subscribe on other podcast players

Also posted in HPC, HPC Software, Podcast, Video | Leave a comment

vfThreaded-x86 – A Cloud-based Tool that Parallelizes Apps for Multicore

Dr. Dobb’s writes that Vector Fabrics has recently announced vfThreaded-x86, a cloud-based software tool designed to facilitate the optimization and parallelization of applications for multi-core x86 architectures.

“Our parallelization technologies for the Intel architecture make it easy to speed up a program using multiple threads, something programmers often shy away from since they find it difficult to split up code and to avoid hard-to-find bugs. Our tools largely automate this otherwise error-prone and lengthy manual parallelization process,” said Mike Beunder, CEO of Vector Fabrics.

vfThreaded-x86 is accessed through the Vector Fabrics website using a standard web browser — the software development tool runs in the Amazon EC2 cloud. Read the Full Story.

Also posted in Cloud HPC, HPC, Video | Leave a comment

Video: The CMOS Crisis and Continuous Computing

In this video Microsoft’s Doug Burger presents The CMOS Crisis, the Customization Conundrum, and Continuous Computing.

Exponential trends continue until they don’t. The ongoing failure of Dennard scaling will drive enormous changes in our industry and computing ecosystem as Moore’s “Law” grinds to its inexorable end. The shift to multicore was just the proverbial canary; much greater changes lie immediately ahead, including Dark Silicon, a silicon supply glut, and forced specialization at massive scale. Despite these drastic, imminent changes in the semiconductor space, the combination of cloud computing, massive flows of new data, advanced mobile clients, and powerful new networks offers exciting new capabilities … if the hardware scaling trends permit. In this talk, I will first summarize the imminent CMOS Crisis, then describe the oxymoron of general‐purpose specialization (the Customization Conundrum), and finally describe Continuous Computing, a new paradigm for mobile computing backed by the cloud.

A tip of the hat goes to Greg Pfister for pointing us to this story.

Also posted in HPC, HPC Software, Video | Leave a comment

ParaSail Language to Ease Multicore Programming

Multicore is everywhere from mobile devices to the datacenter. Enter ParaSail, a new programming language designed by SofCheck CTO Tucker Taft.

ParaSail uses a number of other tricks, some that draw on languages developed in the late 1980s and early 1990s for supercomputers—machines running many individual computer chips networked together. “The design of the language itself is essentially complete,” says Taft, who presented details of the language on Wednesday at the O’Reilly Open Source Convention. “The first version of the compiler will be released in the next month or so.” The language will work on Windows, Mac, and Linux computers.

It’s always tough to get traction with a new language, but Microsoft and Intel are reportedly putting $20 million into adapting existing languages for multicore processors, so ParaSail will have its work cut out for it. Read the Full Story.

Also posted in HPC, HPC Software | 2 Comments

Microsoft Accelerator System – A Swiss Army Knife for Heterogeneous Programming?

Microsoft’s Satnam Singh writes about the company’s new Accelerator System, which allows certain kinds of data-parallel descriptions to be written once and then executed on three different targets: GPUs, multicore processors using SSE3 vector instructions, and FPGA circuits.

In general we cannot hope to devise one language or system for programming heterogeneous systems that allows us to compile a single source into efficient implementations on wildly different computing elements such as CPUs, GPUs, and FPGAs. Such parallel-performance portability is difficult to achieve. If the problem domain is sufficiently constrained, however, it is possible to achieve good parallel performance from a single source description. Accelerator achieves this by constraining the data types used for parallel programming (to whole arrays that cannot be explicitly indexed) and by providing a restricted set of parallel array access operations (e.g., in order, in reverse, with a stride, shifted, transposed).
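To make the constraint concrete, here is a small sketch of a 1D convolution written only in terms of whole-array operations (shift, scale, add), with no explicit element indexing in the user's code. The helper names and the zero-fill behaviour at the array edges are assumptions for illustration. This is plain Python, not the Accelerator API.

```python
def shift(a, k):
    """Whole-array shift: result[i] = a[i + k], with zeros where i + k is out of range."""
    n = len(a)
    return [a[i + k] if 0 <= i + k < n else 0 for i in range(n)]

def scale(a, w):
    """Multiply every element by the scalar w."""
    return [w * x for x in a]

def add(a, b):
    """Element-wise sum of two arrays of equal length."""
    return [x + y for x, y in zip(a, b)]

def convolve(a, weights):
    """result[i] = sum over k of weights[k] * a[i - k], built from whole-array operations."""
    result = [0] * len(a)
    for k, w in enumerate(weights):
        result = add(result, scale(shift(a, -k), w))
    return result

signal = [1, 2, 3, 4, 5]
print(convolve(signal, [1, 1, 1]))  # prints [1, 3, 6, 9, 12]
```

Because `convolve` never says how or in what order individual elements are visited, a backend is free to turn the same description into vector instructions, GPU code, or a hardware pipeline, which is the portability argument made in the passage above.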

Read the Full Story or Download Microsoft Accelerator.

Also posted in HPC, HPC Software | 1 Comment

insideHPC.com is a production of insideHPC, LLC. © 2006-2011