Entries filed under “HPC Software”

News relating to end-user HPC application codes, both from ISVs and non-commercial developers.

Whamcloud’s Gorda: Lustre has Great Momentum

Whamcloud’s Brent Gorda writes that the Lustre parallel distributed file system has great momentum coming into 2012.

This year will see major new systems coming online at unprecedented scale and performance. Due to personal involvement over the years, I am especially excited to see Sequoia come online. It’s a 20 petaFLOPS (peak) system based on IBM BlueGene/Q technology at Lawrence Livermore National Laboratory (LLNL). At Oak Ridge National Laboratory (ORNL), they are upgrading Jaguar with GPGPUs to 9x its current performance to the same 20 petaFLOPS (peak). And Blue Waters at the University of Illinois, in the news due to the switch from IBM to Cray, will come online with a performance of over 11 petaFLOPS (peak).

Guess what core file system technology provides the foundation for these massive systems? In all three cases, it’s Lustre.

Read the Full Story. In related news, the Lustre community will gather in Austin for LUG 2012 on April 23-25.

Also posted in Events, HPC, LUG 2012

Intel’s Future Haswell Processor to Feature Transactional Synchronization

Intel’s James Reinders writes that the company will be introducing new Transactional Synchronization Extensions (TSX) for the future 22 nm multicore processor code-named “Haswell”. In a nutshell, Intel TSX provides a set of instruction set extensions that allow programmers to specify regions of code for transactional synchronization.

With transactional synchronization, the hardware can determine dynamically whether threads need to serialize through lock-protected critical sections, and perform serialization only when required. This lets the processor expose and exploit concurrency that would otherwise be hidden due to dynamically unnecessary synchronization.
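As a rough software analogy only (this is not TSX and uses no Intel instructions), the sketch below models the idea in Python. An update runs speculatively and commits without taking the lock when no other writer touched the same data; it redoes its work under a fallback lock only when a conflict is detected. The `SpeculativeTable` class, its per-slot version counters, and the `demo` function are hypothetical names invented for illustration. Real hardware performs conflict detection and commit atomically, which this single-threaded model does not attempt.

```python
import threading


class SpeculativeTable:
    """Toy model of optimistic updates with a lock as the fallback path."""

    def __init__(self, n):
        self.values = [0] * n
        self.versions = [0] * n            # bumped on every committed write
        self.fallback_lock = threading.Lock()
        self.elided = 0                    # commits that needed no serialization
        self.serialized = 0                # commits that fell back to the lock

    def begin(self, slot):
        """Start a speculative update by recording what we observed."""
        return (slot, self.versions[slot], self.values[slot])

    def commit(self, txn, delta):
        """Commit if nobody else wrote the slot; otherwise redo under the lock."""
        slot, seen_version, seen_value = txn
        if self.versions[slot] == seen_version:
            # No conflicting writer: the lock was dynamically unnecessary.
            self.values[slot] = seen_value + delta
            self.versions[slot] += 1
            self.elided += 1
            return True
        # Conflict detected: discard the stale read and serialize.
        with self.fallback_lock:
            self.values[slot] += delta
            self.versions[slot] += 1
            self.serialized += 1
        return False


def demo():
    table = SpeculativeTable(4)
    a, b = table.begin(0), table.begin(1)   # two updates to different slots
    table.commit(a, 5)
    table.commit(b, 7)
    c, d = table.begin(2), table.begin(2)   # two updates to the same slot
    table.commit(c, 1)
    table.commit(d, 1)                      # stale version, takes the lock
    return table.values, table.elided, table.serialized
```

In `demo`, the two updates to different slots both commit speculatively, while the second of the two updates to the same slot detects the conflict and serializes.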

Read the Full Story or download the updated specifications.

Also posted in Compute, HPC, HPC Hardware

Slidecast: Solarflare ApplicationOnload Engine for On-the-Fly Processing of Network Data

In this slidecast, Mike Smith from Solarflare describes the company’s ApplicationOnload Engine (AOE), a new platform that moves application processing into the network adapter for applications that rely on real-time, high-performance network data.

“Our new ApplicationOnload Engine is a new class of product that results directly from interaction with our end-user customers. Our engineers have worked closely with these customers to create a platform that leverages OpenOnload’s proven framework for creating a direct path from applications to the network, and incorporates on-the-fly processing of real-time network data,” said Russell Stern, CEO at Solarflare. “This solution provides not only the lowest latency and highest message rate network I/O performance, but achieves an unparalleled boost in application performance, all while maintaining a seamless, compatible interface with our existing server adapter products.”

Solarflare’s AOE combines a fully featured 10GbE server adapter with a state-of-the-art FPGA that provides a seamless, low-latency network interface to the host server and application processing. According to Smith, AOE is an open platform that utilizes applications developed by Solarflare, its customers, and third-party developers.

Read the Full Story * Download the MP3 * Subscribe on iTunes * If Dropbox is blocked, download from this Google page.

Also posted in HPC, HPC Hardware, Network, Video

Altair Aims to Ease Simulation With PBS Pro 11.2

This week Altair released new Compute Manager and PBS Desktop applications. Designed to streamline engineering workflow within an enterprise, the new software allows engineers to submit jobs through a Web-based interface, manage workloads, and immediately review and download the results.

“The release of Compute Manager and PBS Desktop marks the beginning of the next level of efficiency and ease for engineers engaged in high-performance computing for everything from crash analysis to animation and weather prediction,” said Mahalingam. “Simulations originate on many types of devices these days, and Altair’s high-performance computing tools focus on helping engineers use the resources at their fingertips in a very user-centric way. We are making the process of managing simulation projects more intuitive, more natural, and more efficient.”

With this new release, engineers can use the enhanced graphical interface in PBS Pro 11.2 to submit jobs on large clusters and obtain maximum value from their computing infrastructure. Read the Full Story.

Also posted in HPC, Rock Stars of HPC, System Management, Tools

Allinea Adds Sparklines, Cuda 4.1 Toolkit Support to DDT 3.1 Parallel Debugger

This week Allinea rolled out its DDT 3.1 parallel debugger with a number of enhancements including Sparklines and support for the Cuda 4.1 Toolkit.

“Our vision is to provide tools for software developers to take advantage of the parallelism present in today’s systems, from desktop GPU and multi-core machines through to the largest systems in the world,” said Dr. David Lecomber, CTO of Allinea Software. “This latest release of Allinea DDT adds some truly innovative features – such as sparklines for viewing data across processes, instantly, which builds on our existing smart highlighting of data values. Adding static analysis into the debugger is also a leap forward for users – static analysis hints at parts of the source code that are incorrect and DDT will highlight this whilst you debug.”

Read the Full Story.

Also posted in Tools

Interview: Nvidia Updates Cuda Platform to 4.1

This week Nvidia announced the latest update to their Cuda platform for parallel computing. To learn more, I caught up with Will Ramey, Nvidia’s Sr. Product Manager for GPU Computing.

insideHPC: When we talk about a new Cuda platform, are we talking about the Cuda Toolkit plus SDK? Does this new update have a version number?

Will Ramey: Yes, this release is a new version of the CUDA Toolkit and SDK code samples, as well as updated drivers. The version number for this release is 4.1.

insideHPC: Specifically, what components comprise the platform?

Will Ramey: There are 3 key components to this release (version 4.1):

  1. The CUDA Toolkit is a comprehensive development environment for C and C++ developers building GPU-accelerated applications.  Version 4.1 of CUDA Toolkit includes a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimizing application performance.  You’ll also find programming guides, user manuals, API reference, and other documentation to help programmers add GPU acceleration to their applications quickly.  More info at: http://developer.nvidia.com/cuda-toolkit
  2. The CUDA Driver provides a system-level interface for CUDA applications to communicate with the GPUs, and is included in the NVIDIA drivers installer.
  3. NVIDIA also provides an SDK with over 100 GPU Computing SDK code samples, as well as white papers to help developers quickly add GPU acceleration to their applications.  More info at: http://developer.nvidia.com/gpu-computing-sdk

Developers need to install the CUDA Toolkit to build CUDA applications, and the latest NVIDIA drivers so their applications can communicate with the GPUs in their system.  Developers can also choose to install the SDK code samples to learn from the large collection of examples.

To run CUDA applications, end-users only need to install the latest NVIDIA drivers.

insideHPC: What is new within the updated platform?

Will Ramey: In addition to the new LLVM-based compiler that delivers up to 10 percent faster performance, there are a number of significant new features in this release:

  • New & Improved “drop-in” acceleration with GPU-Accelerated Libraries
    • Over 1000 new image processing functions in the NPP library
    • New cuSPARSE tri-diagonal solver up to 10x faster than MKL on a 6 core CPU
    • New support in cuRAND for MRG32k3a and Mersenne Twister (MTGP11213) RNG algorithms
    • Bessel functions now supported in the CUDA standard Math library
    • Up to 2x faster sparse matrix vector multiply using ELL hybrid format
  • Enhanced & Redesigned Developer Tools
    • Redesigned Visual Profiler with automated performance analysis and expert guidance system
    • CUDA-GDB support for multi-context debugging and assert() in device code
    • CUDA-MEMCHECK now detects out of bounds access for memory allocated in device code
    • Parallel Nsight 2.1 CUDA warp watch visualizes variables and expressions across an entire CUDA warp
    • Parallel Nsight 2.1 CUDA profiler now analyzes kernel memory activities, execution stalls and instruction throughput

  • Advanced Programming Features
    • Access to 3D surfaces and cube maps from device code
    • Enhanced no-copy pinning of system memory, cudaHostRegister() alignment and size restrictions removed
    • Peer-to-peer communication between processes
    • Support for resetting a GPU without rebooting the system in nvidia-smi
  • New & Improved SDK Code Samples
    • simpleP2P sample now supports peer-to-peer communication with any Fermi GPU
    • New grabcutNPP sample demonstrates interactive foreground extraction using iterated graph cuts
    • New samples showing how to implement the Horn-Schunck Method for optical flow, perform volume filtering, and read cube map texture
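For readers unfamiliar with the ELL format named in the library list above, here is a minimal CPU sketch in Python. ELL (ELLPACK) pads every row to the same number of stored entries so the column indices and values form two regular 2D arrays, a layout that maps well to GPU threads. This is not cuSPARSE code: the helper names `to_ell` and `ell_spmv` are hypothetical, and the "hybrid" part (spilling unusually long rows into a separate coordinate list) is left out for brevity.

```python
def to_ell(dense):
    """Pack a dense matrix (list of rows) into ELL column-index and value arrays."""
    # Every row is padded to the length of the longest row's nonzero list.
    width = max(sum(1 for v in row if v != 0) for row in dense)
    cols, vals = [], []
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nonzeros)
        # Padding uses column 0 with value 0.0, so it contributes nothing.
        cols.append([j for j, _ in nonzeros] + [0] * pad)
        vals.append([v for _, v in nonzeros] + [0.0] * pad)
    return cols, vals


def ell_spmv(cols, vals, x):
    """Compute y = A @ x for a matrix stored in ELL form."""
    return [
        sum(v * x[j] for j, v in zip(col_row, val_row))
        for col_row, val_row in zip(cols, vals)
    ]


if __name__ == "__main__":
    matrix = [[4, 0, 0, 1],
              [0, 3, 0, 0],
              [2, 0, 5, 0]]
    cols, vals = to_ell(matrix)
    print(ell_spmv(cols, vals, [1, 2, 3, 4]))
```

Because every row has the same stored width, a GPU implementation can assign one thread per row and have neighboring threads read neighboring memory; the cost is wasted space when a few rows are much longer than the rest, which is the problem the hybrid variant addresses.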

insideHPC: How do the new components ease code development?

Will Ramey: The new LLVM-based compiler compiles code faster than the old compiler, increasing developer productivity.  As you might expect, the compile-time saved varies by application, but we’ve seen some large applications compile more than 60 minutes faster than with the old compiler.

The NVIDIA Visual Profiler has been completely re-designed to streamline developers’ performance analysis workflow.  The new automated performance analysis feature quickly identifies bottlenecks and opportunities to improve application performance, and is integrated with the “Best Practices” documentation guiding developers through the process of optimizing their applications.  Developers can now achieve the full potential of GPU acceleration in their application with significantly less effort.

The new image & signal processing functions in NPP make it easier for more developers to accelerate more of their algorithms on the GPU.

The new tri-diagonal solver in cuSPARSE allows developers to just call the pre-optimized version in the library instead of having to write their own.
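To give a sense of what "writing their own" involves, the following is a minimal serial sketch of the classic Thomas algorithm for a tridiagonal system, in Python. It is not the cuSPARSE routine or its API; the function name `solve_tridiagonal` and the argument layout are assumptions made for illustration, and the sketch does no pivoting, so it assumes a well-conditioned (for example, diagonally dominant) matrix.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve A x = rhs for tridiagonal A using the Thomas algorithm.

    All four lists have length n. lower[0] and upper[n-1] are unused.
    """
    n = len(diag)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]], chosen so that x = [1, 2, 3].
    print(solve_tridiagonal([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8]))
```

The forward sweep carries a dependency from each row to the next, so this serial form does not parallelize as written. That is one reason a tuned library version is attractive on a GPU.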

insideHPC: How do the new components help speed developer code?

Will Ramey: The new LLVM-based compiler includes several new optimization techniques that allow the compiler to generate more efficient code.  This is another case where the performance improvement will vary depending on the application, but we’re seeing up to 10 percent performance improvement across a variety of applications.

Using the new RNGs in cuRAND, image & signal processing functions in NPP, tri-diagonal solver in cuSPARSE, etc. all help developers quickly take advantage of pre-optimized routines that take full advantage of hundreds of cores on the GPU.

insideHPC: If I had the most current version of Cuda yesterday, what’s new that I can download today?

Will Ramey: Today you can download the new CUDA Toolkit, SDK code samples, and drivers, which are available for Linux, MacOS and Windows.

Also posted in GPUs, HPC, Tools

Podcast: Turning Up Performance Profiling with Intel VTune Amplifier XE

In this Intel Chip Chat podcast, Allyson Klein and Ramesh Peri discuss developments and benefits of Intel VTune Amplifier XE, a performance analysis tool for checking app performance on Intel processors. Download the MP3.

Also posted in HPC, Podcast, Tools

Slidecast: APAX Application Acceleration from Samplify

In this slidecast, Allan Evans and Al Wegener present: APAX Application Acceleration. Samplify Systems is leveraging their advanced compression algorithms to reduce the amount of data that needs to be moved and stored in high performance computing. Read the Full Story.

Download the MP3 * Subscribe on iTunes * Subscribe on other podcast players. If your IT Crowd blocks Dropbox, you can download the audio from this Google page.

Also posted in HPC, Podcast, Video

Microsoft Raises ‘State of the Art’ Son of NTFS

By Gavin Clarke

Microsoft has unveiled a “state of the art” file system for the next 10 years that builds on NTFS.

Named Resilient File System (ReFS), Microsoft’s latest baby will be delivered with Windows 8 Server and become the foundation of storage on Windows Clients.

ReFS will be used with Windows 8’s Storage Spaces, a feature in Microsoft’s forthcoming Windows 8 Client that pools storage for use by different machines. Storage Spaces and ReFS have been designed to complement each other as components of a “complete storage system”.

“We believe this significantly advances our state of the art for storage,” Windows storage and file system development manager Surendra Verma wrote Monday on the Building Windows 8 blog. Verma wrote:

We will implement ReFS in a staged evolution of the feature: first as a storage system for Windows Server, then as storage for clients, and then ultimately as a boot volume. This is the same approach we have used with new file systems in the past.

NTFS was introduced by Microsoft in Windows NT in 1993 and has penetrated deep into computing. Verma and his boss, Windows group president Steven Sinofsky, stressed that ReFS does not replace NTFS and that it builds on the existing system. ReFS reuses NTFS code responsible for the Windows file system semantics, Verma said.

“This code implements the file system interface (read, write, open, close, change notification, etc), maintains in-memory file and volume state, enforces security, and maintains memory caching and synchronization for file data. This reuse ensures a high degree of compatibility with the features of NTFS that we’re carrying forward,” he wrote.

The difference between ReFS and NTFS is that the code uses a new engine to implement on-disk structures, such as the Master File Table, to represent files and directories. It’s this machinery, Verma wrote, “where a significant portion of the innovation behind ReFS lies”.

By working with Storage Spaces, ReFS tries to protect data from partial and complete disk failures, and will remove data from the name space on a live volume where information has been corrupted. Meanwhile a process has been added that periodically scrubs metadata and Integrity Stream data on volumes living on a mirrored Storage Space.
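The scrub-and-repair idea can be illustrated with a small Python sketch. This is only a conceptual model, not ReFS code: the function name `scrub`, the list-of-blocks layout, and the use of CRC32 as the checksum are all assumptions made for the example, and the article does not say which checksum ReFS uses.

```python
import zlib


def scrub(mirror_a, mirror_b, checksums):
    """Verify each block on both mirrors and repair a bad copy from a good one.

    mirror_a and mirror_b are lists of bytes objects, modified in place.
    checksums holds the expected CRC32 of each block.
    Returns a list of (block_index, action) tuples for blocks needing attention.
    """
    report = []
    for i, expected in enumerate(checksums):
        ok_a = zlib.crc32(mirror_a[i]) == expected
        ok_b = zlib.crc32(mirror_b[i]) == expected
        if ok_a and not ok_b:
            mirror_b[i] = mirror_a[i]
            report.append((i, "repaired B from A"))
        elif ok_b and not ok_a:
            mirror_a[i] = mirror_b[i]
            report.append((i, "repaired A from B"))
        elif not ok_a and not ok_b:
            # Neither copy matches; in this model the block is flagged
            # so it can be taken out of service.
            report.append((i, "unrecoverable"))
    return report


if __name__ == "__main__":
    blocks = [b"alpha", b"beta", b"gamma"]
    sums = [zlib.crc32(block) for block in blocks]
    copy_a, copy_b = list(blocks), list(blocks)
    copy_b[1] = b"bexa"                      # simulate silent corruption
    print(scrub(copy_a, copy_b, sums))
```

Running the scrub periodically, rather than only on read, means corruption in rarely accessed blocks is caught while a good mirror copy still exists.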

The initial focus of ReFS will be on its role in file servers, especially with mirrored Storage Spaces. “We also plan to work with our storage partners to integrate it with their storage solutions,” Verma wrote.

The overall thinking of ReFS seems to be data and file management and a recovery system built from the ground up for peers and nodes of all sizes while handling increasing quantities of big data. NTFS dates from a time when departmental-level and LAN-levels of scale inside the corporate firewall were the goal. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.

Also posted in Storage

Video: Advanced Cluster Systems Parallelizes Code with Supercomputing Engine Technology

In this video, Dr. Dean Dauger from Advanced Cluster Systems describes how the company’s Supercomputing Engine Technology (SET) parallelizes code.

Today, the only way for software companies and small and medium enterprises to substantially increase the speed and performance of their products is by parallelizing software codes — an extremely expensive undertaking. Advanced Cluster Systems is a software company that specializes in a solution, called SET, to quickly and easily parallelize software codes at a fraction of the cost.

Read the Full Story.

Also posted in Video

Agile Project Management Reaps On-Time Code Delivery at Whamcloud

Whamcloud’s Jessica Popp writes that Agile project management is just the right fit for HPC projects like Lustre.

The beauty of using Agile for large development projects is that we are splitting our major work into small definable chunks that allow us to see if we veer off the path before it is too late. It’s so important to me I’ll say it again. We can see if we veer off before it is too late. The visibility into the project that GreenHopper and Jira give us allows us to make course corrections after each sprint.

Read the Full Story.

Also posted in HPC

Video: Concurrency and Parallelism at C++ and Beyond 2011

In this video from the C++ and Beyond 2011 conference, Andrei Alexandrescu, Scott Meyers and Herb Sutter discuss concurrency and parallelism.

A Tip of the Hat goes to GPU Science for pointing us to this video.

Also posted in Video

Interview: LexisNexis Goes Open Source for Big Data with HPCC

In this video, Charles Kaminski, Jr. from LexisNexis describes how the open source HPCC software environment takes on Big Data analytics in an enterprise-ready way that takes much less effort to program than traditional Hadoop solutions.

This video was recorded at the HPC Advisory Council Stanford Workshop in December, 2011.

As mentioned in the video, you can download the HPCC VM and get started using the software and participate in the developer community.

Also posted in Events, HPC, HPC Advisory Council Workshop, inside-BigData, Video

4 Day CUDA Course in Seattle, Jan 24-27

Acceleware, partnering with NVIDIA and Microsoft, is offering a four-day course designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.

Delivered by Acceleware’s developers, who provide real-world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with NVIDIA GPUs for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience.

Register before January 13 and receive $250 off your course fee! Enter promotional code: AXTEB2012

Also posted in Compute, HPC, HPC Education and Training, HPC Hardware

AMD’s OpenCL – The Good, the Bad, and the Ugly

Matt Soos writes that the speed advantages that developers enjoy from AMD’s OpenCL come at the expense of an implementation that is rather painful to deal with.

The OpenCL architecture is clean, usable and user-friendly. Unfortunately, the current AMD OpenCL implementation is bad. I hope this will improve in the future, and as more and more people start using it, the bugs will eventually be reported, triaged, and fixed. As the number of people that buy a particular card for its OpenCL capabilities increases, for instance because games start using OpenCL as a way to speed up certain physics calculations, the makers of these cards will eventually be forced to fix the bugs and make the development easier. Until then, we might have to keep on suffering for the speedups we get.

Read the Full Story.

Also posted in HPC

insideHPC.com is a production of insideHPC, LLC. © 2006-2011