Entries filed under “Tools”

News about compilers, debuggers, communications libraries, and the tools of HPC development.

New Course: Programming GPUs using PGI Accelerator

I heard some good things this week about the PGI Accelerator, which is designed to help mere mortals make their code go faster on x64+GPU platforms. To help get you started, The Portland Group is offering a new 2-day training course on programming GPUs using the PGI Accelerator programming model.

“This course will provide attendees with the insights and skills necessary to have them up and running quickly porting their applications to GPUs,” said Douglas Miles, Director of The Portland Group. “nCore brings tremendous expertise, along with a solid track record for providing quality training and professional service.”

The two-day course, “NCT-500 PGI Accelerator Programming,” is available from nCore and is priced at $1,895.00 per student. For more information, contact [email protected]; to book, visit ncoredesign.com/pgi/.

Read the Full Story.

Also posted in HPC Education and Training | Leave a comment

Video: Parallel Studio XE 2011 at IDF

In this video, Brandon Hewitt of Intel gives a demonstration of Intel Parallel Studio XE 2011 at the Intel Developer Forum. Brandon walks through VTune Amplifier and Composer XE.

Also posted in Events, HPC, HPC Software, Video | Leave a comment

Slidecast: Intel Amps Up HPC Development Tools with Parallel Studio XE 2011 Service Pack 1

In this slidecast, Intel’s James Reinders describes how the company is increasing performance, forward scaling, and adherence to standards with the release of Intel Parallel Studio XE 2011 Service Pack 1.

Also posted in HPC, HPC Software, Podcast, Video | Leave a comment

vfThreaded-x86 – A Cloud-based Tool that Parallelizes Apps for Multicore

Dr. Dobb’s writes that Vector Fabrics has recently announced vfThreaded-x86, a cloud-based software tool designed to facilitate the optimization and parallelization of applications for multi-core x86 architectures.

“Our parallelization technologies for the Intel architecture make it easy to speed up a program using multiple threads, something programmers often shy away from since they find it difficult to split up code and to avoid hard-to-find bugs. Our tools largely automate this otherwise error-prone and lengthy manual parallelization process,” said Mike Beunder, CEO of Vector Fabrics.

vfThreaded-x86 is accessed through the Vector Fabrics website using a standard web browser — the software development tool runs in the Amazon EC2 cloud. Read the Full Story.

Also posted in Cloud HPC, HPC, Video | Leave a comment

Video: The CMOS Crisis and Continuous Computing

In this video Microsoft’s Doug Burger presents The CMOS Crisis, the Customization Conundrum, and Continuous Computing.

Exponential trends continue until they don’t. The ongoing failure of Dennard scaling will drive enormous changes in our industry and computing ecosystem as Moore’s “Law” grinds to its inexorable end. The shift to multicore was just the proverbial canary; much greater changes lie immediately ahead, including Dark Silicon, a silicon supply glut, and forced specialization at massive scale. Despite these drastic, imminent changes in the semiconductor space, the combination of cloud computing, massive flows of new data, advanced mobile clients, and powerful new networks offers exciting new capabilities … if the hardware scaling trends permit. In this talk, I will first summarize the imminent CMOS Crisis, then describe the oxymoron of general‐purpose specialization (the Customization Conundrum), and finally describe Continuous Computing, a new paradigm for mobile computing backed by the cloud.

A tip of the hat goes to Greg Pfister for pointing us to this story.


Also posted in HPC, HPC Software, Video | Leave a comment

ParaSail Language to Ease Multicore Programming

Multicore is everywhere from mobile devices to the datacenter. Enter ParaSail, a new programming language designed by SofCheck CTO Tucker Taft.

ParaSail uses a number of other tricks, some that draw on languages developed in the late 1980s and early 1990s for supercomputers—machines running many individual computer chips networked together. “The design of the language itself is essentially complete,” says Taft, who presented details of the language on Wednesday at the O’Reilly Open Source Convention. “The first version of the compiler will be released in the next month or so.” The language will work on Windows, Mac, and Linux computers.

It’s always tough to get traction with a new language, but Microsoft and Intel are reportedly putting $20 million into adapting existing languages for multicore processors, so ParaSail will have its work cut out for it. Read the Full Story.

Also posted in HPC, HPC Software | 2 Comments

Microsoft Accelerator System – A Swiss Army Knife for Heterogeneous Programming?

Microsoft’s Satnam Singh writes about the company’s new Accelerator System, which allows certain kinds of data-parallel descriptions to be written once and then executed on three different targets: GPUs, multicore processors using SSE3 vector instructions, and FPGA circuits.

In general we cannot hope to devise one language or system for programming heterogeneous systems that allows us to compile a single source into efficient implementations on wildly different computing elements such as CPUs, GPUs, and FPGAs. Such parallel-performance portability is difficult to achieve. If the problem domain is sufficiently constrained, however, it is possible to achieve good parallel performance from a single source description. Accelerator achieves this by constraining the data types used for parallel programming (to whole arrays that cannot be explicitly indexed) and by providing a restricted set of parallel array access operations (e.g., in order, in reverse, with a stride, shifted, transposed).
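To make that constraint concrete, here is a minimal Python sketch of the idea. It is not Accelerator's API: the `ParallelArray` class and its `shifted`, `reversed`, and `strided` methods are hypothetical names invented for illustration, and the sketch runs serially. It only shows how a computation can be written with whole-array operations and no explicit element indexing in user code.

```python
class ParallelArray:
    """Hypothetical whole-array value; user code never indexes elements."""

    def __init__(self, values):
        self._values = tuple(values)

    def __add__(self, other):
        # Elementwise addition of two arrays of equal length.
        if len(self._values) != len(other._values):
            raise ValueError("arrays must have the same length")
        return ParallelArray(a + b for a, b in zip(self._values, other._values))

    def shifted(self, offset, fill=0):
        # Result element i is input element i - offset; vacated slots get `fill`.
        n = len(self._values)
        return ParallelArray(
            self._values[i - offset] if 0 <= i - offset < n else fill
            for i in range(n)
        )

    def reversed(self):
        # Access the array in reverse order.
        return ParallelArray(self._values[::-1])

    def strided(self, step):
        # Access every `step`-th element.
        return ParallelArray(self._values[::step])

    def to_list(self):
        # Copy the result back out as an ordinary list.
        return list(self._values)


def three_point_sum(a):
    # Each element plus its left and right neighbours, with no explicit indexing.
    return a.shifted(1) + a + a.shifted(-1)
```

For example, `three_point_sum(ParallelArray([1, 2, 3, 4])).to_list()` gives `[3, 6, 9, 7]`. Because the caller only describes whole-array accesses, a backend is free to map the same description onto different hardware, which is the point the excerpt makes.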

Read the Full Story or Download Microsoft Accelerator.

Also posted in HPC, HPC Software | 1 Comment

Parallel Programming: The Path from Multicore to Manycore

Intel Software Evangelist James Reinders writes that the best way to avoid panic in the coming wave of manycore systems is to think parallel.

Parallel programming is easy to understand and utilize when the work to be done is completely independent. It’s the interaction between concurrent tasks of an application that are challenging and therefore require a plan for managing sharing between concurrent tasks. The seemingly most fundamental sharing is simple sharing of data via shared memory, and yet nothing gives rise to more challenges in concurrent programming. All parallel computer designs struggle to offer some relief, varying from simple to exotic solutions, but in all cases the best results come from reduced sharing and the worst from unnecessary and frequent fine-grained sharing of data.

Nothing is more fundamental to parallel programming than understanding both sharing and scaling as well as the general relationship between them. Understanding sharing and how to manage it is the key to parallel programming — less is better.
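A small Python sketch can illustrate the "less is better" advice. It is not taken from the article; the function names are made up for illustration. The first version shares one counter and takes a lock on every update (frequent fine-grained sharing), while the second lets each thread accumulate privately and share only one final value. Under CPython's global interpreter lock neither version shows a real parallel speedup, so treat this as a picture of the structure rather than a benchmark.

```python
import threading


def count_evens_shared(chunks):
    """Fine-grained sharing: every hit updates one shared counter under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for value in chunk:
            if value % 2 == 0:
                with lock:  # all threads contend for the same lock
                    total += 1

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


def count_evens_private(chunks):
    """Reduced sharing: each thread counts privately and publishes once."""
    partials = [0] * len(chunks)

    def worker(slot, chunk):
        local = 0
        for value in chunk:
            if value % 2 == 0:
                local += 1
        partials[slot] = local  # a single write per thread, to its own slot

    threads = [
        threading.Thread(target=worker, args=(slot, chunk))
        for slot, chunk in enumerate(chunks)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(partials)  # combine after all threads have finished
```

Both functions return the same count; the difference is how often the threads touch shared state, which is the cost the excerpt warns about.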

Read the Full Story.

Also posted in Compute, HPC, HPC Hardware | Leave a comment

Erlang’s Parallelism is not Parallelism!

In this extensive post, computer scientist and blogger JLouis goes the extra mile to dissect the Erlang programming language, urging the reader to go in with the realization that concurrency and parallelism are different beasts:

Note however, while Erlang is not a parallel language, its runtime is rather excellent at forcing out parallelism of existing concurrent programs. So when we say Erlang is parallel, we say that Erlang is parallel in a specific way! The recent years have seen much work in Erlang/OTP on making the runtime concurrently parallel and we are reaping the benefits. The reason can be found in the simple observation that an Erlang program has thousands of processes which gives thousands of executable threads of control. Since you have more than one thread of control, and communication between them is largely asynchronous, you have all the opportunity for a parallel speedup.

Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

Intel’s New High-Performance SPMD Compiler Provides 3X Speedup

The GPU Science blog has a post on Intel’s new high-performance SPMD compiler:

ispc is a new compiler for “single program, multiple data” (SPMD) programs. Under the SPMD model, the programmer writes a program that mostly appears to be a regular serial program, though the execution model is actually that a number of program instances execute in parallel on the hardware. ispc compiles a C-based SPMD programming language to run on the SIMD units of CPUs; it frequently provides a 3x or more speedup on CPUs with 4-wide SSE units, without any of the difficulty of writing intrinsics code.
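For readers new to the SPMD model, here is a rough Python emulation of the idea. It is not ispc code and says nothing about ispc's performance: `saxpy_instance` and `launch` are hypothetical names, and the instances run one after another here instead of in lock step on SIMD lanes. The sketch only shows the programming view, where one serial-looking body is executed by several program instances that differ in their index.

```python
def saxpy_instance(program_index, program_count, a, x, y, out):
    # The body reads like serial code, but each instance handles only the
    # elements program_index, program_index + program_count, and so on.
    for i in range(program_index, len(x), program_count):
        out[i] = a * x[i] + y[i]


def launch(kernel, program_count, *args):
    # Serial stand-in for a gang of instances; an SPMD compiler would map
    # the instances onto the lanes of a SIMD unit instead of looping.
    for program_index in range(program_count):
        kernel(program_index, program_count, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * len(x)
out = [0.0] * len(x)
launch(saxpy_instance, 4, 2.0, x, y, out)  # 4 instances, like a 4-wide SSE unit
```

After the call, `out` holds `a * x[i] + y[i]` for every element, with the work divided among the four instances by index.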

Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

AMD Introduces New Software Development Tools

AMD has announced a new set of software development tools and solutions that help developers optimize their applications for OpenCL and create systems more quickly based on AMD’s new Fusion Family of Accelerated Processing Units (APUs).

“AMD is working closely with the developer community to make it easier to bring the benefits of heterogeneous computing to consumers, enabling next-generation system features like vivid video, supercomputer-like performance and enhanced battery life,” said Manju Hegde, corporate vice president, AMD Fusion Experience Program. “Our advanced developer tools and solutions enable a new era of parallel programming that’s based on industry standards and focused on delivering innovative user experiences that span a variety of computing form factors.”

Read the Full Story.

Also posted in HPC, HPC Software | Tagged AMD | Leave a comment

CUDA Comes to X86 Thanks to Portland Group

Today the Portland Group announced that it is now shipping the PGI CUDA C and C++ compilers for systems based on the industry-standard general-purpose 64-bit and 32-bit x86 architectures.

“With the addition of PGI CUDA C and C++ for x86, PGI further extends its comprehensive suite of tools for programming GPUs,” said Douglas Miles, director, The Portland Group. “It’s another important element in our ongoing strategy of providing HPC programmers with a full range of options for optimizing compute-intensive applications and leveraging the latest technical innovations from AMD, Intel and NVIDIA.”

When run on x86-based systems, PGI CUDA C/C++ applications perform parallel execution by using the multiple processor cores, and by using Streaming SIMD (Single Instruction Multiple Data) Extensions (SSE), including the new AVX instructions available on the latest generation of x86 compatible CPUs from Intel and AMD.

Read the Full Story.

Also posted in HPC Software | Leave a comment

Using Intel AVX without Writing AVX

Designed for floating-point-intensive applications, the Intel Advanced Vector Extensions (AVX) are a new 256-bit instruction set extension to Intel Streaming SIMD Extensions (Intel SSE). Released to support the 2nd generation Intel Core processor family, AVX improves performance through wider 256-bit vectors, a new extensible instruction format (Vector Extension, or VEX), and richer functionality.

“This paper discusses options that developers can choose from to integrate Intel® AVX into their application without explicitly coding in low level assembly language. The most direct way that a C/C++ developer can access the features of Intel® AVX is to use the C-compatible intrinsic instructions. The intrinsic functions provide access to the Intel® AVX instruction set and to higher-level math functions in the Intel® Short Vector Math Library (SVML). These functions are declared in the immintrin.h and ia32intrin.h header files respectively. There are several other ways that an application programmer can utilize Intel® AVX without explicitly adding Intel® AVX instructions to their source code. This document presents a survey of these methods using the Intel® C++ Composer XE 2011 targeting execution on a Sandy Bridge system.”

Download the Full Article (PDF).

Also posted in HPC, HPC Software, National HPCC Conference | Leave a comment

Called “One of the Best,” EKOPath 4 Compiler Goes Open Source

This week PathScale announced that the EKOPath 4 Compiler Suite is now available as an open source project and free download for Linux, FreeBSD and Solaris. Called “some of the best C/C++/Fortran compilers on the market” by Scalable Informatics CEO Joe Landman, the release includes documentation and the complete development stack, including compiler, debugger, assembler, runtimes and standard libraries. EKOPath is the product of years of ongoing development and is one of the industry’s highest-performance C, C++ and Fortran compiler suites for Intel 64 and AMD processors.

“This is an exciting announcement because it allows PathScale’s world-class compilers to be accessible to a much larger community of compiler users. The enlarged user base will benefit PathScale, and in turn promote more community participation in the continued development of PathScale’s products,” says Fred Chow, PathScale Chief Scientist.

PathScale will continue to provide full paid support for EKOPath and offer additional support options for AMD’s Open64 compiler. Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

Parallel programming: How to choose the best task-size?

Aater Suleman at the Future Chips blog looks at how to choose the best task size at run-time for parallel programming. He analyzes the trade-offs and explains some recent advances in work-queues that minimize their overhead.

“To maximize concurrency, all threads should be programmed to complete their work at the same time. Balancing the load among threads requires the programmers to predict the latency of each task, which is often impossible due to unpredictable OS/hardware effects. Consequently, programmers split the work into small tasks and use work-queues to distribute the work dynamically. Work-queues exist in many programming paradigms like Grand Central Dispatch, Intel TBB, Cilk, OpenMP, etc. While work-queues improve load-balancing, they introduce the overhead of adding and removing tasks to and from the work-queue. Thus, if each individual task is too small, the work-queue overhead becomes prohibitive and if it’s too long then there is risk of imbalance.”
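The trade-off is easy to see in a small Python sketch. It is not from Suleman's post and does not pick a task size at run-time; the helper names `make_tasks`, `sum_squares`, and `run` are made up for illustration. It uses the standard library's `ThreadPoolExecutor` as the work-queue, so the only knob is how many items go into each task.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tasks(n_items, task_size):
    # Split the index range [0, n_items) into contiguous tasks of at most
    # task_size items each.
    return [
        (start, min(start + task_size, n_items))
        for start in range(0, n_items, task_size)
    ]


def sum_squares(bounds):
    # The work done by one task: sum i*i over its slice of the range.
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def run(n_items, task_size, workers=4):
    tasks = make_tasks(n_items, task_size)
    # The executor's internal queue hands each task to whichever worker is free.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_squares, tasks))
    # Return the answer and the number of queue operations it cost.
    return sum(partials), len(tasks)
```

The answer is the same for any task size, but the number of tasks is not: `run(1000, 1)` pushes 1000 tasks through the queue, while `run(1000, 250)` pushes only 4, leaving one task per worker and no slack to rebalance if one of them runs long.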

Read the Full Story.

Also posted in HPC, HPC Software | Leave a comment

insideHPC.com is a production of insideHPC, LLC. © 2006-2013