Reader Ilya Mirman, now at Cilk Arts, wrote in to tell us a little about some of the work his company is doing and the recognition they’ve recently received for their efforts. Cilk Arts’ goal is “to bring parallelism to the masses,” something the computing industry could frankly use about now. Here’s what Ilya had to say.
An Invitation: Parallelize Serial C/C++ HPC Apps Without Breaking a Sweat
When I first heard that our founders were being awarded a prize for a 1998 paper, I thought there was either a typo, or the email equivalent of those postcards that arrive decades after they were sent. Turns out, there’s no typo — the papers are judged by their influence in the subsequent decade. The award, presented at the 2008 Programming Language Design and Implementation conference, is for the paper The Implementation of the Cilk-5 Multithreading Language.
At Cilk Arts we are really honored by this recognition. The award committee had the following to say about our paper:
‘The 1998 PLDI paper “Implementation of the Cilk-5 Multithreaded Language” by Matteo Frigo, Charles E. Leiserson, and Keith H. Randall introduced an efficient form of thread-local deques to control scheduling of multithreaded programs. This innovation not only opened the way to faster and simpler runtimes for fine-grained parallelism, but also provided a basis for simpler parallel recursive programming techniques that elegantly extend those of sequential programming. The stack-like side of a deque acts just like a standard procedure stack, while the queue side enables breadth-first work-stealing by other threads. The work-stealing techniques introduced in this paper are beginning to influence common practice, such as the Intel Threading Building Blocks project, an upcoming standardized fork-join framework for Java, and a variety of projects at Microsoft.’
The paper is based on a couple ideas that together yield a powerful point of view on parallel programming. More on this can be found on our multicore programming blog, but in a nutshell:
- Creating a new thread does not have to be expensive. From the programmer’s point of view, cheap expression of parallelism eliminates the whole issue of thread granularity that you have in, e.g., Pthreads programs, where you must make sure that the thread that you create performs enough work to amortize the overhead of the thread creation.
- In the execution of a parallel program, one can distinguish between cycles that would be spent on a single-core execution (called the “work term”), and cycles that are spent solely because of parallelism, e.g., spent migrating work from one core to another (called the “critical-path term”). When implementing a parallel programming language, the implementer usually faces tradeoffs between reducing overheads in the work term, or reducing overheads in the critical-path term. In Cilk, we postulate that the work term is more important, and design the system accordingly.
The connection to the HPC world is very tight here. The Cilk Project came out of the Supercomputing Technologies group at MIT. Serial C++ remains the most popular vehicle for prototyping – but deploying those algorithms on parallel computing hardware using MPI can be difficult, expensive, time-consuming, and error-prone. And, we think that for a sizeable fraction of the market, Cilk++ might be attractive, given that folks don’t need to restructure/rewrite their code. (In fact, Cilk won First Prize in the 2006 HPC Challenge, cited as “Best Combination of Elegance and Performance.”)
So here’s an invitation for InsideHPC readers: If you have an app meeting the following criteria, raise your hand for our Early Visibility Program!
- Application performance is compute bound (rather than, say, I/O bound)
- Performance can be improved by accelerating serial (i.e., single-threaded) portions of the application
- Application is targeted to run on a shared-memory system, or relatively “fat” cluster nodes