Douglas Eadline posted a bit of an introspective piece via his HPC column at Linux Magazine. Put simply, he lays out what may become a growing chasm between effective multiprocessor programming paradigms and those designed for large-scale [greater than 32 core] multicore systems. If you follow the good Doctor's articles via ClusterMonkey and Linux Mag, you'll know that he's written quite a few overviews of basic parallel programming methods. He's written and/or collaborated on numerous parallel programming projects over the years, so he can certainly walk the walk. Aside from his experience, I can personally attest that he's a sharp guy. On the subject of multicore programming, I wholeheartedly agree with him.
Borrowing from Doug's article: "writing good software is hard. Period. Writing good parallel software is harder still, but not impossible. Understanding the basics is essential in either case." I'm a classically educated software engineer turned born-again mechanical engineer, and what I see happening in the software corner of our beloved HPC industry is somewhat frightening. We continue to spend an excessive amount of time developing and scaling multiprocessor programming methodologies [such as MPI] in order to push the upper crust of computational workloads to higher realms of existence. Interconnect technologies will continue to grow in complexity and performance as systems evolve, and that certainly warrants new ways of thinking about message passing and super-scale software development. But what about the little guy?
Case in point: I attribute my love of HPC to a wild-haired PhD mechanical engineer I call 'Dad.' I remember the gleam in his eye when I swapped my Scooby Doo cartoons for his RISC 6000 user manuals. He continues to operate that system to the best of his abilities; one can only wonder how he runs three-dimensional shock physics codes on a workstation no more powerful than an iPhone. Good news! The company Dogberts have announced that he will finally receive an upgrade next year, at which point my poor impressionable father will be forced to upgrade not only his hardware but his entire software infrastructure. And then I'll receive a phone call asking what the latest and greatest "engineering" software paradigms are. Are there any?
This is where the vendor audience begins to throw their product pitches and tomatoes. One can certainly argue that useful multicore programming paradigms live in Matlab, Cilk++, OpenMP, pthreads, and a host of other up-and-comers. But are any of these solutions more effective than their predecessors? Are any of them more effective than MPI? We are in a multicore world, and we shall soon embark on a massively multicore era for which there is no effective solution for the large mass of users who have no use for scale beyond their desk. This I call The Eadline Split.
Please feel free to comment on this subject. Unlike many of our other articles, I’ve included quite a bit of my own personal opinion and experience. Before you comment, I highly suggest you read Dr. Eadline’s article at Linux Mag here.
Ummm….sounds like a bit of a rant…
1) If the prerequisite for solving the multicore programming challenge is learning MPI – wow, the software field is screwed. Good luck getting the 99.9% of engineers who don't know MPI to learn it. And by the way – is that low-level protocol, arguably a necessary evil for distributed systems, really a must for shared-memory boxes, which outnumber clusters 100:1? For every million HPC systems out there, there are hundreds of millions of desktop multicore boxes being shipped…
2) Not sure how helpful it is to lump native threads, MATLAB, OpenMP, Cilk++ et al together. This is apples and oranges, hammers and wrenches. If I’m a domain expert comfortable with MATLAB, I am probably not going to port my code to native threads; if I’m a software vendor who cares about performance, I’m probably not writing my app in MATLAB; etc.
3) I must admit, I am slightly perplexed at the inclusion of Cilk++, a product we first shipped a couple of months ago, in the "are these really any better" question alongside approaches that are one or two decades old. Did hundreds of schools download the MPI toolkit within the first month of its being available? They did for Cilk++. Are engineers able to write their first multicore program within an hour of buying a book on Pthreads? They can with Cilk++.
Sorry about the rant 🙂
ilya
You’re lumping apples and oranges into the same bin. Is there a difference between programming in assembly language and a high-level language such as C++, Java, or Python? That’s the difference between most of the concurrency platforms you mention and Cilk++. Even OpenMP, which is ostensibly high level, doesn’t offer composable parallel performance the way that Cilk++ does. Have you actually used any of the concurrency platforms you mention?
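To make the analogy concrete, below is a minimal sketch of the canonical Cilk example, written against the Cilk++ toolchain as I recall it (the <cilk.h> header, the cilk_main entry point, and the cilk_spawn/cilk_sync keywords); treat the details as illustrative rather than exact.

    // fib.cilk -- built with the Cilk++ compiler, e.g. "cilk++ fib.cilk -o fib"
    #include <cilk.h>      // Cilk++ keywords: cilk_spawn, cilk_sync
    #include <cstdio>

    // The classic example: the first recursive call is spawned, so it may run
    // in parallel with the second; cilk_sync waits for all spawned children.
    static long fib(int n)
    {
        if (n < 2) return n;
        long x = cilk_spawn fib(n - 1);
        long y = fib(n - 2);
        cilk_sync;
        return x + y;
    }

    // Cilk++ programs use cilk_main rather than main as the entry point.
    int cilk_main(int argc, char* argv[])
    {
        std::printf("fib(30) = %ld\n", fib(30));
        return 0;
    }

Expressing the same computation with raw pthreads means hand-rolling thread creation, argument packaging, joins, and load balancing; that is the assembly-language end of the spectrum.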
Charles/Ilya, I’ve indeed used all of the concurrency platforms I mention in the article [and quite a few not mentioned]. Speaking from experience, Cilk is definitely a step in the right direction. I can’t, however, say that it is ‘the’ answer.
I deliberately mentioned it alongside the other constructs precisely because of its age: Cilk [as marketed/supported by Cilk Arts] is still relatively new compared to POSIX threads, MPI, UPC, and other constructs. I'm interested to see more general application use and broader adoption of Cilk.
John,
I don’t have a comment, but rather a question. What do you think of what Apple is doing with Grand Central Dispatch in Snow Leopard? Thanks…
Kevin
Kevin, Grand Central Dispatch definitely looks like an interesting methodology. It has the makings of being efficient from the development side, since the thread parallelism is largely hidden from the user: you hand tasks to queues and the runtime decides how many threads to spin up and when to run them. However, I haven't read/seen enough of its actual implementation to comment on whether I think it would be a good fit for HPC/scientific apps.
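From the documentation I've seen so far, the basic pattern looks roughly like the sketch below. I'm using dispatch_apply_f, the plain function-pointer flavor of GCD's parallel loop, so it builds as ordinary C/C++ against libdispatch; I haven't compiled or timed this on Snow Leopard, so consider it illustrative only.

    // gcd_sketch.cpp -- needs Apple's libdispatch (Snow Leopard or later)
    #include <dispatch/dispatch.h>
    #include <cstdio>

    // Called once per loop index. GCD decides how many worker threads to use
    // and which cores they run on; that scheduling is what is hidden from us.
    static void scale_element(void* ctx, size_t i)
    {
        double* data = static_cast<double*>(ctx);
        data[i] *= 2.0;                 // stand-in for real per-element work
    }

    int main()
    {
        static double data[1024];
        for (size_t i = 0; i < 1024; ++i) data[i] = 1.0;

        // Submit 1024 independent tasks to the default concurrent queue and
        // block until all of them have completed.
        dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_apply_f(1024, q, data, scale_element);

        std::printf("data[0] = %f\n", data[0]);
        return 0;
    }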
One of the specific features I'm curious about is individual task affinity. Given a NUMA SMP [e.g., Intel Nehalem-EX], can one natively control where threads land on the system in relation to allocated memory blocks, so that the memory units within each socket aren't stalled chasing remote blocks?
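For comparison, with raw pthreads and libnuma on Linux one can do that pinning by hand. The rough sketch below uses pthread_setaffinity_np and numa_alloc_onnode, which are standard Linux/libnuma calls, not anything GCD exposes as far as I know.

    // numa_pin.cpp -- Linux only; build with: g++ numa_pin.cpp -lpthread -lnuma
    #include <pthread.h>
    #include <sched.h>     // cpu_set_t, CPU_ZERO, CPU_SET
    #include <numa.h>      // libnuma: numa_available, numa_alloc_onnode, numa_free
    #include <cstdio>

    // Pin the calling thread to one core so its memory traffic stays on that
    // socket's memory controller.
    static void pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void* worker(void*)
    {
        pin_to_cpu(0);   // core 0, assumed here to sit on NUMA node 0

        // Allocate the working set on the same node the thread is pinned to.
        size_t bytes = 1024 * sizeof(double);
        double* buf = static_cast<double*>(numa_alloc_onnode(bytes, 0));
        for (int i = 0; i < 1024; ++i) buf[i] = i;   // touch node-local memory
        numa_free(buf, bytes);
        return NULL;
    }

    int main()
    {
        if (numa_available() < 0) {
            std::printf("no NUMA support on this machine\n");
            return 1;
        }
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        std::printf("done\n");
        return 0;
    }

Whether GCD offers an equivalent knob for its worker threads is exactly what I'd like to find out.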
What Eadline says is correct and does make sense. I believe, though, that the direction for HPC will come not from the technical applications available now, but from the commercial software generated by Joe Bloggs for applications we have not even conceived of yet. We are at the start of a new era of architectures and programming paradigms. Number crunching is a nice showpiece for performance, but it will not satisfy many business models. Everyone can see that the second computer revolution is starting, though none of us knows where it is heading. Intel is now at 4 cores per chip and AMD at 6, as of this writing.
Great information about multicore programming. I have also been looking for this type of programming blog, so it is very useful to me. Thanks for the post.