Craig Lucas from NAG has an article up at Scientific Computing about one of my favorite topics: broadening access. His theme is that people often come to HPC with working codes and workflows, but then face difficult challenges moving that workflow to large-scale computing. How do we (as a community) respond when a user comes in the door with a bundle of Python or MATLAB under their arm, ready to run?
I am sure many would be tempted to raise an eyebrow and say you should have learned a more, shall we say, traditional scientific computing language when you were an undergraduate! That debate rages on and on as this recent contribution shows, so we will avoid that here. However, I did recently teach Fortran to some graduate students who had only used MATLAB thus far and had Fortran 77 thrust upon them by their supervisor. The sheer panic in their faces only went to reinforce the simplicity of prototyping that some people find in scripting languages like MATLAB, that they don’t in lower-level languages.
…I am a big believer in the scientist being allowed to do their science, and spending a large amount of their research time learning a new language and then learning a parallel one too doesn’t seem a great use of time, especially as those Ph.D. years tick away. So, the question is: Is that what they should do?
Unfortunately, there isn’t a pat answer. As Craig points out, there are options, but they aren’t great. Scientists can rely on computational specialists to rewrite their apps in Fortran/C with MPI, which gets them performance, but at the expense of the scientists’ ability to maintain their own codes. What about MATLAB?
The MathWorks have their own products, namely the Parallel Computing Toolbox and, if you want to go off node, you’ll need the Distributed Computing Server. This offers some parallelism of both data and tasks, but it is not yet widely used. Maybe one of the free MATLAB clones offers an interesting alternative.
Popular are Scilab and Octave, and, together with MATLAB itself, they have a wide range of parallel versions, many third party, too many to do justice to here. Perhaps that is the point, there are too many. Some using IO for communication, some toolboxes offering MPI or a subset of it, others offering a way to task farm over many instances of MATLAB or a clone. Many are doing these things very well. But, with the sheer amount and the varying funding the associated projects have, or the support they can offer, I wouldn’t want to hang my hat on anything just yet.
Today, Lucas’ own choice is to bite the bullet and go for an MPI rewrite. But what about the future? He sees hope for Python.
So, what about Python? Perhaps this offers a more obvious transition into HPC… [but]…Let’s just say that no one is going to expect the performance of a scripting language to be that of a compiled one. The MathWorks recently talked about this themselves, suggesting a translation to C. I know this is HPC heresy, we are supposed to squeeze out every drop of performance, but maybe we don’t. We have to factor in the time of development, and back to the researcher wanting to research, what if it took weeks rather than months to develop their code? Maybe it is OK to sacrifice a little, or even quite a lot, of performance, if they can solve that bigger problem. And, let’s not pretend that every HPC code out there runs like the HPL benchmark!
That last point won’t sit well with those on the bleeding edge, but remember that we aren’t talking about them here anyway. When a user is moving up from a workstation, even less-than-optimal performance on the HPC system is more than they were getting before, and the tradeoff for that epsilon loss of performance is that bigger problems are being solved — and users are more productive more quickly. Seems worth it to me. Anyway, more in Craig’s article.