Scripting to scaling: enabling broader access to HPC

Craig Lucas from NAG has an article up at Scientific Computing about one of my favorite topics: broadening access. His theme is that people often come to HPC with working codes and workflows, but then face difficult challenges moving those workflows to large-scale computing. How do we (as a community) respond when a user comes in the door with a bundle of Python or MATLAB under their arm, ready to run?

I am sure many would be tempted to raise an eyebrow and say you should have learned a more, shall we say, traditional scientific computing language when you were an undergraduate! That debate rages on and on as this recent contribution shows, so we will avoid that here. However, I did recently teach Fortran to some graduate students who had only used MATLAB thus far and had Fortran 77 thrust upon them by their supervisor. The sheer panic in their faces only went to reinforce the simplicity of prototyping that some people find in scripting languages like MATLAB, that they don’t in lower-level languages.

…I am a big believer in the scientist being allowed to do their science, and spending a large amount of their research time learning a new language and then learning a parallel one too doesn’t seem a great use of time, especially as those Ph.D. years tick away. So, the question is: Is that what they should do?

Unfortunately, there isn’t a pat answer. As Craig points out, there are options, but they aren’t great. Scientists can rely on computational specialists to rewrite their apps in Fortran/C with MPI, which gets you performance, but this is at the expense of removing the ability of the scientists to maintain their own codes. What about MATLAB?

The MathWorks have their own products, namely the Parallel Computing Toolbox and, if you want to go off node, you’ll need the Distributed Computing Server. This offers some parallelism of both data and tasks, but it is not yet widely used. Maybe one of the free MATLAB clones offers an interesting alternative.

Popular are Scilab and Octave, and, together with MATLAB itself, they have a wide range of parallel versions, many third party, too many to do justice to here. Perhaps that is the point, there are too many. Some using IO for communication, some toolboxes offering MPI or a subset of it, others offering a way to task farm over many instances of MATLAB or a clone. Many are doing these things very well. But, with the sheer amount and the varying funding the associated projects have, or the support they can offer, I wouldn’t want to hang my hat on anything just yet.

Today, Lucas’ own choice is to bite the bullet and go for an MPI rewrite. But what about the future? He sees hope for Python.

So, what about Python? Perhaps this offers a more obvious transition into HPC… [but]…Let’s just say that no one is going to expect the performance of a scripting language to be that of a compiled one. The MathWorks recently talked about this themselves, suggesting a translation to C. I know this is HPC heresy, we are supposed to squeeze out every drop of performance, but maybe we don’t. We have to factor in the time of development, and back to the researcher wanting to research, what if it took weeks rather than months to develop their code? Maybe it is OK to sacrifice a little, or even quite a lot, of performance, if they can solve that bigger problem. And, let’s not pretend that every HPC code out there runs like the HPL benchmark!

That last point won’t sit well with those on the bleeding edge, but remember that we aren’t talking about them here anyway. When a user is moving up from a workstation, even less-than-optimal performance on the HPC system is more than they were getting before, and the tradeoff for that epsilon loss of performance is that bigger problems are being solved — and users are more productive more quickly. Seems worth it to me. Anyway, more in Craig’s article.
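To make the "task farm" idea from Craig's article a little more concrete, here is a minimal sketch of what it can look like in Python using only the standard library. It is not taken from the article: the `simulate` function is a hypothetical stand-in for whatever independent run a researcher already has on their workstation, and `run_farm` is a name I made up for illustration. It only covers parallelism within a single node; going off node would need something like MPI or a cluster scheduler on top.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate(n_terms):
    """Hypothetical stand-in for one independent model run.

    Returns a partial sum of the Leibniz series for pi, so the cost
    of a run grows with n_terms.
    """
    total = 0.0
    for k in range(n_terms):
        total += (-1.0) ** k / (2 * k + 1)
    return 4.0 * total


def run_farm(param_list, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Farm independent runs out to a pool of workers.

    Results come back in the same order as param_list. The executor
    class is a parameter so a thread pool can be swapped in for quick
    checks; a process pool is the default because the work is CPU-bound.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(simulate, param_list))


if __name__ == "__main__":
    # Each parameter is one independent task in the farm.
    params = [10_000, 100_000, 1_000_000]
    for n, estimate in zip(params, run_farm(params)):
        print(n, estimate)
```

The scientist's own function stays untouched and the parallel part is a handful of lines. That is the development-time side of the tradeoff: nobody would claim this matches a tuned Fortran/MPI code, but it can be written in an afternoon and maintained by the person doing the science.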

Comments

  1. Silvina Grad-Freilich, MathWorks says

    Great to hear your thoughts on this concept and Craig’s analysis, John. Since MATLAB is directly addressed, I thought this would be a good forum to provide our insight on the topic.

    Very few engineers and scientists would consider themselves HPC experts. Most are just learning what it means to have multiple cores in their computer or access to GPUs that can do things other than graphics processing. These same people also have day jobs that require them to get their work done using the tools they have available to them. This is where MATLAB and other solutions come in.

    To use these tools successfully, engineers and scientists need an easy on-ramp. They don’t want to have to know MPI just to use the hardware available to them. They also don’t want to change their existing code base, and they need to think about things like sharing their code with others who may not have the same hardware. MathWorks has been working to provide multiple levels of abstraction with our parallel computing capabilities so that users can leverage parallelism at the level that is appropriate for them, ranging from no or very few coding changes to low-level MPI programming. For us, a key design principle is to ensure that the ability to parallelize work does not get in the way of existing code. This allows everyday engineers and scientists to ease into parallel programming. Also, a software environment needs to separate the algorithms from the backend hardware so that code is portable and maintainable. MathWorks’ parallel computing solution has done just this by enabling the same code to run in multiple environments without any code changes.

    We’ve seen fantastic response to our parallel computing tools because they lower the barriers to entry; they make parallel and cluster computing accessible to everyday technical professionals who are focused on the business of engineering, science, and other flavors of technical computing.