Interview: MATLAB Support for GPUs a Game-Changer


When Nvidia kicked off the GPU conference a couple of weeks ago, one of the first things they announced was that MATLAB now supports GPUs. This is a key milestone, as it essentially means that anyone who knows math can now speed up their calculations with parallel processing on GPUs.

I sat down with Loren Dean, Director of Engineering at MathWorks, to talk about MATLAB and what GPU support means for the user base of this popular software.

insideHPC: So what did MathWorks announce at the GPU conference?

Loren Dean: We announced that our users can now take advantage of NVIDIA GPUs from within the MATLAB environment. We’ve added the GPU support to our Parallel Computing Toolbox, and that lets you take advantage of GPUs without low-level C or Fortran programming.

We’ve also added GPU support to our Distributed Computing Server. That product is for scaling up. So if you want to move off of the desktop and onto a cluster, a grid, or the cloud, Distributed Computing Server does that. It requires Parallel Computing Toolbox to scale to the server, but it is just a matter of saying, instead of using my local resources, I want to run on the remote resources.

insideHPC: How will this GPU support help your user base?

Loren Dean: What we’ve really tried to pay attention to is enabling productivity. Our user base cares a lot about productivity and being able to interact with their data and with their environment as well.

If you look at what happens traditionally in the HPC community, it’s all about batch. So you submit something and a couple of days later you get your results back. I was looking recently at a project we have going with Cornell right now. They have an experimental resource on the TeraGrid for MATLAB. It’s fully loaded with 512 cores and I think it’s running with a 2-3 day queue time to get your stuff in.

So what we’re trying to do with our products is bring the interactive world to this space. When we look at what typical users want, essentially it’s about interactive use of a cluster. They start out on the desktop and then move to a cluster and eventually they may move to batch. But interactivity is key. What we’ve done with our GPU offering is we’ve extended the capabilities so they can interact with a GPU seamlessly, much the same way they do with our Parallel Computing Toolbox. With very few code changes, the user simply has to define which data is going to run on the GPU. So you create an array in MATLAB and say, OK, I’m going to run my FFT or whatever it is on the GPU, and MATLAB just does it.
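
To illustrate the idea Dean describes here, where the calling code stays the same and the type of the data decides where the work runs, here is a minimal sketch. It is a Python toy, not MATLAB or MathWorks code: `HostArray`, `DeviceArray`, `fft`, and `gather` are hypothetical names chosen for the example, no GPU is involved, and the transform is a naive DFT standing in for a real FFT.

```python
import cmath


class HostArray:
    """Data held in ordinary (CPU) memory."""

    location = "host"

    def __init__(self, values):
        self.values = list(values)


class DeviceArray(HostArray):
    """Hypothetical stand-in for data the user has marked for the GPU.

    No real GPU is used; the class only records where the work would run.
    """

    location = "device"

    def gather(self):
        """Copy the data back into an ordinary host array."""
        return HostArray(self.values)


def _naive_dft(values):
    """O(n^2) discrete Fourier transform, used here in place of a real FFT."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def fft(array):
    """One entry point for both array types.

    The result has the same type as the input, so work on device data
    stays on the device until the caller asks for it with gather().
    A real system would launch GPU kernels for DeviceArray inputs.
    """
    return type(array)(_naive_dft(array.values))


signal = [1.0, 0.0, 0.0, 0.0]

on_host = fft(HostArray(signal))      # runs on the host
on_device = fft(DeviceArray(signal))  # same call, only the data's type changed
back_on_host = on_device.gather()     # explicit copy back to host memory

print(on_host.location, on_device.location, back_on_host.location)
```

The only line that differs between the host and device versions is the one that wraps the data, which is the sense in which the user "defines which data is going to run on the GPU".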

insideHPC: So you don’t have to manage memory and get explicit about that kind of stuff?

Loren Dean: They can, but they don’t have to. A lot of our typical users, they hear about GPUs, they hear about speedups, and they want to try it out. So we’ve made it really easy for them to get access to it so they can hopefully enjoy the benefit of what GPUs offer.

insideHPC: Is this GPU support a new product or an extension of what you already offered?

Loren Dean: These are just additional capabilities for the two products we already have. So if you’re already a licensed user of Parallel Computing Toolbox or the Distributed Computing Server, it’s an additional capability that’s already in there and you’re ready to scale.

So if you want to go from doing something on the desktop talking to one GPU, I can show you how to do it with four GPUs or on the cluster and there is no code change. That’s one of the things getting back to what we have designed our products for; we care a lot about the engineer who doesn’t want to get into the details here. They’re the person who says, I want to get my work done and I know how to program MATLAB. So we spend a lot of time separating the algorithms from the infrastructure.

The model I typically use to describe this is that of a printer. If you think about printers 20 years ago, you had to know how to program Ghostscript and if you had a bug in the printer driver, you could actually go into the file and change it. You got down to that level if you had to. But now you just find a printer on the network, the device driver is installed, and if you send it to a color printer, it prints in color. It just does it.

That’s really the model we’re trying to follow. We have this idea that you have pre-defined configurations to submit work to. So with the Parallel Computing Toolbox, you have the local configuration that gives you local workers. And then when you want to scale to something like Windows HPC Server, you can set up a configuration for that. So, what we’ve done from a user perspective is that, once the IT administrators set up a config, the user doesn’t care. They just say, I want the work to go to that particular resource and the code just works.

insideHPC: Is this GPU parallel programming capability shipping or downloadable today?

Loren Dean: It’s available to anyone with a license to our standard software. The release came out around September 2, so it’s been out for a few weeks. We just haven’t talked about it until this show.

insideHPC: There are a lot of GPUs out there. What do you have on your site that would help somebody see how easy this is and get themselves started?

Loren Dean: The primary place we’d like to send people to is our site. It’s got videos, benchmarking examples, and documentation on using GPUs.

insideHPC: Is this the first time MATLAB has supported an architecture other than x86?

Loren Dean: It’s the first time in recent memory. Actually there were really old versions of MATLAB that ran on the Cray vector architecture.

insideHPC: So GPU accelerators have been around for a long time now. Was the market there not compelling enough to do the port until now?

Loren Dean: So in the GTC opening keynote, Jen-Hsun Huang asked how many people in the room used MATLAB, and all of the hands went up. So we’ve known about the interest.

Three primary things have held us back from providing GPU support. These are all really important from the MATLAB user perspective. First is double precision. MATLAB’s base data type is double, so by default data is double precision. So having a single-precision GPU was not going to appeal to our user base because it would mean changing their code and getting different answers. So double-precision was critical to us.
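
The "different answers" point is easy to reproduce. The sketch below is an illustration in Python, not MATLAB or GPU code: it adds 0.1 repeatedly, once in double precision and once with every intermediate result rounded to single precision. `to_single` and `accumulate` are helper names made up for this example; the rounding itself uses the standard `struct` module.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times.

    When `single` is True the total is rounded to single precision after
    every addition, which mimics doing the whole sum in 32-bit floats.
    """
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_single(total)
    return total


N = 100_000
EXACT = 10_000.0  # 0.1 * N in exact arithmetic

double_sum = accumulate(0.1, N, single=False)
single_sum = accumulate(to_single(0.1), N, single=True)

double_err = abs(double_sum - EXACT)
single_err = abs(single_sum - EXACT)

print(f"double precision: sum = {double_sum!r}, error = {double_err:.3e}")
print(f"single precision: sum = {single_sum!r}, error = {single_err:.3e}")
```

Both totals are slightly off because 0.1 has no exact binary representation, but the single-precision total drifts much further from the exact value, which is the kind of change a MATLAB user accustomed to doubles would notice.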

The second thing is IEEE compliance. Our reputation is based very strongly on getting correct answers. Historically, for GP-GPUs doing graphics rendering, if you were off by a little bit, ok, nobody is going to notice that. But in technical computing, it’s really important and the earlier versions of the libraries were not IEEE compliant.

The final thing is cross-platform support: Windows and Linux. We need to support all those platforms for the MATLAB user base.

So those three things have all come together within the past couple of months for GPUs. And while we’ve been really interested and we’ve known that there’s been demand, we didn’t want to put something out there that we couldn’t stand behind. We want to be confident that we’re getting the right answer and providing something to our broad customer base which is using double precision.

insideHPC: Does your product work with anything other than Nvidia GPUs?

Loren Dean: It does not, but there is a good reason for that; there’s no library support anywhere else. You need libraries, you need FFT, you need BLAS, etc. Nvidia has them and they’re part of the CUDA ecosystem. It’s not available in OpenCL. It’s just nonexistent. We’ve architected to be able to support OpenCL if and when it comes, but today it’s not there.

insideHPC: From a business standpoint, do you think this will help you sell more software licenses?

Loren Dean: Yes, we think it will. During my GTC talk, I asked the room how many of them were using Parallel Computing Toolbox. About half the hands went up. So yes, we will see growth. A lot of people here are doing parallel computing already, but this makes it a lot more accessible.


  1. If you’re curious, you might consider giving Jacket a try for running MATLAB on the GPU.

  2. Or also GPUMat, which is free (but not open source). All three of these (Parallel Computing Toolbox, GPUMat and Jacket) use the same approach: they declare a new datatype (or more precisely a new class) and provide a long list of operations that execute on the GPU (+, -, *, fft, etc.).

    It would be great to have some benchmarks. So far I can tell that GPUMat gave us only moderate speed-ups (4 to 10 times); we still prefer to write our own mexFunctions to get maximum performance.

  3. Jacket is the only product that has all of the following (PCT and GPUmat do not have these):

    Reduction Function Support (e.g. SUM, FIND, MIN, MAX, ANY, ALL, etc.)
    Convolution Function Support (e.g. CONV, CONV2, CONVN, FILTER2, etc.)
    Linear Algebra Support (e.g. INV, SVD, EIG, MLDIVIDE, QR, LU, etc.)

    Simply pointing out that you get what you pay for with GPUmat or you get what you will wait a long time for with PCT…