At SC09 this week The MathWorks made a couple of announcements related to MATLAB and Simulink, the company’s flagship computation products. Many of you will be familiar with MATLAB as a very popular high-level language and interactive environment that lets you perform computationally intensive tasks without managing all the details that lower-level languages like FORTRAN and C require. With 2,000 employees in offices all over the world, The MathWorks reports over 1,000,000 users in more than 175 countries, in industries ranging from aerospace and defense to education and electronics.
Over the years the company has expanded its offerings to support serious computation, including built-in support for multicore parallelism and mechanisms for large-scale distributed computation via libraries like MPI. The announcements this week build on those developments.
Enhancements in the Parallel Computing Toolbox
The MathWorks has announced a new version of the Parallel Computing Toolbox, the collection of routines that allows MATLAB programs to run in distributed environments ranging from multiprocessor computers to clusters and grids, with just a few changes to the serial programs. This new version provides an improved distributed array construct that lets MATLAB users directly access large datasets distributed over many cores, sockets, or nodes in a large compute array.
When I talked with Silvina Grad-Freilich, manager of parallel computing and application deployment marketing at The MathWorks, about this new release of the PCT, she used an example application from the Max Planck Institute’s cancer research program to illustrate the advantages of the changes in the software. Researchers there are working on discovering new cancer therapies, and are generating high quality 3D images of proteins that require millions of projections. With the new PCT they are seeing improvements of 30x in a pool of 64 MATLAB workers — not exactly ideal scaling, but the point is that the performance improvement is significant enough to dramatically improve their throughput with no time or resources lost to developing specific expertise in parallel programming. Not the right solution for everyone, but for many in MATLAB’s target audience, I bet this is a great fit.
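To put the "not exactly ideal scaling" remark in numbers, here is a minimal sketch of the arithmetic. The speedup (30x) and worker count (64) come from the example above; the function names are hypothetical, and using Amdahl's law to estimate a serial fraction is my own assumption for illustration, not something The MathWorks or the Max Planck researchers reported.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / N).

    Assumption: the workload follows Amdahl's model, which ignores
    communication overhead and load imbalance.
    """
    return (workers / speedup - 1.0) / (workers - 1.0)


if __name__ == "__main__":
    speedup, workers = 30.0, 64  # figures quoted in the article
    print(f"parallel efficiency: {parallel_efficiency(speedup, workers):.1%}")
    print(f"implied serial fraction: {amdahl_serial_fraction(speedup, workers):.2%}")
```

Under those assumptions the pool is running at a bit under half of ideal efficiency, and a serial portion of only a couple of percent would be enough to explain the gap, which is why a 30x gain with no parallel programming effort is still a good trade for this audience.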
This release also features better parallel performance for algorithms in the Statistics and Communications Toolboxes that rely on the Parallel Computing Toolbox. This adds to existing functionality in the Bioinformatics, Optimization and Genetic Algorithms Toolboxes. Users of these Toolboxes don’t need to make any changes to their codes to take advantage of multiple processors. You simply point MATLAB or Simulink at a processor pool (a set of resources defined in the application that could include multiple sockets on your machine, machines on your local network, or a remote cluster), and the Toolboxes automatically distribute your computation across the whole set of resources.
MATLAB plus the TeraGrid
Cornell University also announced this week that the Cornell Center for Advanced Computing (CAC), in partnership with Purdue University, has been funded by the NSF to bring MATLAB to the TeraGrid as an experimental computing resource. A statement from Robert Buhrman, Cornell University vice provost of research, makes it clear that this is again about expanding the applicability of HPC resources to those without deep skills in this arena: “MATLAB on the TeraGrid will help enable a broader class of researchers who are well-versed in MATLAB to reduce the time to solution in a scalable manner without having to become parallel programming experts.” TeraGrid is following on the heels of the Enabling Grids for E-sciencE (EGEE) team in Europe, where MATLAB has been supported since October of 2008.
The Cornell announcement will make MATLAB available to remote desktop and Science Gateway users, and includes support from industry partners Dell and Microsoft along with The MathWorks. The software will be hosted on a 512-core Dell PowerEdge HPC cluster running Windows HPC Server 2008 at Cornell’s Ithaca, NY campus. Two use models are initially envisioned: the standard single-user interactive session you would have running MATLAB on your own system, and MATLAB as an engine driving Science Gateways such as nanoHUB.org.
While there is a production aspect to this deployment, the NSF funding suggests there is also a strong research aspect: beginning to understand the challenges and opportunities in deploying software as a service across a large user base that is not necessarily familiar with HPC technologies.