LAPACK on CUDA beta available for free, takes refreshing approach to HPC software

I’ve been watching CULAtools for a while now, and I really think this is an interesting effort. The team is working on developing a CUDA-based implementation of the popular LAPACK mathematical library, and the beta is available now for free download from their website.

The technology proposition is fairly straightforward: take the LAPACK interface, implement the functions under that interface to take advantage of NVIDIA’s GPUs, and hide as much of the nastiness of CUDA programming as possible from the developer using the library, while still delivering the benefit of GPU speedup. Allocations, workspace creation, and memory transfers are all taken care of under the hood. From their website:

The actual speed-up depends heavily on the algorithm, the size of your data set, and what you are benchmarking against. When benchmarking against a standalone LAPACK implementation, CULA routines typically show a 40x to 200x speed-up. In most cases, there is a 3x to 10x speed-up when comparing CULA to Intel’s Math Kernel Library implementation of LAPACK running on their latest Core i7 processor. We have assembled several performance charts comparing CULA to other implementations of LAPACK.

There are more performance results here. But their approach to the business is at least as interesting as the technology.

EM Photonics is developing CULAtools in partnership with NVIDIA. There are (going to be) three versions of the product: Basic, Premium, and Commercial. The Basic product, the one available for free download today, will always be free, and will support single precision complex versions of some of the really common calls in LAPACK. Need double precision or routines not included in Basic? Then you need the Premium tool. Need to incorporate the libraries into a product for redistribution? Then Commercial is for you.

CULA Basic contains some of the most popular linear algebra routines: LU decomposition and system solve; QR factorization; singular value decomposition (SVD); and least squares analysis in both constrained and general variants. CULA Premium and CULA Commercial both expose a greater number of functions. When new functions are implemented in CULA, they will be made available to CULA Premium and CULA Commercial users.
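
For readers who have not used LAPACK, here is a minimal sketch of what the first item on that list, "LU decomposition and system solve," computes. It is plain Python for illustration only: it is not CULA's API and says nothing about how CULA implements the routine on a GPU. The function name `gesv` is borrowed from LAPACK's xGESV family, and the return shape (factors, pivots, solution, info code) is an assumption chosen to resemble that routine; unlike LAPACK, the pivot indices here are 0-based and the inputs are copied rather than overwritten.

```python
def gesv(a, b):
    """Solve A x = b by LU factorization with partial pivoting.

    Illustrative stand-in for a LAPACK-style xGESV call (hypothetical helper,
    not a real library function). Returns (lu, ipiv, x, info), where info is
    0 on success or k > 0 if the k-th pivot is exactly zero.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]  # working copy of A
    x = [float(v) for v in b]                    # working copy of b
    ipiv = []                                    # 0-based pivot rows

    for k in range(n):
        # Partial pivoting: choose the largest remaining entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        ipiv.append(p)
        if lu[p][k] == 0.0:
            return lu, ipiv, None, k + 1         # singular matrix
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            x[k], x[p] = x[p], x[k]
        # Eliminate below the pivot; multipliers are stored where zeros would go.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
            x[i] -= lu[i][k] * x[k]              # same step applied to b

    # Back substitution with the upper-triangular factor U.
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / lu[i][i]
    return lu, ipiv, x, 0


# Example: 2x + y = 3 and x + 3y = 5.
lu, ipiv, x, info = gesv([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print(info, x)
```

The example system has the solution x = 0.8, y = 1.4, up to floating-point rounding. The caller passes a matrix and a right-hand side and gets an answer back. A library in the CULA mold aims to keep a call that simple while the device allocations and transfers happen behind it.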

This isn’t a novel business model; it’s used all over the Internet for things like online storage for pictures of Junior (free up to 2 GB, then tiered pricing for increasing amounts), and software (I have the free version of a family tree package that will support up to 50 relatives; more than that and I have to pony up). But it is not a common model in HPC, where the availability of tools is at least part of the suite of problems that have slowed the penetration of HPC down to the individual “tinkerer” level (i.e., personal supercomputing). Everything about the way they have positioned this product, including the website itself, seems to be informed by the modern sensibilities and market structure of the non-HPC world.

I’m excited about what they’ve done, and I hope that this approach to new HPC businesses is a sign of things to come.

