I’ve given a couple of presentations lately in which I needed to communicate the challenges of multicore development and the notion of parallelism everywhere, along with the software, reliability, and OS challenges of very large scale parallel computing. Since many of you are probably giving presentations like this too, I thought I’d share some recent experience with a new phrase I’m trying.
After trying a few variants, my audiences seem to be responding to the idea of computing at arbitrary scale. (As Shakespeare, or rather the Book of Ecclesiastes, says, there is very little new under the sun; if you know where this term originated, please leave a comment.)
Note that this is arbitrary with respect to what the computational support infrastructure and software development framework are assumed to know a priori about the runtime environment. Certain programming paradigms and most operating system implementations, for example, implicitly assume that they will be used on relatively low processor counts. As another example, MPI programs frequently conflate algorithm and implementation, and the implicit assumptions about scale embedded in them (choice of algorithm, implementation of collective operations, etc.) limit the performance of applications run well outside the original design space.
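To make that last point concrete, here’s a toy sketch of my own (not taken from any particular code) of the kind of thing I mean. The “algorithm” is just “get the parameters to every rank,” but the implementation hard-wires a linear send loop from rank 0 — an implicit bet that the process count stays small. The buffer size and variable names are invented for illustration.

```c
/* Hypothetical sketch: a hand-rolled "broadcast" that bakes a
 * small-scale assumption into the implementation.  Rank 0 sends to
 * every other rank one at a time -- O(P) steps -- which is tolerable
 * at a few dozen ranks but crippling at large P, where a tree-based
 * collective would take O(log P) steps. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double params[16] = {0};   /* illustrative payload */

    /* The algorithm (share parameters with everyone) is entangled
     * with one particular implementation (a linear send loop). */
    if (rank == 0) {
        for (int dest = 1; dest < size; ++dest)
            MPI_Send(params, 16, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(params, 16, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    /* A more scale-agnostic version would let the library choose the
     * algorithm for the current process count:
     *   MPI_Bcast(params, 16, MPI_DOUBLE, 0, MPI_COMM_WORLD);       */

    MPI_Finalize();
    return 0;
}
```

The point isn’t that MPI can’t do this well — MPI_Bcast exists precisely so the library can pick an algorithm appropriate to the scale — it’s that nothing stops us from writing the first version, and plenty of codes do.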
So, we need operating systems and software development frameworks that support computing at arbitrary scale. For arbitrarily large process-count executions, the operating system and the application need to be resilient in the face of the hardware failures that are nearly guaranteed to happen, and the algorithms chosen for the work need to make appropriate use of the resources (FLOPS, memory, bandwidth, etc.) available. For arbitrarily small process-count executions, the same software needs to adapt to the resources available and recover the overhead (potentially) spent on maintaining resilience.
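Here’s one way to picture what “the same software at both ends of the scale” might look like, again as a hypothetical sketch rather than anything prescriptive. The MTBF figure, checkpoint cost, and cutoff are invented numbers; the idea is simply that the resilience overhead is decided at run time from the observed process count, not frozen in at design time.

```c
/* Hypothetical sketch: one binary that tunes its resilience overhead to
 * the scale it actually runs at.  All constants below are illustrative
 * assumptions, not measurements. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>

#define NODE_MTBF_HOURS       5000.0   /* assumed per-node mean time between failures */
#define CHECKPOINT_COST_HOURS 0.05     /* assumed cost of writing one checkpoint */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* System MTBF shrinks roughly linearly with the number of processes. */
    double system_mtbf = NODE_MTBF_HOURS / size;

    /* Young's approximation for a reasonable checkpoint interval. */
    double interval = sqrt(2.0 * CHECKPOINT_COST_HOURS * system_mtbf);

    /* At small scale, failures within a run are unlikely, so skip
     * checkpointing entirely and reclaim that overhead (arbitrary cutoff). */
    int checkpointing_enabled = (system_mtbf < 24.0);

    if (rank == 0) {
        if (checkpointing_enabled)
            printf("%d ranks: checkpoint every %.2f hours\n", size, interval);
        else
            printf("%d ranks: skipping checkpoints to reclaim the overhead\n", size);
    }

    MPI_Finalize();
    return 0;
}
```

At a handful of ranks this program decides resilience isn’t worth paying for; at hundreds of thousands it checkpoints aggressively — the same code, adapting to arbitrary scale rather than assuming one.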