New version of CLE runs “any standard x86-based Linux application”
Today at the IDC HPC User Forum, Barry Bolding, Cray’s VP of scalable systems, introduced what I reckon to be the most significant strategic announcement the company has made since it moved to consolidate around the high-end business shortly after Pete Ungaro stepped up from sales into the role of CEO and President several years ago.
On the surface, it doesn’t sound like much: today marked the release date for version three of the operating system Cray uses on its high-end XT systems, the Cray Linux Environment (CLE).
But the central purpose of this release is to bring to XTs the ability to run any standard x86-based application, removing a major objection to purchase often cited by the well-heeled customers of both Cray’s high-end XT and midrange XT*m systems. In fact, the move has already influenced at least one major acquisition: when I spoke with Bolding last week ahead of this announcement he mentioned CLE 3.0 and the ability to run ISV applications as a major factor in Cray’s recent $45M sweep of the DoD HPC Modernization Program.
Cray started its move toward ISV compatibility and broader market acceptance with the introduction of the CX1 and the recently added CX1000 products. The release of CLE 3.0 moves to close that circle from the performance end of the business, creating the beginnings of a path that will allow customers to move smoothly from a workstation in their office to multi-hundred thousand core supercomputers, without ever leaving the Cray family.
Computing, à la mode
The secret behind CLE’s newfound love for ISV applications is the introduction of a new group of features in the operating system called Cluster Compatibility Mode, or CCM. CCM allows nodes in an XT line supercomputer to run a fully standard x86 Linux (SUSE SLES 11 in this case) — applications simply install and run. Matlab on your XT6? You can do that.
CCM is contrasted with Extreme Scalability Mode (ESM), which until today was the only way to run an XT. In ESM a single application can span hundreds of thousands of cores, taking advantage of CLE’s lightweight kernel for scalability, and the custom SeaStar interconnect and tuned communications libraries for speed. In CCM, however, applications are limited to 2,048 cores, and only have access to MPI on a TCP/IP stack for communication. Bolding describes CLE 3.0 as a “feature release,” meaning they were focused on getting ISV software onto Cray supercomputers. The next release, planned for next year, will be a “performance release,” with both larger numbers of cores made available to CCM applications and support for OFED (and thus InfiniBand).
Another key feature of CLE 3.0 is that datacenter administrators do not have to partition their machines ahead of time into blocks of nodes dedicated to the two modes: nodes can swap from one mode to the other via a user-settable parameter in the job submission script. According to Bolding, under CLE 3.0 nodes run in ESM by default. When a user submits a job using software that requires CCM, the job dispatch system (called ALPS, in case you are curious) instructs the assigned nodes to set up the standard Linux environment, the job runs, and then those nodes are switched back to ESM before being returned to the compute pool. Bolding says that the time to set up CCM on a pool of nodes is “only a few seconds.”
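For readers who think better in code, here is a minimal Python sketch of the per-job lifecycle Bolding describes: nodes sit in ESM, switch modes for a job that asks for CCM, and go back to ESM before rejoining the pool. This is a toy model, not ALPS or any Cray interface. The names `Mode`, `Node`, `Job`, and `dispatch` are all hypothetical, and the `mode` field on `Job` merely stands in for the job script parameter, whose real name the announcement does not give.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ESM = "Extreme Scalability Mode"
    CCM = "Cluster Compatibility Mode"


# Per-application core limit for CCM quoted for CLE 3.0.
CCM_CORE_LIMIT = 2048


@dataclass
class Node:
    node_id: int
    mode: Mode = Mode.ESM  # nodes run in ESM by default


@dataclass
class Job:
    name: str
    cores: int
    mode: Mode = Mode.ESM  # hypothetical stand-in for the job script parameter


def dispatch(job, nodes):
    """Hypothetical dispatcher: set the mode, run the job, restore ESM.

    Returns a list of log strings describing what happened.
    """
    if job.mode is Mode.CCM and job.cores > CCM_CORE_LIMIT:
        raise ValueError(
            f"{job.name}: CCM jobs are limited to {CCM_CORE_LIMIT} cores"
        )
    log = []
    try:
        for node in nodes:
            node.mode = job.mode
        log.append(f"{len(nodes)} nodes set to {job.mode.name}")
        # A real system would launch the application here.
        log.append(f"ran {job.name} on {job.cores} cores")
    finally:
        # Nodes always go back to ESM before rejoining the compute pool.
        for node in nodes:
            node.mode = Mode.ESM
        log.append("nodes returned to pool in ESM")
    return log


if __name__ == "__main__":
    pool = [Node(i) for i in range(4)]
    for line in dispatch(Job("isv_app", cores=96, mode=Mode.CCM), pool):
        print(line)
```

The `try`/`finally` is the point of the sketch: whatever happens to the job, the nodes end up back in ESM, which is what lets administrators skip static partitioning.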
There is another nice feature of CLE 3.0 that you have to be in HPC center management to care about, which I mention because I used to be in center management: you can finally run a networked license manager for your software without having to do a backbend and perform a ritual sacrifice. About time.
Something for everyone
Although the most significant strategic feature of CLE 3.0 is CCM and its ability to run standard Linux applications, Bolding says there is quite a bit baked into this release for the high end as well. Much of this is there to support Baker, Cray’s next-generation high-end HPC platform, and the company is delaying discussion of most of those features until that platform launches later this year.
We do know that this release scales support to machines with “more than 500k cores,” up from the 250,000 core machines supported in the last version of CLE. Lustre 1.8 is also supported, as are numerous reliability enhancements like warm swap of blades and link resiliency for Baker systems.
CLE 3.0 also includes a nifty performance feature called “core specialization.” Core specialization allows 1 of the 24 cores in a Magny Cours node to be designated as the “OS” core. When activated, all OS tasks are pinned to that one core. Bolding explains that this doesn’t benefit all applications, which is why Cray has taken the step of allowing users to specify whether this feature is turned on or not at runtime (as it did with CCM). “We have seen application performance range from 20% faster to 5% slower,” says Bolding. “It just depends upon the particulars of each application.”
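A quick back-of-the-envelope calculation helps explain why the results cut both ways. The sketch below assumes, and this is my assumption rather than anything Cray has stated, that the application runs only on the 23 remaining cores and would otherwise scale linearly across all 24. Under that simple model the application keeps 23/24 of the node (about 95.8%), so it needs roughly a 4.3% per-core gain from reduced OS interference just to break even. The function names are made up for illustration.

```python
def app_core_fraction(cores_per_node=24, os_cores=1):
    """Fraction of a node's cores left for the application.

    Assumes the application does not run on the reserved OS core(s).
    """
    return (cores_per_node - os_cores) / cores_per_node


def break_even_speedup(cores_per_node=24, os_cores=1):
    """Per-core speedup needed on the remaining cores to match the
    throughput of the full node, assuming perfectly linear scaling."""
    return cores_per_node / (cores_per_node - os_cores)


if __name__ == "__main__":
    print(f"cores left for the application: {app_core_fraction():.1%}")
    print(f"per-core gain needed to break even: {break_even_speedup() - 1:.1%}")
```

An application that is insensitive to OS noise gets no such gain and simply loses a core, which is at least consistent with the slow end of the range Bolding quotes. Codes that are sensitive to OS noise can come out well ahead.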
Some now, more later
The new Cray Linux Environment is being rolled out across Cray’s XT lines in phases. The XT6 and XT6m will be the first to get it this quarter. The XT5 and XT5m will see CLE 3.0 later this year, and it will be available on XT4s in early 2011.
This is clearly an important step for the company, but what about going further and having a single operating environment on all of its products, from the Intel-based CX line up through the XTs? Bolding says this is something they are discussing, but it isn’t on the roadmap today.
We do already know that Cray’s HPCS Cascade system is Intel-based, and that system will run a SUSE-based operating system (codenamed “Nile” if you are keeping track). It seems unlikely that Cray will want to maintain a separate operating system for Cascade and the AMD-based XT line, so at some point (CLE 5?) CLE will probably run on both AMD and Intel processors.
A good move at a good time
Today’s Cray is in the strongest financial position it has seen in quite some time, and it has an excellent stable of (at least until Baker later this year) well-tested hardware that has been enthusiastically received by a core of high end customers. The company has continued to flirt with profitability, but has not yet convincingly crossed that threshold. What it needs to push it over the line is a way to grow the market for its flagship, high(er) margin supercomputers. CLE 3.0 is a significant step in that direction.
While it seems likely that Cray won’t enjoy the full benefit of the ISV compatibility story with new customers until CLE 4.0 brings higher performance to CCM, today’s announcement has already won it significant new business, with the potential for even more growth over the next year.







This series is about the men and women who are changing the way the HPC community develops, deploys, and operates the supercomputers we build on behalf of scientists and engineers around the world. Ricky Kendall, this month’s HPC Rock Star, is at the center of enabling science on the largest computing systems the world has ever seen.
Today at Oak Ridge, Kendall serves as the group leader for the Scientific Computing Group, a role that he describes as “definitely on the enablement side” of the computational spectrum. “My team’s focus is to help our users get the most out of the resources we have and plan to have at the facility. I have an amazingly talented team that does this job and we have been reasonably successful in integrating with our user community and getting codes to scale to the size of our Jaguar system.”
Dell clearly knows how to build and market computers; it is one of the largest PC manufacturers in the world, and enjoys a strong presence in the enterprise. Over the past several years the company has had some significant wins with large systems in HPC, and its CEO Michael Dell even keynoted SC08 in Austin, TX. But the company’s wins to date seem to have been mostly based around either highly specialized marquee custom builds, or on jamming its enterprise class systems in an HPC envelope and shipping them out to low end customers. There is evidence, however, that Dell is maturing its approach to HPC. Over the past two years the company has been on a hiring spree, bringing in some of the top systems engineers and architects in the business from other companies in the HPC space. Even better, with the launch of the C6100, it looks like Dell is actually listening to the people it hired as it builds a product portfolio specifically for us.
Although it’s somewhat uncharacteristic of SGI to release pricing this early, they’ve already briefed insideHPC on a starting price: an Altix UV 10 with four Intel Xeon X7542 processors (6‐core, 2.66GHz), 32GB of memory, and a SATA boot drive will set you back $33,250. And unlike its UV 100/1000 brethren, this little guy is available today.


Today Bill Kramer is the deputy project director and co-principal investigator for the Blue Waters project at the National Center for Supercomputing Applications (NCSA), at the University of Illinois in Urbana-Champaign. This is ground zero for the first sustained PFLOPS (10+ PFLOPS peak) supercomputing center dedicated to diverse science and engineering; but it’s not really about the computer. Over the past several years Bill and his team have been focused on building the facility and designing a system that, when finally turned on next year, will probably be the largest system for open science in the world. But if you’ve been following what the Blue Waters team has been doing you’ll see that they have taken a radically different approach to the launch of this capability into the community.
This is the perfect place for Bill Kramer. In talking with Kramer about his accomplishments, it is clear that he is one of those people who have driven their career paths with a guided purpose. As he describes it, the common thread across all of the places he’s been in his career is that they were all setting the pace for HPC at the time.
Throughout all of these very challenging assignments, Kramer has remained dedicated to volunteer service. “These are very symbiotic commitments,” he says. “Certainly the organizations benefit, and I enjoy giving back to the community. But volunteer assignments are a great way to refresh my point of view and to develop new skills that, sometimes, end up helping out professionally.” Kramer says that a lot of what he has learned about managing people has come from experience in volunteer organizations. Over the years he has served in SCUBA organizations and volunteered in schools and community theaters. He also helped start the tutorials effort and graphics special interest group of the Digital User’s Group, and has been active in SIGGRAPH. But people are probably most familiar with his service to the SC conference series, which included a year as General Chair of the Conference in 2005 when he hosted Microsoft Chairman Bill Gates on the stage in Seattle.
Late last week I talked with AMD about Llano, their forthcoming laptop platform. That’s right, a laptop chip. So why “waste” your time when this is an HPC blog? Well, the concept is interesting, and I bet the technology that they are jamming into that chip finds its way upstream before too long.
There is a lot of activity — and a lot of hype! — today around the ways in which vendors and large supercomputing centers are trying to reduce their power usage while still getting useful work done. But there is only so much you can do adapting today’s technology, and to truly transform our approach to energy use in HPC we will need new technologies for operations and instrumentation, control system software, operating systems, job schedulers, computational algorithms, chip design, networking, and in many other areas.


