Entries filed under “Featured Stories”

The historical archive of exclusive in-depth articles written by insideHPC’s editorial staff that you’ll find only at insideHPC.com.

Cray poised to grow market through the magic of ISV software

New version of CLE runs “any standard x86-based Linux application”

Today at the IDC HPC User Forum, Barry Bolding, Cray’s VP of scalable systems, introduced what I reckon to be the most significant strategic announcement the company has made since it moved to consolidate around the high end business shortly after Pete Ungaro stepped up from sales into the role of CEO and President several years ago.

On the surface, it doesn’t sound like much: today marked the release date for version three of the operating system Cray uses on its high-end XT systems, the Cray Linux Environment (CLE).

But the central purpose of this release is to bring to XTs the ability to run any standard x86-based application, removing a major objection to purchase often cited by the well-heeled customers of both Cray’s high-end XT and midrange XT*m systems. In fact, the move has already influenced at least one major acquisition: when I spoke with Bolding last week ahead of this announcement he mentioned CLE 3.0 and the ability to run ISV applications as a major factor in Cray’s recent $45M sweep of the DoD HPC Modernization Program.

Cray started its move toward ISV compatibility and broader market acceptance with the introduction of the CX1 and the recently added CX1000 products. The release of CLE 3.0 moves to close that circle from the performance end of the business, creating the beginnings of a path that will allow customers to move smoothly from a workstation in their office to multi-hundred thousand core supercomputers, without ever leaving the Cray family.

Computing, à la mode

The secret behind CLE’s newfound love for ISV applications is the introduction of a new group of features in the operating system called Cluster Compatibility Mode, or CCM. CCM allows nodes in an XT line supercomputer to run a fully standard x86 Linux (SUSE SLES 11 in this case) — applications simply install and run. Matlab on your XT6? You can do that.

CCM is contrasted with Extreme Scalability Mode, which until today was the only way to run an XT. In ESM a single application can span hundreds of thousands of cores, taking advantage of the CLE’s lightweight kernel for scalability, and the custom SeaStar interconnect and tuned communications libraries for speed. In CCM, however, applications are limited to 2,048 cores, and only have access to MPI on a TCP/IP stack for communication. Bolding describes CLE 3.0 as a “feature release,” meaning they were focused on getting ISV software onto Cray supercomputers. The next release, planned for next year, will be a “performance release,” with both larger numbers of cores made available to CCM applications and support for OFED (and thus InfiniBand).

Another key feature of CLE 3 is that datacenter administrators do not have to partition their machines into blocks of nodes dedicated to the two modes ahead of time: nodes can swap from one mode to the other via a user-settable parameter in the job submission script. According to Bolding, under CLE 3.0 nodes run in ESM by default. When a user submits a job using software that requires CCM, the job dispatch system (called ALPS, in case you are curious) instructs the nodes to set up the standard Linux environment, the job runs, and then those nodes are returned to ESM before being returned to the compute pool. Bolding says that the time to set up CCM on a pool of nodes is “only a few seconds.”
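The per-job mode switch described above can be pictured as a small state machine. What follows is a minimal Python sketch, not Cray's implementation: the `Node` class, the `run_job` function, and its `mode` argument are hypothetical names invented for illustration. Only the ESM default, the switch to CCM on request, and the return to ESM before nodes rejoin the pool come from Bolding's description.

```python
from dataclasses import dataclass, field

ESM = "ESM"  # Extreme Scalability Mode, the default per Bolding
CCM = "CCM"  # Cluster Compatibility Mode


@dataclass
class Node:
    """Hypothetical stand-in for one XT compute node."""
    name: str
    mode: str = ESM
    history: list = field(default_factory=list)

    def set_mode(self, mode):
        self.mode = mode
        self.history.append(mode)


def run_job(pool, nodes_needed, mode=ESM):
    """Take nodes from the pool, run in the requested mode, hand them back in ESM."""
    if nodes_needed > len(pool):
        raise ValueError("not enough free nodes")
    allocated = [pool.pop() for _ in range(nodes_needed)]
    try:
        for node in allocated:
            if node.mode != mode:
                node.set_mode(mode)   # e.g. set up the standard Linux environment
        # ... the application would run here ...
        return [node.name for node in allocated]
    finally:
        for node in allocated:
            if node.mode != ESM:
                node.set_mode(ESM)    # restore the default before release
            pool.append(node)


pool = [Node(f"nid{i:05d}") for i in range(4)]
used = run_job(pool, 2, mode=CCM)
```

The `finally` block is the point of the sketch: whatever happens to the job, nodes go back to the pool in ESM, so nothing has to be partitioned ahead of time.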

There is another nice feature of CLE 3.0 that you have to be in HPC center management to care about, which I mention because I used to be in center management: you can finally run a networked license manager for your software without having to do a backbend and perform a ritual sacrifice. About time.

Something for everyone

Although the most significant strategic feature of CLE 3.0 is CCM and its ability to run standard Linux applications, Bolding says there is quite a bit baked into this release for the high end as well. Much of this is there to support Baker, Cray’s next-generation high-end HPC platform, and the company is delaying discussion of most of those features until that platform launches later this year.

We do know that this release scales support to machines with “more than 500k cores,” up from the 250,000 core machines supported in the last version of CLE. Lustre 1.8 is also supported, as are numerous reliability enhancements like warm swap of blades and link resiliency for Baker systems.

CLE 3.0 also includes a nifty performance feature called “core specialization.” Core specialization allows 1 of the 24 cores in a Magny Cours node to be designated as the “OS” core. When activated, all OS tasks are pinned to that one core. Bolding explains that this doesn’t benefit all applications, which is why Cray has taken the step of allowing users to specify whether this feature is turned on or not at runtime (as it did with the CCM). “We have seen application performance range from 20% faster to 5% slower,” says Bolding, “it just depends upon the particulars of each application.”

Some now, more later

The new Cray Linux Environment is being rolled out across Cray’s XT lines in phases. The XT6 and XT6m will be the first to get it this quarter. The XT5 and XT5m will see CLE 3.0 later this year, and it will be available on XT4s in early 2011.

This is clearly an important step for the company, but what about going further and having a single operating environment on all of its products, from the Intel-based CX line up through the XTs? Bolding says this is something they are discussing, but it isn’t on the roadmap today.

We do already know that Cray’s HPCS Cascade system is Intel-based, and that system will run a SUSE-based operating system (codenamed “Nile” if you are keeping track). It seems unlikely that Cray will want to maintain a separate operating system for Cascade and the AMD-based XT line, so at some point (CLE 5?) CLE will probably run on both AMD and Intel processors.

A good move at a good time

Today’s Cray is in the strongest financial position it has seen in quite some time, and it has an excellent stable of (at least until Baker later this year) well-tested hardware that has been enthusiastically received by a core of high end customers. The company has continued to flirt with profitability, but has not yet convincingly crossed that threshold. What it needs to push it over the line is a way to grow the market for its flagship, high(er) margin supercomputers. CLE 3.0 is a significant step in that direction.

While it seems likely that Cray won’t enjoy the full benefit of the ISV compatibility story with new customers until CLE 4.0 brings higher performance to CCM, today’s announcement has already won it significant new business, with the potential for even more growth over the next year.

Also posted in Business of HPC, HPC, HPC Software, Tools

Inside Windows HPC Server 2008 R2 Beta 2

This is a contributed piece by Randall Hand, a regular reader and the publisher of VizWorld.com.

It was back in November when Microsoft released Windows HPC Server 2008 R2 Beta 1 to the world. In the last 3 months Microsoft has spoken to its users and closely monitored how the beta is being used, and now has Windows HPC Server 2008 R2 Beta 2 ready. After a short phone call with Microsoft’s Ryan Waite, the Product Unit Manager of the Microsoft High Performance Computing Group, insideHPC got the inside look at some of the ways that Microsoft is improving the centerpiece of its HPC strategy.

Scavenging built in, Windows+Linux goodness

There are a few main features they’ve added that will appeal to HPC Server administrators. The first is the ability to extend the size of your cluster to include any Windows 7 workstations on your network, similar to the Folding@Home or Seti@Home model. This feature works with today’s available Windows 7 install, no patches or upgrades required, and allows you to use idle cycles of your other machines on your network to improve your HPC cluster’s power immediately. Unlike the @home solutions, however, it doesn’t run continuously or in the background but is instead scheduled by the administrator. This means you can make the desktops fully interactive and available to users during the usual workday, and then add them to your HPC cluster overnight.
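The "desktop by day, cluster node by night" arrangement comes down to a time window that wraps past midnight. Here is a small Python sketch of that check. It is not the HPC Server scheduling interface: the `in_window` function and the 18:00 to 07:00 policy are assumptions made up for illustration.

```python
from datetime import time


def in_window(now, start, end):
    """True if `now` falls in [start, end), allowing windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Assumed policy for illustration: desktops join the cluster from 18:00 to 07:00.
NIGHT_START = time(18, 0)
NIGHT_END = time(7, 0)

for t in (time(9, 30), time(18, 0), time(23, 45), time(6, 59), time(7, 0)):
    role = "cluster node" if in_window(t, NIGHT_START, NIGHT_END) else "user desktop"
    print(t.strftime("%H:%M"), role)
```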

One other important addition for administrators is improved integration with queueing systems from Adaptive Computing and Platform Computing. Users of those two queueing systems have a great new feature: Hybrid Windows/Linux Clusters. With this feature administrators can dual-boot their nodes as Linux and Windows HPC Server nodes, and the queueing system will dynamically switch between the OSes (it requires a reboot of course, but the queueing software manages that for you) as the load changes. This means you can have a single cluster to service Windows and Linux applications.

Of course, managing one operating system across a cluster is bad enough, but two is enough to make most admins cry. Through a partnership with Clustercorp, the Rocks+ systems administration package can now not only manage updates and patches on Linux, but on Windows as well. So now you can have a single cluster to run both Windows and Linux applications and a single systems management tool to manage both sides of the house.

Network boot

One other great feature that is new in Windows HPC Server 2008 R2 Beta 2 is the new support for Network boot. Previously, administrators had to manually install the HPCServer OS on each node of a cluster, making setup and administration a headache. This new beta release adds the capability of running nodes completely diskless, booting from a single central image. This makes administration easier, and also reduces points of failure (no more dead OS drives) and power-consumption (fewer spinning drives), while improving density.

Making strides

So all in all, it looks like Microsoft is actually making great strides in their HPC Server product. The existing integrations with user tools like Excel and scientific tools like Matlab and Fluent make HPC Server a viable option in the HPC space, and the new features in Beta 2 show that they are quickly learning what they need to implement to make it an attractive alternative to systems administrators and datacenter designers. Add in the new Visual Studio 2010 with support for MPI Debugging and HPC Server design, and it’s also attractive to the software designers.

While I doubt Microsoft will unseat the likes of Linux as the dominant HPC platform, and I don’t think you’ll see sites like Oak Ridge switching anytime soon, it’s a great alternative for smaller or corporate shops that already use existing Microsoft platforms and Microsoft-trained administrators. Microsoft’s HPC Server 2008 is becoming a more and more attractive solution.

Randall Hand is the publisher and editor of VizWorld.com, the web’s best site dedicated to computer graphics and scientific visualization.

Also posted in HPC Software, Tools

Rock Stars of HPC: Ricky Kendall

This series is about the men and women who are changing the way the HPC community develops, deploys, and operates the supercomputers we build on behalf of scientists and engineers around the world. Ricky Kendall, this month’s HPC Rock Star, is at the center of enabling science on the largest computing systems the world has ever seen.

Kendall is the leader of the scientific computing group at one of the nation’s leading HPC facilities, the National Center for Computational Science at Oak Ridge National Laboratory, where he and his team help users get the most out of what is today the largest supercomputer in the world. But this isn’t a theoretical task for Kendall — he comes from the large scale application development trenches himself, having been part of the team that started NWChem, one of the leading community codes for computational chemistry. Kendall’s accomplishments put him in the center of the computational community, in a role we used to call a computational engineer when I was in graduate school. As he puts it, “The chemistry community often sees me as a computer jock, and the computer science community sees me as an applications person.”

Kendall is the kind of leader that the HPC community needs most: someone committed to making sure that the systems our community builds end up helping to move the world forward.


Ricky Kendall started his career as a staff scientist at Pacific Northwest National Laboratory where he was responsible for the development of computational chemistry in support of the waste remediation activities of the Environmental Molecular Sciences Laboratory (EMSL). Part of this work included development that eventually became the community chemistry code NWChem, an application that is in wide use today for a variety of problems of interest to the science and engineering communities.

But Kendall wasn’t solely focused on computational code development. During his time at PNNL he continued to develop his desire to help prepare the next generation of computational professionals by serving as an adjunct lecturer at Washington State University and working with high school students. Kendall says that the challenges were fun and rewarding, both for him and the students. “You learn a great deal when you have to explain things so that students can understand the topic,” he says. “You also learn what you thought was true may not be quite right.”

After leaving PNNL, Kendall headed to Ames Laboratory in Iowa where he served as a computational scientist. He took the teaching bug with him when he moved, and added an adjunct associate professorship at Iowa State University to his regular duties at the lab. In addition to developing his own understanding of the field, Kendall says that he also had the sense that he was filling a real need in our community. “At WSU and Iowa State University, the courses I mostly taught involved programming. I found that programming skills are not stressed by the CS curriculum at many schools, and felt I wanted to help students get those practical skills.” He also contributed to the strength of the HPC community directly by developing an HPC course at ISU. The course was geared toward learning different parallel programming models, which he says the students found challenging and useful, and which ultimately included students from aerospace engineering, chemistry, physics, and other departments across the campus.

As he was pursuing his “regular” job and keeping up with his teaching duties Kendall also found time to publish, and the list of his publications is impressive not only for sheer quantity, but for the diversity of topics, which range from low level performance measurement to application and algorithm development. Kendall credits this unusual diversity to values instilled by his graduate advisor. “My advisor felt that students should have skills in both applications and theory and code development,” he explains, “and I found that I really liked doing the code development in addition to the application work. I find it rewarding being able to use a code I helped develop on the applications I’m interested in, knowing that the development was driven by the needs of the application space.”

It takes a village

Today at Oak Ridge, Kendall serves as the group leader for the Scientific Computing Group, a role that he describes as “definitely on the enablement side” of the computational spectrum. “My team’s focus is to help our users get the most out of the resources we have and plan to have at the facility. I have an amazingly talented team that does this job and we have been reasonably successful in integrating with our user community and getting codes to scale to the size of our Jaguar system.”

Kendall’s experiences with both education and mentoring and large scale application development make him uniquely suited to helping ORNL’s computational communities make effective use of systems like Jaguar, currently ranked #1 on the Top500 list of the world’s largest supercomputers. “For most of my career,” he says, “I have sat on the fence that separates applications guys and developers. The chemistry community often sees me as a computer jock, and the computer science community sees me as an applications person.”

But this perspective is extremely useful, Kendall explains, because leadership-scale science is a multi-disciplinary effort. “Many of the most successful applications on leadership computing facilities today have multidisciplinary teams. These teams have someone that understands the theory being used, the mathematics, the algorithms, computational science at scale, programming skills and core computer science skills. All are needed to make the application work on the leadership systems and be potential candidates for future systems. The successful applications plan for change and have ways to deal with how hardware evolves.”

Two handshakes of separation

Multi-disciplinary teams of this kind are really communities, and even a quick glance at Kendall’s resume reveals a commitment to the HPC community that goes beyond teaching and education. “The best advice I got when starting down the development path,” he continues, “was to steal what you can and only write the parts you have to. I think that still holds. The trick is to make yourself aware of what others are doing and how you might leverage it.”

For Kendall a key part of being aware of what others are doing is involvement in community events like the SC conference series, for which he is serving as the Technical Program Chair as part of SC10. This is a huge job, and represents a significant commitment of time and energy above and beyond one’s day job and the rest of your life. I asked what drives him to put so much energy into what is, essentially, an optional activity. “There are many reasons to be involved in community efforts,” Kendall explains. “One is to help spread the word about the things you are doing as a scientist and as part of an organization. Another is the networking aspects of such involvement: you are no more than two handshakes away from anyone in the HPC community, and it’s important to make those connections for yourself, your students and your organization. In terms of building an organization and keeping it healthy, recruiting staff is an incredibly time consuming and interactive task. By being involved in such efforts as the SC conference you get a good feel of the overall community and help your recruiting efforts. You also learn what others are doing and can potentially leverage other activities in the community with your own scientific missions and goals. These kinds of grass roots connections can lead to collaborative efforts and new areas of research.”

Whack-a-mole?

As an educator, community leader, and technologist Kendall has already helped move the HPC community through many transition points. What does he see as our next significant challenges? “Software is one of the biggest challenges we face,” he explains. “Exascale software is likely not going to look the way applications look today. We are at a turning point, and where we go next is an open question. In general though to get to the exaflops scale we are going to have to focus more on programming in the node. The path forward here is getting more powerful nodes and lots of them. This means that as a community we will have to deal with multiple levels of concurrency and make that all work. This means that we will have to realistically bring together some of the old vector techniques, invent new many core techniques, and utilize the scale of the nodes all at the same time. There is no free lunch here, and there needs to be a lot of diversity available to the community to try different techniques and algorithms.”

Getting this kind of diversity into the efforts we pursue on the way to exascale is going to mean adding room in the process for failure, with many incremental steps and missteps on the way to the final destination. “I often describe scaling codes to large core counts as playing ‘whack-a-mole,’ because you find and eliminate a bottleneck to scaling and something else pops its head up. The path to exascale is going to be a multidimensional whack-a-mole with really ugly moles! It’s going to be a lot of work but there will likely be some fun rolled in along the way as well.”

As long as I’m useful

Kendall describes his role today as a “glue person” helping to join applications and computer scientists on teams that do some of the most advanced computational simulations in the world. This is a role that Kendall relishes, incorporating staff development and mentoring along with a deep understanding of technology and applications domains. “I decided to take the job at ORNL to help build the leadership computing facility and my team along with the rest of the division and our sponsors have been able to deliver on that front. We have the #1 system on the Top500 list, and we were able to work with our users who got 3 applications doing science at above 1 Petaflops of performance. I enjoy the enabler role and will continue in that vein as long as I’m useful.”

Also posted in HPC People, Rock Stars of HPC

Dell launches HPC building block, but don’t worry: it’s not new

Last week Dell launched a new series of PowerEdge C servers under the title “Dell Brings Specialized Cloud Computing Infrastructure To The Mainstream,” an announcement that is not the usual sort of thing I cover here. But there actually is an HPC angle to this story, and one that may signal a real shift in our market for a company which has had a hit and miss track record at best in HPC.

Dell clearly knows how to build and market computers; it is one of the largest PC manufacturers in the world, and enjoys a strong presence in the enterprise. Over the past several years the company has had some significant wins with large systems in HPC, and its CEO Michael Dell even keynoted SC08 in Austin, TX. But the company’s wins to date seem to have been mostly based around either highly specialized marquee custom builds, or on jamming its enterprise class systems in an HPC envelope and shipping them out to low end customers. There is evidence that Dell is maturing its approach to HPC, however. Over the past two years the company has been on a hiring spree, hiring some of the top systems engineers and architects in the business from other companies in the HPC space. Even better, with the launch of the C6100, it looks like Dell is actually listening to the people it hired as it builds a product portfolio specifically for us.

What’s in the PowerEdge C6100

The PowerEdge C6100 is one of three systems launched last week for high density and low energy consumption deployments. In today’s IT-speak, these features get you permission to use the word “cloud” in your headlines. You can read the full release at Dell’s web site, and from it you’ll gather that the PowerEdge C1100 and C2100 really are mostly aimed at a scale-out market. But the C6100 is a 4-node cluster optimized server that Dell has built for HPC with direct input from their new HPC-savvy employees.

The C6100 is a dense solution, with 4 motherboards packed vertically in a 2U, horizontally-stacked chassis. Each board can handle either two Westmere (Xeon 5600) sockets or two Nehalem-EPs along with 12 DIMM slots, each of which can run at up to 1333MHz. Each board also has two built-in GigE ports, a PCIe x16 slot for IB (or for a GPU if you’d like; the unit has been Nvidia S1070 HIC certified) and a PCIe x8 slot suitable for a 10 GbE NIC or a Mellanox ConnectX 2 QDR IB dual-port daughter-card. Each board in the chassis is individually serviceable, so you can power one down and replace it without having to power down the other boards. You can configure a chassis with either 12 x 3.5” drives (3 per system), or 24 x 2.5” drives, and the drive bays are located across the front for easy access.

The chassis power supplies are hot-swappable at 1100W today, with some hints that a larger version may be coming soon. This is important because if you fully load your chassis with drives, memory, and fast processors and you lose an 1100W power supply today you’d have to draw down power on one of the boards. Moving to a larger power supply will introduce full redundancy; with a less-than-maxed-out system (say, mid-bin processors or less memory) you would be fully redundant on power today. This launch decision reflects an important HPC-oriented design point: the C6100 is not designed to hold anything that anyone might put in it. It’s designed to be energy conscious at the HPC price/performance sweet spot.
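The redundancy argument is simple arithmetic: the chassis is fully redundant only if its whole load still fits on the supplies that survive a failure. A rough Python sketch follows. The 1100W rating is from the launch material, but the two-supply count and the per-board wattages are assumptions chosen for illustration, not Dell figures.

```python
PSU_WATTS = 1100   # from the launch material
PSU_COUNT = 2      # assumption: two supplies per chassis


def is_redundant(board_watts, shared_watts=0):
    """True if the chassis load still fits after losing one supply."""
    load = sum(board_watts) + shared_watts
    surviving = PSU_WATTS * (PSU_COUNT - 1)
    return load <= surviving


# Hypothetical per-board draws in watts, not Dell figures.
maxed_out = [320, 320, 320, 320]   # 1280 W total
mid_bin = [240, 240, 240, 240]     # 960 W total

print(is_redundant(maxed_out))  # False
print(is_redundant(mid_bin))    # True
```

With these made-up numbers the fully loaded chassis would have to shed load after a supply failure, while the mid-bin build would ride through it, which is the trade-off described above.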

The next 500

With the C6100 Dell appears to be focusing not at the top of the list, although I’m sure they’d be happy to build you a big one if you’d like. One of the engineers I talked with at length about the system is motivated by a strong desire to reach the “next 500” in the HPC space, and in so doing open up the HPC mid market. The C6100 looks to be a good technical platform to help achieve that goal. But the technical specs are only part of the equation when you are aiming at the mid market. There you are selling into a market that doesn’t know HPC, and may not even have large enterprise servers. Dell can help allay customer installation and design concerns by bringing to bear all of its enterprise experience and worldwide team. These customers also have operational concerns, and Dell’s advantage with the C6100 is that it is not shipping a new product.

It’s new, but don’t worry

What I mean by that is that the technology that Dell is selling in this new PowerEdge C6100 server has actually been sold for the past several years through Dell’s Data Center Solutions group (as a side note the Dell team claims that if DCS were broken out into its own business it would still be one of the largest computer manufacturers in the country). This is the group that will build you anything you want, as long as you buy several thousand of them at one time. Working with these customers, Dell developed the progenitor of the C6100 and has sold “over 60,000” according to Dell’s Tim Carroll. Dell argues that this field experience has given them ample time to work out the kinks and refine the design for large scale customers, leading to a “launch” of a product they already know a great deal about from real field experience.

For those of you eyeing Dell’s mainstay HPC offerings, don’t worry. The blades and R410 rackmount server are still around, and still very much part of Dell’s HPC plans. But the C6100 definitely gives Dell’s current customers some interesting new options, and gives system architects another reason to look at Dell.

Also posted in Business of HPC, Compute, HPC Hardware

Xeon launch spawns baby UV at SGI, plus SPEC numbers on the big boys

Today’s launch of Intel’s new 7500 series (“Beckton”) processor with its eight cores and 16 standard DDR3 DIMMs per CPU socket has spurred the launch of a new product in the yet-to-ship Altix UV line at SGI.

The UV 10

SGI is adding a 4U rackmount 7500 series based product, labeled the UV 10, to its already announced UV 100 and 1000 series systems. The UV 10 puts 4 sockets of 7500 in a 4U rackmount chassis, for a total of 32 cores. If you want to grow outside this footprint, however, you’ll have to go Ethernet or IB. Although it shares the name, the UV 10 does not share the NumaLink of its bigger brothers, so there’s no shared memory outside of this chassis. Processors are connected to each other via Intel’s QuickPath, and you can slot up to 512 GB of shared memory into a system.
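For anyone checking the math on the UV 10, here is the arithmetic as a few lines of Python. The socket, core, DIMM, and memory figures come from the launch details above. The one assumption is that the 512 GB maximum is reached with every slot holding the same size DIMM.

```python
SOCKETS = 4             # UV 10 chassis
CORES_PER_SOCKET = 8    # top-bin 7500 series part
DIMMS_PER_SOCKET = 16   # per the Intel launch description
MAX_MEMORY_GB = 512

cores = SOCKETS * CORES_PER_SOCKET
dimm_slots = SOCKETS * DIMMS_PER_SOCKET
gb_per_dimm = MAX_MEMORY_GB / dimm_slots   # assumes identical DIMMs in every slot

print(cores, dimm_slots, gb_per_dimm)  # 32 64 8.0
```

In other words, four eight-core sockets give 32 cores, and the 512 GB ceiling corresponds to 8 GB DIMMs in all 64 slots.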

Although it’s somewhat uncharacteristic of SGI to release pricing this early, they’ve already briefed insideHPC on a starting price: an Altix UV 10 with four Intel Xeon X7542 processors (6‐core, 2.66GHz), 32GB of memory, and a SATA boot drive will set you back $33,250. And unlike its UV 100/1000 brethren, this little guy is available today.

So what do you do with one of these? I don’t think they’d make a very good development system for one of the larger boxes, because without NumaLink the communications costs will be wrong and it would be pointless to spend a lot of time tuning your application. It is a fat memory box, though, so if you wanted to run a reasonably-sized database in memory or just a lot of relatively low core count jobs in a scale-out fashion, this box could make sense for you. Also it does share the same software stack as the rest of SGI’s UV line, so that’s a plus if the development you need to do depends more upon system software than performance tuning.

Performance numbers on the big UVs

Today’s 7500 launch also means that SGI can finally talk about the performance of UV, which adds to the momentum as we grow closer to the Q2 launch of SGI’s make-or-break HPC platform. SGI has published both SPECint and SPECfp numbers for an Altix UV 1000 system with 64 Intel Xeon X7560 processors and 2TB of DDR3 memory. On SPECint_rate_base2006 the system is #1 on any architecture; on SPECfp_rate_base2006 it is #1 on x86 and #2 overall, behind an SGI Altix 4700 with eight times as many processors.

Here are the numbers SGI briefed me on ahead of the launch:

SPECint_rate_base2006:

  1. SGI Altix UV 1000 512c Xeon X7560 10400
  2. SGI Altix 4700 Bandwidth System 1024c Itanium 9030
  3. Sun Blade 6048 Chassis 768c Opteron 8384 (cluster) 8840
  4. ScaleMP vSMP Foundation 128c Xeon X5570 3150
  5. SGI Altix 4700 Density System 256c Itanium 2890

SPECfp_rate_base2006:

  1. SGI Altix 4700 Bandwidth System 1024c Itanium 10600
  2. SGI Altix UV 1000 512c Xeon X7560 6840
  3. Sun Blade 6048 Chassis 768c Opteron 8384 (cluster) 6500
  4. SGI Altix 4700 Bandwidth System 256c Itanium 3420
  5. ScaleMP vSMP Foundation 128c Xeon X5570 2550
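One rough way to read these lists is to divide each score by its core count. The Python sketch below does that with the figures exactly as listed, on the assumption that the trailing number on each line (including the 9030 on the Altix 4700 entry) is the score. The rate benchmarks measure whole-system throughput, so treat the per-core figures as a back-of-the-envelope comparison rather than anything SPEC publishes.

```python
# (system, cores, score) as listed above
SPECINT = [
    ("SGI Altix UV 1000", 512, 10400),
    ("SGI Altix 4700 Bandwidth System", 1024, 9030),
    ("Sun Blade 6048 Chassis (cluster)", 768, 8840),
    ("ScaleMP vSMP Foundation", 128, 3150),
    ("SGI Altix 4700 Density System", 256, 2890),
]
SPECFP = [
    ("SGI Altix 4700 Bandwidth System", 1024, 10600),
    ("SGI Altix UV 1000", 512, 6840),
    ("Sun Blade 6048 Chassis (cluster)", 768, 6500),
    ("SGI Altix 4700 Bandwidth System", 256, 3420),
    ("ScaleMP vSMP Foundation", 128, 2550),
]


def per_core(rows):
    """Return (system, score per core) pairs, rounded to one decimal place."""
    return [(name, round(score / cores, 1)) for name, cores, score in rows]


for label, rows in (("SPECint_rate_base2006", SPECINT), ("SPECfp_rate_base2006", SPECFP)):
    print(label)
    for name, value in per_core(rows):
        print(f"  {name}: {value} per core")
```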


Also posted in Compute, HPC Hardware

AMD launches Magny Cours, hopes 4 socket pricing attracts cluster builders

After a long and reasonably well-orchestrated marketing run up, plus a few partner slips, AMD has finally brought its 8- and 12-core “Magny-Cours” processor to market. The pitch: they have more cores, and by adopting a single platform underneath both the four socket and two socket variants of the platform, they’ve eliminated the 4-socket penalty. I talked to AMD’s John Fruehe last week to get the skinny on AMD’s new fat sockets.

Something for HPC system builders

Let’s hit these in reverse order as it could have some specific relevance to HPC cluster builders. Today’s launch of the 6100 series is the first of a pair of launches that will include the 4000 series. The 6000 is so numbered, according to AMD, because it logically sits between the 8000 and 2000 series chips that they were peddling in the last generation of silicon. Pricing for those devices was similar to Intel pricing at the time (and likely Xeon pricing going forward). The chart at right is one that AMD briefed me on last week, showing where AMD is headed with its new pricing strategy. The new platform is priced under the low end of the previous generation of products, and is about half of the previous generation at the very high end. So not only has AMD priced aggressively to try to regain market share from Intel, they’ve eliminated the money decision that customers used to have to make in pondering whether to build their clusters from 4- or 2-socket motherboards.

This will noticeably affect system prices in clusters that were using the 4P configuration previously (not many HPC customers used this configuration because of the price/performance problems). It also has the potential benefit of allowing customers (in the right situations) to build denser clusters with less switching equipment and floor space. The 6100 also sports 4 memory channels per CPU with 12 DDR3 memory slots, compared to the Xeon’s 3 and 9 respectively (see the architecture figure below), meaning that for a fixed memory capacity system builders can potentially use cheaper, lower capacity DIMMs in an AMD system.
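The "cheaper DIMMs" point is easier to see with numbers. The Python sketch below picks the smallest DIMM that reaches a target capacity when every slot is filled. The slot counts per socket are the ones quoted above. The list of DIMM sizes, the 96 GB target, and the `smallest_dimm` helper are assumptions for illustration.

```python
COMMON_DIMM_GB = [1, 2, 4, 8, 16]   # assumed available DIMM capacities


def smallest_dimm(target_gb, sockets, slots_per_socket):
    """Smallest listed DIMM size that reaches target_gb with every slot filled."""
    slots = sockets * slots_per_socket
    needed = target_gb / slots
    for size in COMMON_DIMM_GB:
        if size >= needed:
            return size
    raise ValueError("target exceeds the largest DIMM in the assumed list")


target = 96  # GB per 2P node, an arbitrary example
print(smallest_dimm(target, sockets=2, slots_per_socket=12))  # Opteron 6100: 4
print(smallest_dimm(target, sockets=2, slots_per_socket=9))   # Xeon: 8
```

For this particular target, the 24 slots of a two-socket 6100 board get there with 4 GB parts, while the 18 slots on the Xeon side force a step up to 8 GB parts.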

What’s in the 6100 series

Today’s launch of the 6100 series will be matched later this quarter with the launch of “Lisbon,” the 1P-2P Opteron 4000 series counterpart to today’s big brother. The 6100 comes in either eight or twelve cores and in either 2P or 4P configurations, though it’s the same chip and the same cores in each case. The 6100 itself is really two of the 4100 series chips in a single package, so AMD is really getting to amortize its engineering costs here. The 6100 uses the new Socket G34 (the 4100 will use Socket C32). As mentioned earlier, the 6100 supports 4 memory channels and 12 DDR3 DIMM slots per socket, making a total of 8 or 16 channels available (2P or 4P). In the 2P configuration this gives builders the option of packing in 128 GB of RAM, and AMD’s Fruehe says that all 12 DIMMs on a socket can use 1333MHz memory, something AMD claims that Intel cannot do. The platform uses HyperTransport 3, and in the four-socket configuration supports a full crossbar between all of the sockets (see the figure); the previous generation socket interconnect, called Direct Connect 1.0, required two hops between the most distant processor pairs.

The 6100s also feature some new engineering aimed at driving down power, including a new power state, the ability to reduce p-states when a temperature threshold is reached (“Cool Speed”), improved monitoring, and support for the lower voltage DDR3 DIMMs (when they become available). The 12-core variants come in speeds between 1.7 and 2.3GHz with ACPs of 65W (for the HE bin) and 105W (for the SE), with most of the line coming in at 80W in the “mid bin” parts. The 8-core chips range between 1.8 and 2.4GHz and consume 65 to 80W. There are 64KB of data and instruction cache per core, 512KB of L2 per core, and 12MB of L3 per socket. Each of the four x16 HyperTransport 3 links runs at up to 6.4 GT/s.
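For a rough sense of what that link rate means in bytes, here is a back-of-the-envelope sketch. It assumes “x16” means 16 bits moved per transfer in each direction and ignores protocol overhead, so treat the result as a peak figure rather than something you would measure.

```python
def link_bandwidth_gb_per_s(giga_transfers_per_s, link_width_bits):
    """Peak one-direction bandwidth in GB/s: transfers per second times bytes per transfer."""
    return giga_transfers_per_s * link_width_bits / 8

# Figures quoted in the article: 6.4 GT/s per link, x16 links, four links per socket.
per_link = link_bandwidth_gb_per_s(6.4, 16)
per_socket = 4 * per_link
print(f"{per_link:.1f} GB/s per link, {per_socket:.1f} GB/s across four links (one direction)")
```

Under those assumptions each link tops out at 12.8 GB/s in each direction.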

Partners who need partners are the luckiest partners

AMD is buddying up with its usual launch partners today, and its blogs and marketing materials feature the likes of Cray, HP, Dell, and Microsoft. New to this lineup (for me anyway) is Acer. During our conversation Fruehe mentioned that Acer is building an HPC business, of all things, in Europe and Asia, and is beefing up for a push into the North American HPC space next year.

Also posted in Compute, HPC, HPC Hardware | 3 Comments

Intel series on developing multithreaded applications

Today the Parallel Programming Community on the Intel Software Network is publishing a collection of technical papers to provide developers with additional support as they are trying to learn, or improve the use of, Intel’s large tool suite for parallel programming. I like the fact that they are publishing as a series of short papers, rather than a monolithic book format, because it provides developers more direct entry into just the content they need.

There are 25 papers in the series so far covering a wide range of topics from broad subjects (e.g., Granularity and Parallel Performance or Automatic Parallelization with Intel Compilers) to very narrowly focused guidance (e.g., Avoiding and Identifying False Sharing Among Threads). Taken as a whole, the series is designed so that an application developer can use the lessons and insight to improve multithreading performance on current and future Intel architectures (the whole idea of scaling forward).

Intel’s Aaron Tersteeg generously offered to give me a sneak peek at three of the papers in the series: Getting Code Ready for Parallel Execution with Intel Parallel Composer, Curing Thread Imbalance Using Intel Parallel Amplifier, and Using Intel Parallel Inspector to Find Race Conditions in OpenMP-based Multithreaded Code. I picked these three papers because they all relate to some of the exciting front end work that Intel is doing to build tools that will enable the non-parallel specialist to develop effective parallel applications.

Getting Code Ready for Parallel Execution with Intel Parallel Composer

Parallel Composer is part of Intel’s Microsoft Visual Studio add-on suite for parallel application development called Parallel Studio, which began shipping last May (more on Parallel Studio here). Parallel Composer is where code gets written in Parallel Studio, and it builds directly upon Intel’s existing code development tools. This article provides an overview of the different approaches supported by Parallel Composer for expressing concurrency in applications: OpenMP, C++ Compiler Language Extensions (e.g., __par, __critical), Threading Building Blocks, Win32 Threading API and Pthreads, Threaded Libraries (like the Intel Math Kernel Library, MKL), auto-parallelization, and auto-vectorization.

In addition to providing a quick overview of each approach, with examples that highlight the type of code each one produces, the paper also offers some quick insight into specific situations where one approach may be preferable to the others, to help developers make the right choice. In general, the advice is balanced and honest:

As a compiler-based threading method, OpenMP provides a high-level interface to the underlying thread libraries. With OpenMP, the programmer uses OpenMP directives to describe parallelism to the compiler. This approach removes much of the complexity of explicit threading methods, because the compiler handles the details. Due to the incremental approach to parallelism, where the serial structure of the application stays intact, there are no significant source code modifications necessary. A non-OpenMP compiler simply ignores the OpenMP directives, leaving the underlying serial code intact.

With OpenMP, however, much of the fine control over threads is lost. Among other things, OpenMP does not give the programmer a way to set thread priorities or perform event-based or inter-process synchronization.

Curing Thread Imbalance Using Intel Parallel Amplifier

Parallel Amplifier is another component of Parallel Studio. Amplifier builds upon a technology proof of concept that Intel posted at WhatIf.Intel.com some time ago, VTune. VTune is powerful and, despite being hard to use, quickly became the most popular download at the site, even for developers within Intel. Amplifier builds on this tool and extends the design to support non-experts, for example incorporating visualization to help developers understand what’s going on with their codes. Amplifier is specifically targeted at improving the performance of the portion of an application running on a multicore socket.

This paper covers a specific use case for Parallel Amplifier: finding and fixing application load imbalance. Load imbalances are created when one or more threads have more work to do than the others, leaving some threads sitting idle while others are working.
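A toy model helps show what load imbalance looks like in numbers. The sketch below is my own illustration, not code from the Intel paper: the task costs are made up, `static_blocks` mimics handing each thread an equal count of consecutive tasks, and `dynamic_greedy` mimics giving the next task to whichever thread frees up first.

```python
import heapq

def static_blocks(costs, n_threads):
    """Split tasks into contiguous, equal-count blocks; return per-thread total work."""
    size = -(-len(costs) // n_threads)  # ceiling division
    return [sum(costs[i * size:(i + 1) * size]) for i in range(n_threads)]

def dynamic_greedy(costs, n_threads):
    """Hand each task, in order, to whichever thread currently has the least work."""
    loads = [0] * n_threads
    heapq.heapify(loads)
    for cost in costs:
        heapq.heappush(loads, heapq.heappop(loads) + cost)
    return sorted(loads)

def imbalance(loads):
    """Ratio of the busiest thread's work to the mean; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical task costs that grow with the index, as in a triangular loop nest.
costs = list(range(1, 17))
static = static_blocks(costs, 4)
dynamic = dynamic_greedy(costs, 4)
print(static, round(imbalance(static), 2))
print(dynamic, round(imbalance(dynamic), 2))
```

In this made-up case the static split leaves the last thread with nearly six times the work of the first, while the greedy hand-out keeps all four threads much closer to the mean. Real schedulers add overhead that this model ignores.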

Intel Parallel Amplifier…assists in fine-tuning parallel applications for optimal performance on multicore processors. Intel Parallel Amplifier makes it simple to quickly find multicore performance bottlenecks and can help developers speed up the process of identifying and fixing such problems. Achieving perfect load balance is non-trivial and depends on the parallelism within the application, workload, and the threading implementation.

This paper presents its concepts in the context of simple-to-understand example code, reasoning through the information provided to the developer by Amplifier in order to develop the critical analysis skills necessary to write efficient multicore code.

The concurrency analysis reveals that the CPU utilization on the same routine is poor (Figure 2) and the application uses 2.28 cores on average (Figure 3). The main hotspot is not utilizing all available cores; the CPU utilization is either poor (utilizing only one core) or OK (utilizing two to three cores) most of the time. The next question is whether there are any load imbalances that are contributing to the poor CPU utilization. The easiest way to find the answer is to select either Function-Thread-Bottom-up Tree or Thread-Function-Bottom-up Tree as the new granularity, as shown in Figure 4.

Using Intel Parallel Inspector to Find Race Conditions in OpenMP-based Multithreaded Code

Parallel Inspector is the focal point for application debugging in Parallel Studio. It is based on Intel Thread Checker, and Intel’s James Reinders has described it to me in the past as a “proactive bug finder.” Inspector will try to find problems that haven’t yet manifested as bugs by looking for patterns that indicate data races, deadlocks, and other usage errors that often don’t appear for long periods after release, or only show up in unpredictable ways. Parallel Inspector is used to debug multithreading errors in applications that use the Win32, Intel Threading Building Blocks or OpenMP threading models. This paper emphasizes a common use case for just one of those models, finding race conditions in OpenMP.

Again the paper is extremely practical, using simple but relevant sample code to motivate a discussion of how to use the tool to find and fix a very common problem encountered by developers of parallel applications. Although the paper focuses on OpenMP, it doesn’t actually assume much prior knowledge of OpenMP programming, taking the time to explain the basic work sharing construct used in the example code to make the paper relevant even to those just getting started.

In Figure 1, Intel Parallel Inspector identifies the data race errors against the source line where x variable is modified, as well as the next source line with summing up the partial sums for each iteration. These errors are quite evident, as the globally defined variables x and sum are being accessed for read and write from the different threads. In addition, Intel Parallel Inspector produces the ‘Potential privacy infringement’ warning, which indicates that a variable allocated on the stack of the main thread was accessed in the worker threads.

…Once the error report is obtained and the root cause is identified with the help of Intel Parallel Inspector, developers can consider approaches to fixing the problems. General considerations for avoiding data race conditions in parallel OpenMP loops are given below, along with advice about how to fix problems in the examined code.
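The usual cure for the kind of race described in that excerpt is to give each thread its own copy of x and its own partial sum, then combine the partial sums once at the end. In OpenMP terms that corresponds roughly to marking x as private and sum as a reduction variable. The sketch below shows the same pattern in Python rather than OpenMP; the integrand, the function names, and the worker count are my assumptions, since the paper's sample code isn't reproduced here.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def partial_sum(start, stop, step):
    """Sum 4/(1+x^2) at the midpoints of iterations [start, stop) using only local variables."""
    total = 0.0                      # private accumulator, never shared
    for i in range(start, stop):
        x = (i + 0.5) * step         # private x, recomputed per iteration
        total += 4.0 / (1.0 + x * x)
    return total

def integrate(n_steps=100_000, n_workers=4):
    """Midpoint-rule integral of 4/(1+x^2) on [0, 1], split across worker threads."""
    step = 1.0 / n_steps
    chunk = -(-n_steps // n_workers)  # ceiling division
    bounds = [(s, min(s + chunk, n_steps)) for s in range(0, n_steps, chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda b: partial_sum(b[0], b[1], step), bounds)
        # Reduction: combine the private partial sums once, in the calling thread.
        return step * sum(partials)

estimate = integrate()
print(abs(estimate - math.pi) < 1e-6)
```

Because no worker ever writes to a variable another worker can see, there is nothing left to race on, and the result no longer depends on how the threads happen to interleave.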

Summing up

I found these papers, and the whole series in general, to be quite helpful, striking a good balance between brevity and completeness. You won’t walk away from this series with an encyclopedic understanding of any one concept or tool, but then that isn’t the point. The focus in each is on getting to the core of a particular tool or concept, or on solving a particular problem. The papers are all very practically oriented, adding only enough background and theory to provide some minimum foundation upon which to learn. This approach makes the series accessible and relevant to developers who are intimidated by large documentation sets, or who just need enough information to start solving a particular problem at a particular time, while still providing a solid foundation for more detailed, self-paced learning.

If you are developing multicore applications on Intel processors, these are worth at least a quick review to familiarize yourself with what’s there. When you find yourself stuck in the future, you’ll know right where to go.

Also posted in HPC Education and Training, HPC Software, Tools | 3 Comments

Book Review: Programming Massively Parallel Processors by Kirk and Hwu

Programming Massively Parallel Processors: A Hands-on Approach
by David B. Kirk and Wen-mei W. Hwu
Morgan Kaufmann (February 5, 2010)
ISBN 0123814723

I just finished reading the new book by David Kirk and Wen-mei Hwu called Programming Massively Parallel Processors. The generic title notwithstanding, readers should not come to this book expecting one of the highly theoretical and general parallel programming texts that most of us had at least some experience with in grad school. This book is very focused on one thing: teaching readers how to develop parallel applications that perform well on NVIDIA’s GPUs using NVIDIA’s CUDA language.

People learn in different ways, some responding well to a theory-based approach that only eventually gets down to implementation, and others responding well to generalization from the specific. I’m a specific kind of guy as, apparently, are the authors of this book. Kirk and Hwu wrote the book on the premise that learning the specifics of writing high performance code for GPUs with CUDA is a useful way to learn about parallel programming in general. Some suspicion of this point of view is warranted, given that the authors are both affiliated with NVIDIA (Kirk is an NVIDIA Fellow and was until 2009 the company’s chief scientist, and Hwu is principal investigator for the first NVIDIA CUDA Center of Excellence at the University of Illinois at Urbana-Champaign).

However, this book does at least as good a job at teaching general parallel principles through implementation as other, more platform-agnostic, MPI and OpenMP books I’ve read; and being tied to specific hardware gives Programming… at least one advantage those other books haven’t had. Namely, parallel programming on any HPC system is complex and targeted at specific hardware in direct proportion to the degree you care about performance, and it is precisely because it is tied to specific hardware that this book does a good job teaching that lesson alongside the more generally useful patterns for parallel programming. Continue reading »

Also posted in Book Review, Compute, GPUs, HPC Hardware | 8 Comments

Sustaining and disruptive paths to HPC: what it means for startups?

This piece is contributed by Thomas Thurston, President and Managing Director of Growth Sciences International. We last heard from Thomas in October of 2009.

In Newport, Rhode Island on March 16th, pacesetters of the HPC community will assemble as part of a Thought Leader panel to discuss “disruptive” industry trends. Disruption is important to HPC market leaders because few threats are more potent or subversive than “disruptive” ones. Meanwhile, to HPC startups, disruption is the harbinger of opportunity.

Much has been written about why disruptive strategies are a good idea for startups. Yet perhaps equally critical for new HPC firms is why “sustaining” strategies — i.e. the opposite of disruptive ones — are a bad idea.

What is disruption?

First, to avoid semantic confusion, “disruption” in this context refers to the phenomenon captured by Professor Clayton Christensen at Harvard. More specifically than just “game changing” or “status quo-breaking” innovation, Christensen uses the term “disruptive” in a precise manner describing innovations that begin with “low-end” or “new-market” origins. From low performance or uninhabited tiers of the market, disruptive innovations increase their performance over time, usurping entrenched competitors in mainstream markets as they gradually move up-market. Instead of a direct attack, disruption is a torpedo from below. Or rather, it quietly unscrews the hull from underneath, one bolt at a time, until the competitors break apart and sink.

For example, mainframe computers were disrupted by initially lower cost, lower performing minicomputers. Minicomputers started at the “bottom” performance-wise, but then got better over time, ushering in the eras of Digital Equipment, Prime, Wang and Data General.

Minicomputers with printed circuit board logic were later disrupted by initially lower cost, lower performing CPU-based computers. Such was the triumph of Intel and Microsoft, not to mention IBM, Apple and Compaq. Today, lower cost, lower performance ARM-based mobile platforms (e.g., smartphones) are in the early stages of threatening to disrupt traditional x86 CPU-based computing architectures. Will Intel and Microsoft successfully respond? Maybe, maybe not. The point is that, unchecked, disruptive threats can be lethal. Continue reading »

Also posted in Business of HPC | 6 Comments

Getting to exascale: a podcast interview with Intel’s chief supercomputing architect

This podcast is part of the exclusive video, audio, and feature series at insideHPC.com called HPC in 2010, a look ahead at the technologies, issues, and opportunities our community will be facing in 2010. In this installment of the series, I talk to Bill Camp, the Chief Supercomputing Architect at Intel.

Bill refers to himself as Mr. Exascale at Intel, and his thinking goes all the way from transistors to software. In this conversation, recorded on the show floor during SC09 in Portland, Bill and I talk about the challenges of getting to exascale, the relationship of exascale technologies to commodity processing, and much more. Is Intel thinking about a return to specialized chips for extreme scale supercomputing? How are we going to build exascale systems that take 20 MW, not 200 MW? What about resiliency? Listen to the show and find out.

More about the HPC in 2010 series

Listen to the show: http://insidehpc.com/media/2010/Intel/BillCampSC09Final.mp3

Also posted in Computing Research, HPC Hardware, HPC People, HPC Software, Podcast | Leave a comment

An exclusive first look at SC10 with general chair Barry Hess

During the week of SC09, insideHPC grabbed a minute with Barry Hess, the general chair of SC10, to talk about this year’s conference. Barry is the Deputy Chief Information Officer at Sandia National Laboratories and brings decades of experience in technical computing to SC10, as well as a decade of service to the SCxy community. In this first interview of the SC season, we talk with Barry about what he and his committee have planned for us in New Orleans later this year.

insideHPC: Tell us about SC10: what’s going to be new and different about the conference this year?

Barry Hess: It will be different, but our goal is to keep the momentum going. It has been a very pleasant surprise that, even through the downturn in the economy during 2009, the conference kept its momentum with record attendance and a dynamic, enthusiastic crowd. Certainly our focus stays on the technical program, with the highest quality technical papers.

What will change next year is that we’ll have more space — we’ll have the largest amount of space we’ve ever had, in meeting rooms and in exhibit space. We’ll have 370,000 square feet of exhibit floor, about 70,000 more square feet than we had in Reno (our next biggest venue). A couple things happen when we have that much space. We can make more “islands” for the exhibitors — booths in which attendees can walk all the way around — which gives them more value in terms of visibility. Then we can also do some more creative things, like putting whisper suites right on the exhibit floor, which the exhibitors like. Then we can do more creative things inside the conference center to add more value for the attendees because of the space.

One of the things that is special about SC is the communication that happens on the show floor between attendees and exhibitors, but also among the different types of exhibitors. Industry representatives are at the conference to sell their wares, the research groups are there to sell their intellectual property to the industry booths, and you get this great mix of everybody looking for connections with everybody else on the show floor. This is one of the things that makes this a unique conference: exhibitors can talk with customers, but also talk with companies or organizations of which they are customers, and everyone is in one place. And that is one of the things that keeps our attendance up, even in a tough economy. Everyone feels they get the right amount of value from coming here.

insideHPC: From your perspective, what is the heart of the conference?

Hess: The technical program drives everything. It’s the engine that creates opportunity for the exhibits. It’s what drives the value for the education program, and provides the funding for that. So really the technical program is the top thing you have to protect and enhance. You want everybody to go back home and say “Wow! That was the best conference,” and you want them to bring their peers back next year.

insideHPC: What are the thrust areas or areas of special emphasis for the 2010 conference?

Hess: There are three thrusts this year. You can’t go to New Orleans without looking at global climate change, and so climate modeling and all the technologies and software and work that’s being done in that area will be a strong area of emphasis for us. Heterogeneous computing is also an area we are paying special attention to in the SC10 program. And the third area is data intensive applications. That’s been an issue for a long time, but now it’s becoming a driving issue: how do you move large data around, how do you visualize it, and so on.

Those are the three thrusts that this year’s committee feels are really going to drive supercomputing on a national and international scale.

insideHPC: For SC09 there was a big focus on sustainability, and certainly both the technical program and Vice President Gore’s talk brought in the topics of climate change. Do you think this is something that we’ll see continue beyond SC10?

Hess: Definitely. We probably won’t solve the problem for years and years, but how we approach the issue will change over time, and so it will remain an interesting and timely topic for future SC conferences.

Speaking generally, I’ve worked with the conference chairs over the past four years in developing thrusts for the conference, and we try to work together from year to year to make sure we don’t have abrupt changes. We will be taking the work that we’re seeing at SC09 and moving that forward to SC10. You’ll see that obviously with climate. With sustainability it won’t be a thrust for SC10, but it will just be part of how we run the business of the conference. Every chair builds on the shoulders of the past chairs, and each year builds on the successes and lessons of the years before.

insideHPC: Each year the conference pushes an area that is just on the cusp of emerging in the HPC community. For example, there were several events around efficient datacenter design during SC09, reflecting the shift in the community that was really just picking up speed during this year. What emerging issues will we see reflected in the SC10 program?

Hess: I think the edge in 2010 will be specifically around the heterogeneous architecture work that is going on. During SC09 the most heavily attended tutorial — in fact, they had to move to the ballroom — was the CUDA tutorial. Yesterday I saw clusters with Atom processors on the exhibit floor. People are really getting very creative as they struggle to create new supercomputers for a variety of new missions where HPC is starting to have a real impact. Areas like search, finance, and cybersecurity.

We are in a time period now where there are a lot of disruptions in technology and the programs, and we’ve had a large change in government focus. It’s a very disruptive time, and we’re looking at what that means both to the HPC industry, and to the people that will need to use supercomputing in these new mission areas.

insideHPC: Tell me about you and your history with the conference. It’s a tremendous amount of work and an incredible commitment. Why do it?

Hess: I’ve been involved with the conference committees since 2000, and I was an attendee and exhibitor well before that, all the way back to 1996. My first job with the committee was signs in 2000. I took that job because it exposed me to all aspects of the conference. Being involved on the committee really allows you to build new connections, and strengthen existing connections, with the incredibly smart, incredibly talented people that are a part of this conference. And once you are a part of the committee you really begin to understand the value that the conference provides to the community, and you start to see volunteering on the committee as a service. My organization has been very supportive of my involvement, because they realize the value of this conference to the HPC community and they want to be a part of making sure that continues to happen.

insideHPC: Is getting involved with the SC committees something you would recommend for people just starting their careers, or do they need to be a little more “grown up?” Is it too late to get involved now for SC10?

Hess: We have people very far along in their careers. We have people after their careers, retired, that still come. We do have quite a few new, young people. And the steering committee encourages that as a way to bring in the next generation that will take over the conference tomorrow, and bring in a new perspective today.

It is a little too late to get involved with the SC10 committee, since we are just about 9 months from the conference. But there are opportunities on the SC11 and SC12 committees. If there is a particular area that someone would like to get involved with, the call for participation has a listing of all the SC10 area chairs; just send an email to that person. And every part of the program has contact emails on the website. The SC10 chairs will pass your name on to the SC11 and SC12 chairs. Now is a great time to get involved for SC11 and SC12.

Also posted in Events | 1 Comment

Rock Stars of HPC: Bill Kramer

Bill Kramer has spent his career finding, catalyzing, and managing change in HPC. Early in his career he helped field the first production Unix-based supercomputer, and he has continued to work to design and commission some of the most innovative and successful computers of the past twenty years: during his career he has fielded twenty large supers, 7 of which have been in the top 5 of the Top500. Kramer’s career choices have always drawn him to our community’s leading organizations, places that were changing something fundamental about what it means to be a supercomputer center. But he isn’t about change just for the sake of change: for Kramer it is a way to make sure that he stays fresh, and does the best job he can for the people he is leading, and for the people who use his systems.

He is the kind of leader that the HPC community, and just about everyone else, needs more of: someone focused on service to a community he believes in and on getting the job done for the benefit of all.

Today Bill Kramer is the deputy project director and co-principal investigator for the Blue Waters project at the National Center for Supercomputing Applications (NCSA), at the University of Illinois in Urbana-Champaign. This is ground zero for the first sustained PFLOPS (10+ PFLOPS peak) supercomputing center dedicated to diverse science and engineering; but it’s not really about the computer. Over the past several years Bill and his team have been focused on building the facility and designing a system that, when finally turned on next year, will probably be the largest system for open science in the world. But if you’ve been following what the Blue Waters team has been doing you’ll see that they have taken a radically different approach to the launch of this capability into the community.

Getting the system fielded is only the beginning of their efforts, not the end. The really innovative things that the Blue Waters team are doing can be seen in their focus on training potential users, evangelizing the machine and its capabilities, and reaching out to new disciplines that should be able to benefit from the capability. In short, they are building a community around the resource: a community of users, architects, administrators, and developers that will work together and support one another once the machine is launched to, hopefully, conduct research that will change the world.

This is the perfect place for Bill Kramer. In talking with Kramer about his accomplishments, it is clear that he is one of those people who have driven their career paths with a guided purpose. As he describes it, the common thread across all of the places he’s been in his career is that they were all setting the pace for HPC at the time.

William T. C. Kramer, PhD, started his career at the University of Delaware supporting code development for the college of engineering. He helped develop applications and visualize datasets for the college’s various research projects on systems like DEC’s PDP-10 and VAX. Some of this work was on the systems side, working on device drivers and components of the operating system. These were the early days of Unix, and U Delaware was one of the early sites on the ARPANet. This put Kramer in a position to be hands deep in the Unix kernel, making systems work with the new TCP/IP. From there he moved on to systems management, getting exposure to both the human and technical issues in running large systems for scientific users.

After a while at Delaware, Kramer started sending blind resumes out to NASA centers. “I always thought NASA was cool,” Kramer says. NASA Ames was about to field a Cray-2, the first production Unix-based supercomputer in the world, and they needed someone who knew how to run a multi-user computer system and someone with the system experience to make it all work. This was the first of the moves Kramer made into an organization undergoing change. “NASA was building a supercomputing center from the ground up, and it was a very exciting time both in terms of the organization and the technology,” he says.

In fielding the Cray-2, Kramer helped finalize several pieces of software that would eventually become staples in HPC centers around the world, from UNICOS to NQS. Eventually he moved from system engineering to development and then leadership as Ames continued to field supercomputers from Cray (the site actually tried to install an ETA-10, but ended up refusing to accept the system because it never worked). They also started experimenting with MPPs, including TMC and Intel systems, and an early IBM SP. He remembers that one of the big debates they had during his time running the high speed processing group was whether or not to allow interactive editing on the Cray. His position — in favor of interactive editing — eventually won the day, but not for the reasons you’d expect. “We argued that it made more sense in terms of the demand on system resources for users to be able to make small edits to files directly on the Crays, instead of incurring all the overhead of transferring the complete file off and then back on to the system for a small change.”

Kramer was then recruited to NERSC, another organization in the midst of tremendous change. They had just moved from Livermore to Berkeley, and they had set out to become a different kind of supercomputing center. “NERSC was focused on big science — results — rather than on just having lots of users,” he says, and that was a difference that attracted him. NERSC was also one of the first organizations to commit 100% of their production resources (“in with both feet” is how he puts it) to MPP systems in a time when vector was the norm. While at NERSC Kramer contributed to the evolution of the Cray T3E, ultimately becoming Deputy Division Director as he fielded IBM SPs and, most recently, the Cray XT4 known as Franklin, before moving on to NCSA to run the Blue Waters project.

Throughout all of these very challenging assignments, Kramer has remained dedicated to volunteer service. “These are very symbiotic commitments,” he says. “Certainly the organizations benefit, and I enjoy giving back to the community. But volunteer assignments are a great way to refresh my point of view and to develop new skills that, sometimes, end up helping out professionally.” Kramer says that a lot of what he has learned about managing people has come from experience in volunteer organizations. Over the years he has served in SCUBA organizations and volunteered in schools and community theaters. He also helped start the tutorials effort and graphics special interest group of the Digital User’s Group, and has been active in SIGGRAPH. But people are probably most familiar with his service to the SC conference series, which included a year as General Chair of the Conference in 2005 when he hosted Microsoft Chairman Bill Gates on the stage in Seattle.

“I try very hard to make sure I don’t get staid in my ideas. Volunteering is a great way to learn about yourself, and find new things you like to do that challenge you.”

Kramer is at the point in his career where he has the perspective to identify, and to be proud of, a few key accomplishments. His list is interesting as much for the kinds of items it contains as for the specific items themselves: facilitating the first ab initio turbulence simulation at NASA Ames; supporting the efforts to return to flight after the Challenger disaster; the first FAA certification of an aircraft change based solely on computation; and discoveries in the search for dark matter.

What is special about this list is that Kramer doesn’t include any of the contributions he made to machines, only to discoveries the machines made possible. Unlike many managers of supercomputing centers, including myself, Kramer has managed to stay connected to the work his machines make possible. “I have always tried to make sure I kept one technical activity to keep me connected to the work that supercomputers make possible.” This, he says, reminds him why he came to supercomputing in the first place, and makes him a better center manager.

In researching this article with Bill’s colleagues and co-workers, I continually received anecdotes of his “boundless energy” and “deep commitment” along with adjectives like “focused”, “tireless”, and “dedicated.” But how does he describe his own contribution? “I think the most value I bring is in making large, complex systems work well so that people can get something done with them.”

And that, in the end, is what an HPC rock star does.


Also posted in Featured HPC Rock Star, HPC People, Rock Stars of HPC | 5 Comments

AMD talks up power innovations in Llano, puts GPU in the CPU socket

This week at the International Solid State Circuits Conference (ISSCC) the chip community is gathered to talk about the latest and greatest research and technology breakthroughs, while not sharing anything that might dull their competitive advantage. Chip companies also tend to tie their product announcements in with ISSCC, which explains the timing of the Itanium, POWER, and other announcements over the past several days.

Late last week I talked with AMD about Llano, their forthcoming laptop platform. That’s right, a laptop chip. So why “waste” your time when this is an HPC blog? Well, the concept is interesting, and I bet the technology that they are jamming into that chip will find its way upstream before too long.

AMD’s marketing angle these days is that we are entering the heterogeneous systems era (following the single and multi-core eras), enabled by data parallelism and GPUs. You see this reflected in AMD’s materials when they talk about their “Fusion” strategy. Llano is AMD’s GPU+CPU processor; they call it an APU, or Accelerated Processing Unit. Llano puts 4 x86 (Phenom II) cores plus one GPU (they are mum on the number of cores the GPU itself will have) onto a single die in a 32nm process. The processor supports DDR3 memory (though they aren’t saying how many channels), is DirectX 11 capable, and will be available for sampling in the first half of this year. It is expected to operate at greater than 3 GHz.

Also posted in Compute, HPC, HPC Hardware

Green HPC Podcast Episode 6: Green technologies of the future

There is a lot of activity (and a lot of hype!) today around the ways in which vendors and large supercomputing centers are trying to reduce their power usage while still getting useful work done. But there is only so much you can do adapting today’s technology, and to truly transform our approach to energy use in HPC we will need new technologies for operations and instrumentation, control system software, operating systems, job schedulers, computational algorithms, chip design, networking, and in many other areas.

In this episode we talk with companies and supercomputing centers at the forefront of thinking today about the new technologies we’ll need tomorrow. In our conversations we touch on the full spectrum of green technologies, from “bits to buildings” as Horst Simon says. On the buildings side of the spectrum we talk with our guests about local power generation, integrated approaches to work scheduling that incorporate knowledge of power rates and datacenter hot spots, integrated monitoring, and allocating user time in kW-hours instead of CPU hours. On the bits side we talk about evolutions of today’s processor architecture, the likelihood of a return to custom processors for HPC, and technologies for the rest of the computer that will provide us both the opportunity and the challenge to completely rethink the way we structure algorithms.

This is the final episode of the Green HPC podcast series, and in it we look at what the future may hold.

Listen to Episode 6 and meet the guests or visit the Green HPC Podcast Series home page to learn more about the entire series.


Also posted in Green HPC, Podcast

A group to call our own: the Society of HPC Professionals

With the exception of a few focused degree programs in computational engineering and some certificate programs in HPC, there really isn’t a lot of structure to the supercomputing community. In part this is a result of where we came from: the people who have really pushed the boundaries of our community have historically been practitioners of other disciplines trying to solve really hard problems with the technology they had available to them. Thus the CFD’er becomes an HPC’er in order to solve the problem she is tackling, and the HPC community accretes another member. As HPC and supercomputing become ever more specialized and complex, however, there is a growing need to share information, best practices, and everyday experiences related to running, using, and building supercomputers as a discipline unto itself.

I first ran into a note about the Society of HPC Professionals on LinkedIn in a post by Bill Menger. It turns out that Bill is serving as president of the new organization, and is one of the forces behind its formation. After we exchanged a few emails, Bill agreed to answer a few questions about the Society for insideHPC’s readers.

insideHPC: You are one of the forces behind the formation of the Society of HPC Professionals. If you had to pick a single major thrust that the Society is designed to address, what would it be? Education? Networking? Something else?

Bill Menger: The single thrust is “user groups.” We are looking to meet a need we’ve heard discussed but never seen anyone meet. We will include education, industry participation, etc., but our primary focus is to help the users of HPC. Also included are the system administrators, directors of operations, etc. They need a SIG (special interest group) as well! We want to partner with universities and other educational institutions to help them in training for real needs. There is a lot that can be done in a variety of areas.

iHPC: Where did the idea come from?

Bill: For about 3 years, the SEG (Society of Exploration Geophysicists) has included an HPC Workshop as an add-on at the end of its annual international meeting. Some acquaintances of mine organized that event, including Keith Gray at BP, Chap Wong at Chevron, and Ebb Pye at Pye Associates. Also each year for the past few years, Jan Odegard at Rice University has held an “HPC Oil and Gas Workshop” in the spring. I have attended most of these meetings and was involved with the HPC community in Oil and Gas since my job at the time was to direct the HPC for ConocoPhillips.

After organizing several popular breakfast meetings about HPC for the benefit of my peers in the oil companies, word got around and two of the SEG Workshop organizers asked me to have breakfast with them. We met at IHOP and they (Ebb Pye and Gary Crouse) told me that there was a lot of interest in some kind of user group for HPC, and that the interest had been there for a couple of years. What had been discussed, but never acted on, was the formation of a user group. They suggested that I consider forming a group that could create Special Interest Groups (SIGs) for the myriad topics in HPC that people were coming to the workshops to learn about. They also suggested that if I did it right, there were sponsoring companies who were interested in seeing more organized groups of users who could help communicate needs to them and help them organize their products and services.

I had a passion to see us bring in the new crop of scientists and computer geeks and train them in HPC, so I presented this concept (formation of a non-profit) at my next HPC breakfast meeting, and a subset of that group agreed to be on this committee, which turned into the board of directors of the Society of HPC Professionals. We incorporated as a Texas non-profit and patterned our bylaws so that we can apply for 501(c)(6) status as a tax-exempt professional society sometime this year.

We’re planning to hold our first open meeting in late April of this year, and we are holding a sponsorship meeting for members of the HPC ecosystem that may be interested in helping make this happen on the morning of Feb 3rd at the Unique Digital, Inc. headquarters in Houston.

iHPC: The society is growing out of the oil and gas industry HPC user base. Are you going to stay focused on this community, or are you going to welcome members from all communities that use HPC?

Bill: Yes we are coming from oil and gas, but we deliberately set out to be an HPC society for everyone. Our incorporation documents and bylaws allow us to span geographically and cross-industry.

Our goal is to form a society that encompasses all flavors of users, and to create special interest groups (SIGs), focus groups, and round-table discussions that target the specific types of users. Our April meeting will explore 5 or 6 areas of interest and we’ll be having audience discussions on each major topic, with one open session for additional topics. For instance, there is a need for the scientist-as-programmer group to get together and a) discuss ideas, b) learn how to “do” programming correctly, c) learn optimization techniques, and d) find out about who provides services that can help them along.

iHPC: What are your expectations for the upcoming meeting on February 3rd and the April user meeting?

Bill: We are keeping the first meeting pretty small; we have invited only those we consider to be “potential early sponsors” of the society. Our goal for this meeting is to inform the commercial community (hardware and software and solutions providers) of what we are doing, and how we are different from other events such as the “HPC Workshop for Oil and Gas” at Rice University and the post-convention workshops at venues like the Society of Exploration Geophysicists. Companies that may be interested in sponsoring the Society can get more details, including venue and contact information, at the meeting website.

We are planning a “real” meeting in late April where we will invite the user community and solicit membership in the society. Once we get seed money to get the Society rolling, we’ll start communicating with our list of names; we have about 2500 people on our list so far, and readers who are interested in staying in touch with the Society as things get going can register at the Society’s website. Our kickoff meeting for the user community is planned for late April, and we have two potential venues in Houston so far for that meeting.

Also posted in HPC Education and Training | 2 Comments

insideHPC.com is a production of insideHPC, LLC. © 2006-2013