Reading the Intel Tea Leaves: Pat Gelsinger’s HPC Paradox

Pat Gelsinger

As he takes charge of Intel, CEO Pat Gelsinger faces a paradox: his new company is both troubled and a revenue geyser, and if Intel is to sustain its historical growth rates, he will need the skills of a corporate turnaround artist. The contradictions certainly apply to Intel’s position in HPC/AI/data center server processors, where the company is both dominant and under siege from innovative, agile competitors cutting into its 90-plus percent market share. In HPC, Intel is omnipresent, yet its recent failures at the upper reaches of supercomputing have been high-profile embarrassments. At the product level, the company drives innovation across several technologies critical to HPC (memory, high-speed interconnect, APIs), yet it has run into severe problems in the design and fabrication of its foundational product category: the advanced CPU.

How Gelsinger navigates these quandaries – whether he can reverse what may be a slide down the far side of the corporate life cycle – is a fascinating prospect for Intel watchers and for the HPC community. Taking the long view, though, having seen Intel recover from past setbacks and accounting for its tremendous financial power, it is easy to overstate the company’s current problems. As for the choice of new chief, the consensus, at least among the industry observers we spoke with, is that Gelsinger could have the right mix of technical expertise, personal qualities and business experience that Intel needs. Wall Street, which has pumped up Intel stock 10 percent since the Gelsinger news broke, seems to agree.

Before we get to the Gelsinger selection and what it might mean, a quick historical review: Intel’s challenges have roots going back decades, to a time when Intel CPUs were nearly the only chips in town for HPC clusters (as one industry wag put it back then: “If you don’t give people choices, they won’t look for alternatives”).

Intel’s dominance in HPC led naturally to a somewhat sclerotic reinvestment in a cash-cow, CPU-centric business model that faced no serious challenge. But starting roughly a dozen years ago – as Nvidia GPUs, built for gaming, proved adept first at general-purpose scientific computing and later at training AI models – a growing market restiveness developed for other architectures capable of complementing or replacing Intel CPUs. Along with Nvidia GPUs, Intel’s HPC hammerlock has more recently been broken by a resuscitated AMD, under the turnaround leadership of Lisa Su, which in 2017 announced EPYC CPUs seen to offer price-performance advantages over Intel, followed last year by Instinct data center GPUs. AMD has also agreed to acquire FPGA maker Xilinx for potential CPU-GPU-FPGA integration; growing numbers of domain- and inference-specific AI chips are arriving from various vendors; and Arm-based CPUs – with Arm itself the target of a planned Nvidia acquisition – are finding acceptance in the data center and in supercomputing alongside their established presence at the edge. Call it the big bang in chips – or, in the phrase of HPC industry analyst Addison Snell of Intersect360 Research, “technology disaggregation.”

It wasn’t until 2015, with its acquisition of FPGA specialist Altera, that Intel signaled a grudging turn toward heterogeneity (even as it continued to declare, as late as SC15, that Moore’s Law was alive and well). It’s a turn that has grown in conviction and scope as it has pervaded Intel’s market messaging, M&A activity, product portfolio and industry consortium work. Outgoing CEO Bob Swan did his best to carry the banner: frequently during his short tenure at Intel’s helm he flatly stated that Intel was no longer a CPU company but an “xPU” multi-architecture company. But this embrace of market realities has been beset by operational problems – years-long delays to Intel’s 10nm Xeon CPUs and 7nm Ponte Vecchio GPUs that caused the company, as prime contractor, to miss delivery of two supercomputers to Argonne National Laboratory. Both systems were named Aurora; the second was scheduled to be the United States’ first exascale-class supercomputer later this year, a milestone now expected instead to be claimed by the AMD CPU/GPU-powered Frontier system, built by HPE-Cray for Oak Ridge National Laboratory.

On the financial front, even as Intel revenues continue to dwarf Nvidia’s and AMD’s, those two hot-growth companies have become perennial Wall Street sweethearts with skyrocketing stocks, while Intel’s share price has lagged. In fact, according to industry analyst Patrick Moorhead, president of Moor Insights & Strategy, it was pressure from disappointed investors that drove Swan from his post.

“The reality is that Bob Swan was dealt a very difficult hand when he took the reins from (preceding CEO) Brian Krzanich…,” Moorhead said in a podcast with Daniel Newman of Futurum Research. “I think it really came down to investor pressure and impatience. If I look at all of the earnings reports, in aggregate, they weren’t all great, but man, there were a few really great earnings reports, where the company took advantage of what it calls data centric opportunities, which is a combination of data center plus the edge. And from a patience standpoint, the reality is that when a chip company gets into a challenge, many times these are years-long issues.”

Throughout this tumultuous period, Intel has lost a host of senior-level managers with direct bearing on HPC and AI, including Alan Gara, who led development of Intel’s discontinued Omni-Path high performance fabric; Jim Keller, former SVP of the Silicon Engineering Group, now at AI startup Tenstorrent; Naveen Rao, corporate VP/GM of Intel’s AI Platforms Group, now co-founder and CEO of a startup still in stealth mode; and Daniel McNamara, formerly Intel SVP/GM of the Network and Custom Logic Group and SVP of the Programmable (i.e., FPGA) Solutions Group, now at AMD.

This is the hot mess handed to Gelsinger. But Moorhead said Gelsinger could be up to the challenge in part because he, unlike former CFO Swan, has a technical background, and Intel’s problems are primarily technical.

Citing Gelsinger’s record at Intel, where he rose to become CTO (before moving to VMware as CEO), Moorhead said, “He ran what would be called (Intel’s) data center business, and I competed head to head with him, with Opteron, when I was at AMD. He was a force to be reckoned with. It’s hard to believe he was at Intel for 30 years, but he’s very respected. He’s an engineer’s engineer, and people like him. His morals and values are very, very high. And I think people are cheering at Intel today. And again, nothing against Bob Swan, but I think if nothing else, (Gelsinger) will attract engineers to the company, top engineers. If you’ve never seen a Pat Gelsinger keynote, you really should. It’s very different. It’s kind of nerdy, unapologetically nerdy. And I think that is what Intel needs right now.”

In this vein, Steve Conway, senior adviser for HPC market dynamics at industry analyst firm Hyperion Research, thinks it was Gelsinger’s technical background, along with his business success, that led Intel to select him.

“I think the main reasons he was hired is that he was very successful at VMware, doubled the size of the company during his time there,” said Conway. “And beyond that, he had 30 years in at Intel with serious technical accomplishments in the processor arena, himself. So he’s going to be, I think, pretty welcomed by the employees of Intel as somebody who’s technically extremely competent, a real technical leader, plus he has this business experience behind him.”

What about the influence of his VMware experience? Conway said it’s well suited to the increasingly diverse HPC playing field.

“On-premises architectures for HPC are becoming more cloud-like…,” Conway said. “Not all HPC on-premises jobs are going to go that way, but I think many of them are going to go that way. What you’re seeing is a trend toward a single HPC job going through a dozen or more lightweight containers, because the job’s requirements are getting so much more heterogeneous and diverse. So you might see a single job having to go through simulation runs and analytics runs, multiples of each, and at each stage this container has to assemble the right resources – the right hardware, the right software, the right data, all the rest – in order to have the job, as an end-to-end workflow, complete. That’s something where VMware is a lot about containers and containerization.”

“The general direction (in HPC) is more heterogeneous workflows needing more heterogeneous resources…,” Conway said, “so the VMware experience … is a plus.”

The biggest challenge for Gelsinger, Conway said, is not only completing the company’s transition from CPU to xPU, but also broadening its business beyond its traditional cash-cow fabs.

“Intel has some temporary execution issues, we all know that,” he said, “but the bigger issue at Intel has to do with their business model, which has been more internally focused, based on building these very expensive fabs and then having to go out and find enough business to keep them busy and get a return on those investments. Now Intel has competition, both on the fab side from TSMC and Samsung, and also on the processor side from companies (Nvidia, AMD) that are fabless and don’t have to make those investments.

“That means Intel has to turn outward more as a company and, you know, turn itself more toward the marketplace and competition than it historically has. That’s a major transition in any company, to go from owning a market, in essence, to having to compete, and for very market-specific kinds of capabilities. And I think that Gelsinger has a really superb background for that combination of things. He invented the 486 chip at Intel, he was the main driver of that, so he’s technically very competent, he won’t have any problem understanding the tech guys when they talk to him. But he also has this record of business success with a serious company.”

Another high priority for Gelsinger: reestablishing a track record of reliable execution, said Futurum Research’s Newman.

“I think he’s going to be laser focused on operations,” he said. “I mean, I think Bob (Swan) was too, but I think Pat’s going to continue that, and he’s going to actually accelerate that.”

On the question of whether Intel will become fabless, Newman called it a non-starter. “But the idea of the company partnering, in some cases, with fabs that could accelerate their production and increase yield is a reality, based especially upon the diversification of their chipsets and all the things that company is doing. I think he is going to push forward to make sure that supply is not an issue and that the company is able to get to its next process nodes more quickly. Those are areas that if they get even one or two wins…in terms of saying ‘This is when 7 nanometers is coming,’ and then hitting that, ‘This is when 5 nanometers is coming,’ and hitting that… I think hitting those is going to instantly instill investor confidence.”

Snell of Intersect360 agrees.

“Intel’s troubles in recent years have had more to do with execution than strategy. These operational issues opened them up to strong competitive pushes from NVIDIA and AMD, and even ARM, especially in HPC. But let’s not forget, Intel still has a dominant position today in CPUs, plus a compelling storage offering with Optane, and Xe GPUs on the horizon. There’s a lot on the line. Gelsinger’s background and history are encouraging, and his mission is clear.”

What about Intel’s future as a supercomputer systems integrator? The influence of his years at VMware, along with HPC’s move toward public clouds and more cloud-like workloads, seems likely to make Gelsinger less inclined to embrace Aurora-like adventures. VMware, after all, is about taking powerful, networked resources and distributing them across multiple users, rather than pointing the power of a supercomputer at “grand challenge” hero runs. In the supercomputing sector, Intel will probably stick to its traditional role of providing chips and other technologies that enable supercomputing OEMs to deliver complete systems, particularly in light of the growing complexity, cost and resource demands of supercomputers, a market increasingly dominated by HPE-Cray and Nvidia, according to Karl Freund, senior analyst for machine learning and HPC at Moor Insights & Strategy.

“There’s probably room for two system suppliers,” said Freund. “One would be the big iron for capability machines. And then the other would be rack-and-stack components, which is Nvidia, for capacity machines. Nvidia looks really good there with the DGX and HGX (systems)… Then you’ve got HPE with Cray. I mean, everybody knows what Cray does and they do it really well. So is there room for another player? No, because everything else is going to go cloud, eventually.”

As for Aurora, Freund said, “I thought it was a mistake when it happened, I still think it’s mistaken. Gelsinger, damn smart guy, he’ll conclude the same thing… It was a public embarrassment, a huge public embarrassment, and a distraction.”

A distraction from what? From what Freund believes is Intel’s critically important direction, a highly integrated chiplet strategy.

“I was a big fan, when I was at AMD, of the chiplet strategy, putting GPUs and CPUs on the same package,” he said. “But even beyond that, being able to do a high level of integration of a heterogeneous environment, you get huge benefits. It simplifies the programmer’s task because I can unify memory across that complex, I can build in more interconnect integration with that strategy.”
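
To make the “unify memory” point concrete, here is a minimal CUDA sketch – illustrative only, not drawn from Intel’s or Freund’s own material – of what a unified view of memory buys the programmer: a single managed allocation is touched by both CPU loops and a GPU kernel, with no hand-written host-device copies.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Illustrative kernel: scale a vector in place on the GPU.
    __global__ void scale(float *data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        float *data = nullptr;

        // One allocation visible to both CPU and GPU code; the programmer
        // writes no explicit host-to-device or device-to-host copies.
        cudaMallocManaged(&data, n * sizeof(float));

        for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU writes
        scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // GPU reads and writes
        cudaDeviceSynchronize();                         // wait before the CPU touches data again

        printf("data[0] = %.1f\n", data[0]);             // CPU reads the result back
        cudaFree(data);
        return 0;
    }

On today’s discrete cards the runtime delivers this single view by migrating pages across the PCIe bus behind the scenes; the appeal of packaging CPU and GPU chiplets together with shared memory is keeping this programming model while shrinking that hidden data movement.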

Noting that this integration push would run counter to the VMware model – “It’s not breaking things up, it’s actually combining things” – Freund said Gelsinger has critical decisions coming up as he directs Intel’s strategic execution. And while the chiplet strategy, the “xPU strategy,” has been adopted by Intel, “they need to turn it up to 11,” Freund said.

“How do you take an accelerator like Habana (see “AWS and Intel Announce Gaudi (Habana)-based EC2 Instances for AI Training”) or a GPU like Ponte Vecchio, and Xeons, how do I combine them together so that one plus one equals three? Because that’s what Nvidia is going to do with ARM CPUs, I’m convinced.”

And it’s what AMD is doing with the EPYC CPU-Radeon GPU integration powering the upcoming Frontier and El Capitan exascale systems. In this way, Freund said, Intel is not only playing catch-up in data center GPUs, it’s playing catch-up on CPU-GPU integration.

“But I do think (Gelsinger) is the right move for Intel,” Freund said. “He’s got his work cut out for him, but I think he’s the right guy for the job.”

Comments

  1. Victotronics says

    “with the revelation that Nvidia GPUs were adept at training AI models”

    Not everything is about AI. A dozen years ago people were using them for plain old CFD and such.