The @HPCpodcast’s “Industry View” feature takes on major issues in advanced technologies through the lens of industry leaders. In this episode, we dig into the design and deployment of large-scale AI infrastructures with Jonathan Ha, Senior Director of Product Management for AI at Penguin Solutions. He is a 25-year industry veteran whose career includes stints at Microsoft Azure, AMD and Amazon Web Services.
In this conversation, Ha delivers a master class in AI at scale, examining the extensive list of factors that must be accounted for when organizations plan AI deployments. These range from hardware and software to the skill readiness of the IT staff — all shaped by AI’s unique demands.
The overall project management goal, of course, is to avoid AI’s high failure rate. This can be done, Ha says, but only if IT managers take the right steps and ensure the project accounts for the critical challenges and hidden landmines that so often bring AI projects down.
Penguin recently announced its OriginAI offering, which includes validated, pre-defined AI architectures that incorporate NVIDIA technology, Penguin’s cluster management software and managed services. OriginAI is designed to streamline AI implementation and management, enabling predictable AI performance from clusters ranging from hundreds to thousands of GPUs.

Jonathan Ha, Penguin Solutions
OriginAI is based on Penguin’s extensive experience helping major organizations successfully design and deploy AI at scale. Ha discusses some of those case histories while explaining that OriginAI maps to major challenges Penguin customers overcame on their paths to AI success.
You can find our podcasts at insideHPC’s @HPCpodcast page, on Twitter, and at the OrionX.net blog. Here’s the OrionX.net podcast page, and the RSS feed. We’re also available on Spotify, iTunes, and Google.