

How to Control the AI Tsunami

This sponsored post from Lenovo’s Bhushan Desam touches on the impending “AI tsunami” and how to ride the wave of AI and HPC using tools like Lenovo Intelligent Computing Orchestration (LiCO). 


Dr. Bhushan Desam, Lenovo’s Director, Global Artificial Intelligence Business

When we announced Lenovo Intelligent Computing Orchestration (LiCO) at SC17, I had the opportunity to discuss Artificial Intelligence (AI) with many of the HPC attendees. At that time, most had an “AI tsunami” lurking on the horizon but did not immediately recognize it. New users were starting to come to them from the most unlikely of departments wanting to use their clusters – primarily to run AI. These users knew nothing about clusters, the admins knew little about AI – and it was time-consuming for both sides.

Fast forward one year, and the AI wave has come onshore, not only in research institutions but in enterprises as well. It seems every organization has AI initiatives, and activities quickly graduate from small laptop experiments to needing GPU-enabled workstations to run larger models and datasets in a reasonable amount of time. Many of these projects will grow further and need to scale beyond a single system, so it’s wise to find a solution that can efficiently handle many users, many projects, and an ever-growing thirst for performance.

Is there an architecture that can handle multiple users simultaneously, running relatively short jobs that need high-performance infrastructure, with the ability to run multiple jobs on a single machine but also to distribute jobs across multiple systems? Since you are reading insideHPC, you already know the answer. This is what we do every day in HPC. What is new is the users: AI users span a wide range, from true data scientists to those working in university art departments. The common theme is that these users demand simple, intuitive tools to do their work – and this is where we at Lenovo are focused with LiCO.

At its core, LiCO provides a graphical user interface (GUI) to simplify cluster job submission, job monitoring, and storage management, leveraging SLURM as the scheduler to provide a tremendous amount of open-source flexibility. This works equally well for HPC and AI, and those who still prefer the command line can use it through “expert mode” within LiCO. Standard templates are provided for HPC applications and AI frameworks, and custom templates can easily be created for nearly any application. LiCO uses Singularity containers to package AI frameworks, so users can easily add new framework versions into LiCO and take advantage of the latest optimizations immediately. This resonates well with data scientists who simply want to bring their code and data to the cluster, run and monitor training, and do so with minimal effort.
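To make the SLURM-plus-Singularity workflow concrete, here is a minimal sketch of the kind of batch submission LiCO's GUI abstracts away. This is an illustrative hand-written example, not LiCO-generated output: the partition name, container image, training script, and data path are all assumptions.

```shell
#!/bin/bash
# Sketch of a SLURM batch job that runs training inside a Singularity
# container. Partition, image name, script, and data path are hypothetical.
#SBATCH --job-name=tf-train
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
#SBATCH --time=04:00:00
#SBATCH --output=tf-train-%j.log

# Run the training script inside the container image; --nv exposes the
# host's NVIDIA GPU driver stack to the container.
singularity exec --nv tensorflow-latest.sif \
    python train.py --data-dir /shared/datasets/my-project
```

Submitted with `sbatch`, the scheduler queues the job until the requested GPUs are free; swapping in a newer container image is how a user would pick up a newer framework version without touching the host.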


Clients tell us there is a wide range of users beyond data scientists who want to get in on the AI action as well, so we recently updated LiCO with new “Lenovo Accelerated AI” training and inference templates. These templates allow users to simply bring their dataset into LiCO and request cluster resources to train models and run inference without coding. The new templates include Image Classification, Segmentation, Object Detection, and Natural Language Processing. LiCO is now supported on Lenovo’s new SR670 4-GPU server, a combination that makes an outstanding, scalable AI training solution.

Our vision for LiCO is to be an open platform for HPC and AI – open to both hardware and software innovation – while expanding the value of using the solution. We’ve learned from our many client engagements in the Lenovo AI Innovation Centers that continuing to make the experience simple and effective is critical to their success. In the near term, we are looking to simplify access to the ecosystem around AI, particularly Big Data and DevOps, so that training with LiCO and Lenovo HPC clusters can be integrated into software development toolchains as these applications move toward production. We also recognize that clients use multiple methods as they move from initial development to production in both HPC and AI, so portability across hybrid architectures is something we will look to address.

Customer collaborations will play an important role in LiCO’s future, as we work together with clients to address new opportunities and increase value in both AI and HPC. And of course, Lenovo will continue to support HPC community initiatives such as OpenHPC, the Linux Foundation, MPI, xCAT, and Confluent, which are foundational to the LiCO environment. We at Lenovo look forward to showing you our latest innovations in HPC and AI at SC18!

Dr. Bhushan Desam is Lenovo’s Director, Global Artificial Intelligence Business. J.J. Falkanger, Sr. Product Manager at Lenovo, also contributed to this article. 
