This sponsored post from Lenovo’s Bhushan Desam covers how new HPC tools such as Lenovo’s LiCO (Lenovo Intelligent Computing Orchestration) aim to meet the growing demand for AI and to simplify the convergence of HPC and AI.
Artificial Intelligence (AI) is coming for your HPC cluster. There are no autonomous robots taking over the data center, but some days it might feel that way to cluster administrators. The HPC cluster looks very attractive to the “outside world”, particularly to anyone who needs more performance than a single system or workstation can deliver. That is, until they try to use it and discover the learning curve they have to overcome. AI workloads are well suited to running on a cluster, but is your cluster management ready for AI users?
Many of our Lenovo HPC clients use open source tools such as SLURM for workload scheduling, Ganglia for monitoring, and Singularity for secure application containers. These tools are well known to HPC users and have become part of the everyday workflow for running and managing jobs in a high-performance computing environment. Outside the HPC community, however, most users have likely never heard of them, let alone know how to use them effectively. Accommodating the occasional “technically-inclined” data scientist may be a minor inconvenience, but a flood of new AI users wanting to use the cluster threatens to overrun what has traditionally been a smooth operation.
At Supercomputing 2017 (SC17), Lenovo announced Lenovo Intelligent Computing Orchestration (LiCO), our software solution for simplifying the convergence of HPC and AI. LiCO delivers the simple yet powerful GUI-based tools that new users are demanding, built on top of a validated stack of open source HPC cluster management software. New cluster users gain an easy path to productivity, without extensive handholding and without disrupting existing users. Put simply, it provides the best of both worlds for bringing HPC and AI together on a single cluster.
For AI users, LiCO provides graphical job templates for submitting training jobs with AI frameworks on their choice of GPU or CPU infrastructure, with no scheduler commands or software stack configuration required. The user simply chooses a template, picks which AI framework container version to use, points to their data, and selects the compute resources before submitting the job to the cluster. Once the job is submitted, they can monitor its progress, including logs and, depending on the framework, graphical training statistics. Users can also manage AI framework versions and shared storage space from within LiCO, all without needing to learn the command line or access the cluster through it.
HPC users can adopt LiCO as well, keep using command-line tools, or combine the two from within the GUI. Custom job submission templates can easily be created for HPC workloads, which is particularly useful for occasional users with limited experience at the console. LiCO also provides GUI-based cluster monitoring and queue management, but perhaps its greatest value to HPC administrators is the reduced time and effort needed to help new users.
Since SC17, I’ve repeatedly heard from our clients that the number of AI users in their HPC environments is growing quickly. The AI tide has indeed risen in HPC, and admins are struggling to keep their heads above water. Far from being a passing fad, AI has become a must-have initiative for many organizations, and the technical abilities of these users vary widely. In this new era, LiCO helps HPC teams not only survive AI, but thrive with AI and continue to be an innovation and growth engine for their organizations.
At Lenovo, our vision for LiCO is to provide an open and flexible workload management environment for HPC and AI, while expanding its ease of use and functionality for both administrators and users. In part 2, I will describe the capabilities we have recently released in LiCO, share what we’ve learned from clients so far, and look ahead to the exciting future Lenovo has planned for HPC and AI.
Dr. Bhushan Desam is Lenovo’s Director, Global Artificial Intelligence Business. J.J. Falkanger, Sr. Product Manager at Lenovo, also contributed to this article.