Moab Powers Dynamic Resource Sharing at HPC4Health in Canada


Today Adaptive Computing announced that it has fully deployed the Moab 8.1 workload manager at the HPC4Health consortium in Canada.

HPC4Health currently comprises the Hospital for Sick Children (SickKids) and University Health Network’s (UHN) Princess Margaret Cancer Centre. As part of a larger vision that also includes Compute Canada and Compute Ontario, HPC4Health has a mission to bring multiple organizations together to share resources dynamically, securely and equitably. Enter Moab, which has been deployed for its elastic computing, advanced policies and accounting capabilities to deliver on this vision.

“In the beginning, our vision was not possible because technology did not exist,” comments Jorge Gonzalez-Outeirino, Ph.D., Facility Manager at the Centre for Computational Medicine at SickKids. “The folks at Adaptive Computing helped us create the technology to build a converged data center that dynamically shares resources securely and allows us to account for the workloads used by each organization involved in the HPC4Health venture.”

SickKids is recognized as one of the world’s foremost pediatric health-care institutions and is Canada’s leading center dedicated to advancing children’s health through the integration of patient care, research and education. The Princess Margaret Cancer Centre has achieved an international reputation as a global leader in the fight against cancer and in delivering personalized cancer medicine. It is a member of UHN, the largest hospital-based research program in Canada, with major research in cardiology, transplantation, neurosciences, oncology, surgical innovation, infectious diseases, genomic medicine and rehabilitation medicine.

Today’s research discovery and innovation are made possible not only by experiments in the laboratory but also by computational simulation. High Performance Computing (HPC) and Big Data Sciences make it possible to analyze and interpret the terabytes of data generated every day, which contributes to scientific discovery. Without access to such computational resources, it is impossible to deliver a high level of individualized patient care, make scientific discoveries and save lives.

Canadian healthcare organizations have lacked the HPC infrastructure needed to push research and personalized clinical care into the future. To address this problem, SickKids and UHN’s Princess Margaret Cancer Centre have partnered to build a pilot IT infrastructure that will provide researchers and clinicians with secure cloud-computing services while satisfying personal health information privacy requirements. The infrastructure currently comprises 340 SGI compute nodes, 13,024 compute threads, 52.7 terabytes of RAM, 306 terabytes of total local disk space and 4 PB of storage. Part of this infrastructure is Adaptive Computing’s Moab. Moab was selected for its elastic computing features; its advanced policies, such as automatic enforcement of Service Level Agreements (SLAs), dynamic provisioning of virtual resources and job arrays; and its accounting capabilities.

“We had many requirements going into this project but the big features were to maintain the perception of managing our own environments and have the ability to handle burst workload requirements,” says Carl Virtanen, Bioinformatics Manager at the Princess Margaret Cancer Centre and Associate Director of HPC4Health. “With Moab it feels like we have infinite resources to handle all of our peak workloads. We manage our own environments as if we were simply a node on the network and with Moab’s heterogeneous capabilities, we can maintain all of our systems we’ve come to rely on to cure cancer and save lives.”

The HPC4Health IT infrastructure is configured as a single pool of resources in which each organization has dedicated resources plus access to a common communal pool. Each organization and its Admins manage their dedicated resources just as if they were a private data center. As workloads increase, Moab automates each organization’s growth requirements: it dynamically obtains additional resources from the communal pool to handle peak loads, then relinquishes those resources back to the communal pool for the next peak workload requirement from any organization. All workloads are tracked per user and per organization and accounted for with extensive reporting capabilities. This is made possible through Moab’s elastic computing, advanced policies and accounting features.
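The borrow-and-return cycle described above can be illustrated with a small sketch. This is not Moab code or a Moab API: the `Organization` and `CommunalPool` classes, the `handle_demand` helper and any node counts passed to them are hypothetical, chosen only to show dedicated resources being topped up from a shared pool and handed back afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """Hypothetical model of one consortium member (not a Moab object)."""
    name: str
    dedicated: int        # nodes the organization always controls
    borrowed: int = 0     # nodes currently on loan from the communal pool

    @property
    def capacity(self) -> int:
        return self.dedicated + self.borrowed


@dataclass
class CommunalPool:
    """Hypothetical shared pool that lends idle nodes and logs every loan."""
    free: int
    usage_log: list = field(default_factory=list)

    def lend(self, org: Organization, wanted: int) -> int:
        # Grant as many nodes as are idle, never more than requested.
        granted = min(wanted, self.free)
        self.free -= granted
        org.borrowed += granted
        self.usage_log.append((org.name, "borrow", granted))
        return granted

    def reclaim(self, org: Organization) -> int:
        # Return every borrowed node so another organization can burst.
        returned = org.borrowed
        self.free += returned
        org.borrowed = 0
        self.usage_log.append((org.name, "return", returned))
        return returned


def handle_demand(org: Organization, pool: CommunalPool, nodes_needed: int) -> int:
    """Borrow only the shortfall; return how many nodes the workload can use."""
    shortfall = nodes_needed - org.capacity
    if shortfall > 0:
        pool.lend(org, shortfall)
    return min(nodes_needed, org.capacity)
```

For example, with an assumed pool of 100 idle nodes, an organization with 120 dedicated nodes that needs 150 would borrow 30 and leave 70 for others; calling `pool.reclaim(org)` after the peak restores the pool to 100. The `usage_log` entries stand in for the per-organization tracking the article mentions.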

Elastic Computing

Administrators from both SickKids and UHN’s Princess Margaret Cancer Centre must ensure that regularly scheduled workloads are completed, particularly during peak times. Each organization manages many users with countless needs and the requirement to be responsive to those needs is imperative; therefore, the ability to burst workloads to other resources is extremely important.

Moab tackles these challenges with elastic computing, which allows Admins to efficiently manage resource expansion by bursting to private clouds or other data center resources using OpenStack. Elastic computing is triggered when a threshold set in Moab is exceeded. To evaluate the system against this threshold, Moab surveys the system workload and calculates the combined completion time of the burstable workloads as if no other workloads were running. Elastic computing bursts workloads, on an as-needed basis, into a communal pool of data center resources and then relinquishes these resources back to the shared pool. Using OpenStack, Moab completely wipes each resource after use to help comply with Canadian privacy regulations. This added flexibility enables Admins to expand their own cluster while taking advantage of the elasticity of resources and scalability of the cloud.
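As a rough illustration of the trigger described above, the sketch below estimates how long the burstable backlog would take if it had the local nodes to itself and compares that estimate with a threshold. It is a simplified interpretation rather than Moab's actual algorithm: the `Job` class, the node-hour estimates, the perfect-packing assumption and both function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical queued job; node_hours is estimated nodes x wall-clock hours."""
    node_hours: float
    burstable: bool


def backlog_completion_hours(jobs, local_nodes):
    """Idealized time to drain the burstable jobs if nothing else were running.

    Assumes work divides evenly across nodes, which real schedulers cannot
    guarantee; this is only a back-of-the-envelope estimate.
    """
    if local_nodes <= 0:
        raise ValueError("local_nodes must be positive")
    work = sum(job.node_hours for job in jobs if job.burstable)
    return work / local_nodes


def nodes_to_request(jobs, local_nodes, threshold_hours):
    """Return how many extra nodes would bring the estimate under the threshold."""
    hours = backlog_completion_hours(jobs, local_nodes)
    if hours <= threshold_hours:
        return 0  # threshold not exceeded, so no burst is triggered
    work = sum(job.node_hours for job in jobs if job.burstable)
    total_needed = math.ceil(work / threshold_hours)
    return total_needed - local_nodes
```

With 600 burstable node-hours queued on 50 local nodes, the estimate is 12 hours; against an assumed 8-hour threshold the sketch would request 25 additional nodes from the communal pool, and against a 12-hour threshold it would request none. Non-burstable jobs are ignored, mirroring the article's description of the calculation.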

Advanced Policies

Some of Moab’s advanced policies, such as auto enforcement of Service Level Agreements (SLAs), dynamic provision of virtual resources and job arrays, are key to the success of HPC4Health’s converged infrastructure.

  • Auto SLA enforcement schedules and adjusts workloads to consistently meet service guarantees and business priorities so the right workloads are completed at the optimal times. This includes:
    • Resource sharing and usage policies, which schedule resources across users, groups and projects in line with resource sharing agreements such as usage limits, usage access controls and dynamic fairshare policies
    • SLA and priority policies, such as quality of service and hierarchical priority weighting, which ensure the highest-priority workloads are processed first
    • Continuous plus future scheduling (future reservations, priorities and pre-emption), which ensures priorities and guarantees are proactively met as conditions and workload levels change
  • Dynamic Provisioning detects when the current level of resources will not meet a given SLA, then reaches out to a provisioning tool that has access to the communal pool of virtual resources. The resources are allocated and then provisioned to match the needed environment. When the workload is complete, the added resources are returned to the communal pool (de-provisioned and removed from the workload manager)
  • Job Arrays support the submission of many sub-jobs that perform the same work using the same script but operate on different sets of data

Accounting

Usage accounting and budget enforcement enable tracking of resource usage as well as the setting and enforcement of usage budgets by user, group, project or any custom organizational hierarchy. Resources are scheduled against that budget for a given period of time, supported by dynamic usage reports and a flexible, conditional usage cost/charge structure. This allows HPC4Health to track usage for each organization, and each organization can then track internal usage by user, department or group.
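The idea of charging usage against a budget and reporting it at several levels of a hierarchy can be sketched as follows. This is an illustrative model only, not Moab's accounting interface: the `UsageLedger` class, its method names, the core-hour unit and the `rate` multiplier (standing in for a conditional charge structure) are all assumptions.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical per-period ledger keyed by (organization, group, user)."""

    def __init__(self, budgets):
        # budgets: organization name -> core-hours allowed this period
        self.budgets = dict(budgets)
        self.usage = defaultdict(float)

    def org_total(self, org):
        """Core-hours charged to an organization so far."""
        return sum(hours for key, hours in self.usage.items() if key[0] == org)

    def charge(self, org, group, user, cores, hours, rate=1.0):
        """Record usage unless it would push the organization over budget."""
        cost = cores * hours * rate
        if self.org_total(org) + cost > self.budgets[org]:
            return False  # budget enforcement: the request is refused
        self.usage[(org, group, user)] += cost
        return True

    def report(self, org, by="group"):
        """Aggregate an organization's usage by 'group' or by 'user'."""
        index = {"group": 1, "user": 2}[by]
        totals = defaultdict(float)
        for key, hours in self.usage.items():
            if key[0] == org:
                totals[key[index]] += hours
        return dict(totals)
```

A consortium-level view would call `org_total` for each member, while each member could call `report` with `by="user"` or `by="group"` to break its own usage down further, matching the two tiers of tracking described above.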

“HPC4Health has been an amazing project to work on, pushing us to expand the bounds of cloud technology,” says Marty Smuin, CEO of Adaptive Computing. “HPC is the future for IT organizations to analyze big data requirements and extract the necessary data to make game changing decisions. By creating a converged infrastructure of Cloud, HPC and Big Data with Moab, SickKids and UHN’s Princess Margaret Cancer Centre have the resources necessary to save lives!”

HPC4Health is made possible by financial contributions from the Canada Foundation for Innovation, Compute Canada, Compute Ontario, The Hospital for Sick Children, SickKids Foundation, The Princess Margaret Cancer Foundation and University Health Network. HPC4Health is currently in conversations with other Canadian hospitals and health institutions to expand this venture and to take advantage of the HPC cloud resources to further scientific discovery.
