Today Platform Computing rolled out RTM 8 and Platform Analytics 8, which are designed to help administrators monitor, manage and analyze cluster usage. To learn more, I caught up with Louise Westoby, Senior Product Marketing Manager, Platform Computing.
insideHPC: Platform RTM 8 acts as a dashboard for the LSF product family. How does it change the way users currently interact with LSF? In other words, how did they get their work done before RTM came along?
Louise Westoby: Platform RTM is an operational dashboard that provides monitoring and reporting functionality for HPC administrators. It enables administrators to easily monitor and report on resource consumption as well as monitor resource allocation by user, group or project team. Before Platform RTM was available, administrators had to filter through log files and extract the usage data on their own. With Platform RTM, usage data is not only extracted automatically but also extracted in real time, providing information on the current condition of the cluster.
“The ability to monitor cluster availability and performance is imperative when we’re running millions of design simulations to test our latest software releases,” said Steve MacQuiddy, IT Director Engineering Infrastructure, Cadence Design Systems. “Having the single Platform RTM dashboard allows us to simultaneously observe the entire cluster environment and it has not only made it easier for us to better balance our workloads, but it’s also helped us optimize throughput for our critical jobs during peak usage.”
insideHPC: Can you describe how RTM would help administrators to quickly resolve issues?
Louise Westoby: The operational dashboard included with Platform RTM provides visible status indicators, enabling administrators to quickly identify and correct problems, such as idle capacity and capacity bottlenecks. And because they can monitor resource consumption in real time, administrators can tune Platform LSF scheduling policies to optimize resource utilization, reduce job pending times, and improve user satisfaction and productivity.
insideHPC: Does RTM mask complexity, or does it help to remove complexity?
Louise Westoby: Platform RTM helps eliminate the complexity associated with monitoring and reporting on the status of the workload. Unlike typical monitoring tools that only monitor the infrastructure, Platform RTM is workload- and resource-aware, providing full visibility into the utilization of Platform LSF clusters. It provides a single aggregated dashboard to monitor all workload scheduling facets, including global clusters, hosts, jobs, licenses, queues, users and applications. With its broad set of capabilities, Platform RTM can replace multiple tools or home-grown scripts in typical cluster environments with a single, easy-to-use monitoring tool. This results in improved productivity for administrators and users alike, as well as reduced cost and complexity.
insideHPC: What capabilities does Platform Analytics enable for system resource planning?
Louise Westoby: Platform Analytics enables managers, planners and administrators to easily correlate massive amounts of historical workload usage data — jobs, resources and license usage data — from one or more Platform LSF clusters for future decision-making. Also, as with traditional analytics tools, external data sources can be easily combined with workload data to provide data views tailored specifically to an organization’s unique requirements, without the need to build intermediate data views. Platform Analytics turns this data into usable information, making it easy to identify changes in usage patterns. By understanding application and hardware utilization over time, planners can make better decisions, intercept trends, consolidate under-utilized assets more quickly, and ensure that spending is efficient and aligned to business needs.
insideHPC: Can you describe a specific use case where Platform Analytics helped a customer with resource planning?
Louise Westoby: Red Bull Racing uses Platform Analytics to track their cluster usage and identify potential problems that might interfere with running design tests, which are typically very time sensitive and critical to the success of the racing team at the next Formula One event. The software enables the design team to plan for peak resource usage at heavy test times so that the design process runs smoothly.
insideHPC: Why is home-grown scripting not a viable option any more for today’s datacenters?
Louise Westoby: Home-grown scripting is still a viable option for today’s datacenters. However, there are two key reasons datacenters are moving away from scripting. First of all, it is time consuming and introduces a significant amount of complexity in the environment. This is where a product such as Platform Analytics provides a particular benefit by automating the data extraction, correlation and analysis, without the need for complex scripts or intermediate steps.
Secondly, in recent years a new generation of HPC users has emerged who are highly skilled in their areas of expertise but are not computer scientists. They learned how to interact with computer systems – from gaming consoles to smartphones to laptops – using an interactive GUI, and therefore require an easy-to-use interface. This is why products such as Platform Application Center have become a critical part of Platform Computing’s HPC offering. The product’s easy-to-use interface makes it easier for users to run HPC applications without programming. For example, the included application-specific job submission templates eliminate the need for complex wrapper scripts, significantly reducing the amount of time it takes to integrate the application with Platform LSF while minimizing user errors during job submissions.
insideHPC: Do you have beta customers for these products? What has been their reaction?
Louise Westoby: The beta program for these products was very successful. Cadence Design Systems and Simulia both participated in the program and, as you can see from the quotes they provided for the press release, were very happy with the results.