Today UK-based Ellexus announced the release of Mistral, a “ground breaking” product for balancing shared-storage load across a high-performance computing (HPC) cluster. Developed in collaboration with ARM’s IT department, Mistral monitors application IO and cluster performance so that jobs exceeding expected IO thresholds can be identified automatically and slowed down through IO throttling.
Mistral addresses the noisy-neighbor problem, which arises when a small number of jobs overload the network or file system of a compute cluster with shared storage. Sometimes the culprits are rogue jobs submitted to the cluster by mistake; at other times the cluster is simply overloaded by a large number of IO-hungry jobs. In either case, the performance of every job on the cluster can suffer, and the cluster may even be brought down completely, bringing hundreds of engineers to a standstill and causing critical deadlines to be missed at a cost of tens or hundreds of thousands of pounds.
Mistral works by monitoring application IO and IO performance across a cluster in order to identify rogue jobs and hotspots, and it can automatically throttle the IO of problem jobs and applications. By deploying Mistral, a company can gain an in-depth understanding of how users access its storage as well as prevent disastrous data access patterns.
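The announcement does not describe how Mistral is implemented, but the general idea of flagging jobs that exceed an expected IO threshold and then throttling them can be illustrated with a minimal Python sketch. It assumes per-job IO samples (bytes read and written over a sampling window) are already available, and it uses a token bucket as the throttling technique. The names `IoSample`, `find_rogue_jobs` and `TokenBucket`, and all threshold values, are hypothetical and are not part of Mistral.

```python
from dataclasses import dataclass


@dataclass
class IoSample:
    """Hypothetical per-job IO measurement over one sampling window."""
    job_id: str
    bytes_read: int
    bytes_written: int
    seconds: float  # length of the sampling window

    @property
    def bandwidth(self) -> float:
        """Combined read + write bandwidth in bytes per second."""
        return (self.bytes_read + self.bytes_written) / self.seconds


def find_rogue_jobs(samples, threshold_bps: float):
    """Return IDs of jobs whose bandwidth exceeds the expected threshold (hypothetical policy)."""
    return [s.job_id for s in samples if s.bandwidth > threshold_bps]


class TokenBucket:
    """Token-bucket limiter: on average `rate` bytes/s, with bursts up to `capacity` bytes.

    This is a generic throttling technique chosen for illustration, not Mistral's method.
    """

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def delay_for(self, nbytes: int, now: float) -> float:
        """Return how many seconds an IO request of `nbytes` issued at `now` must wait."""
        # Refill tokens for the time elapsed since the previous request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # A negative balance is a debt that is repaid at `rate` bytes per second.
        return -self.tokens / self.rate
```

For example, with a hypothetical threshold of 100 MB/s, a job writing 5 GB in a 10-second window would be flagged, and a `TokenBucket(rate=100e6, capacity=100e6)` attached to it would delay its requests so that it averages no more than 100 MB/s. A token bucket is used here because it lets short bursts through while capping sustained bandwidth, which keeps well-behaved jobs unaffected.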