Excelero Powers AI as a Service with Shared NVMe at InstaDeep


Today Excelero announced that its NVMesh software is being used to power AI-as-a-Service at InstaDeep.

InstaDeep offers a pioneering AI as a Service solution enabling organizations of any size to leverage the benefits of AI and Machine Learning (ML) without the time, costs and expertise required to run their own AI stacks. Excelero’s NVMesh, in turn, allows InstaDeep to access the low-latency, high-bandwidth performance that is essential for running customer AI and ML workloads efficiently – and gain the scalability vital to InstaDeep’s own rapid growth.

“Finding a storage infrastructure that would scale modularly and was highly efficient for AI and ML workflows is no small challenge,” explained Amine Kerkeni, Head of AI Product at InstaDeep. “Our clients simply will not achieve the performance they need if an infrastructure starves the GPUs with slow storage or wastes time copying data to and from systems. Excelero NVMesh ticked all the boxes for us and more.”

By allowing the GPU-optimized servers to access remote, scalable, high-performance NVMe flash drives as if they were local flash, with full IOPS and bandwidth capabilities, the InstaDeep team is achieving highly efficient usage of the GPUs themselves and the associated NVMe flash. The end result is higher ROI, easier workflow management and faster time to results.

InstaDeep’s first Excelero system includes a 2U Boston Flash-IO Talyn server with Micron NVMe flash and Excelero NVMesh software that provides access to up to 100TB of external high-performance storage. Leveraging the Mellanox 100Gb/s InfiniBand network cards in the DGX, the GPUs use the NVMe storage at local-flash performance. The ability to choose any file system to run on NVMesh was an immense benefit. Early tests immediately indicated that external NVMe storage with Excelero gives equal or better performance than local cache in the NVIDIA DGX.

“The GPU systems powering the AI and ML explosion have an amazing appetite for data, but many organizations are finding they quickly create a storage bottleneck,” explained Yaniv Romem, Excelero’s CTO. “The only storage that is fast enough to keep up with these GPUs is local NVMe flash, due to the high competition for valuable PCIe connectivity amongst GPUs, networking and storage. Excelero’s NVMesh eliminates the need to compromise between performance and storage functionality by unifying remote NVMe devices into a logical block pool that performs the same as local NVMe flash with the ability to easily share data and protect it.”
