ABCI is a low-cost, lightweight “warehouse” with a double-structured design including internal scaffolding for racks and cooling pods.
In this video from the MVAPICH User Group, Shinichiro Takizawa from AIST presents: AI Bridging Cloud Infrastructure (ABCI) and its communication performance.
AI Bridging Cloud Infrastructure (ABCI) is the world’s first large-scale Open AI Computing Infrastructure, constructed and operated by the National Institute of Advanced Industrial Science and Technology (AIST), Japan. As of July 2019, it delivers 19.9 petaflops of HPL performance and the world’s fastest ResNet-50 training time on the ImageNet dataset, at 1.17 minutes. ABCI consists of 1,088 compute nodes, each equipped with two Intel Xeon Gold Scalable Processors, four NVIDIA Tesla V100 GPUs, two InfiniBand EDR HCAs and an NVMe SSD. ABCI offers a sophisticated high-performance AI development environment built on CUDA, Linux containers, an on-demand parallel filesystem, and MPI libraries including MVAPICH. In this talk, we focus on ABCI’s network architecture and the communication libraries available on ABCI, and show their performance and recent research achievements.
Shinichiro Takizawa, Ph.D., is a senior research scientist on the AI Cloud Research Team, AI Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Japan. His research interests are data processing and resource management on large-scale parallel systems. He also works as a member of the AI Bridging Cloud Infrastructure (ABCI) operation team and designs future ABCI services. He received his Ph.D. in Science from Tokyo Institute of Technology in 2009.