HPC News Bytes 20240318: EU’s AI Act, Cerebras’s Whopping AI Chip, Meta’s Massive AI Infrastructure, a Matrix-Multiply Advance?

A happy St. Patrick’s Day week to you! Here’s a speed-walk (6:10) through recent news in the world of HPC-AI, including: the EU’s European AI Act, Cerebras’s new 5nm Wafer Scale Engine-3 AI….

Rice Univ. Researchers Claim 15x AI Model Training Speed-up Using CPUs

Reports are circulating in AI circles that researchers from Rice University claim a breakthrough in AI model training acceleration – without using accelerators. Running AI software on commodity x86 CPUs, the Rice computer science team says neural networks can be trained 15x faster than on platforms utilizing GPUs. If valid, the new approach would be a double boon for organizations implementing AI strategies: faster model training using less costly microprocessors.

Intel MKL Compact Matrix Functions Attain Significant Speedups

The latest version of Intel® Math Kernel Library (MKL) offers vectorized compact functions for general and specialized matrix computations of this type. These functions rely on true SIMD (single instruction, multiple data) matrix computations, and provide significant performance benefits compared to traditional techniques that exploit multithreading but rely on standard data formats.
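
To make the contrast with "standard data formats" concrete, here is a minimal pure-Python sketch of an interleaved ("compact") layout. It assumes that compact means storing the same element of several small matrices side by side, so that one vector instruction can process that element for the whole group at once. The names `pack_compact`, `gemm_compact`, and `unpack_compact` are hypothetical and are not Intel MKL functions; the sketch models the data layout only, not the library's API or performance.

```python
# Illustrative only: a pure-Python model of a "compact" (interleaved) layout.
# pack_compact / gemm_compact / unpack_compact are hypothetical names,
# not Intel MKL functions.

def pack_compact(matrices):
    """Interleave a batch of equal-size matrices.

    compact[i][j] holds element (i, j) of every matrix in the batch, so a
    SIMD register could load that element for several matrices at once.
    """
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[[m[i][j] for m in matrices] for j in range(cols)]
            for i in range(rows)]


def gemm_compact(a, b):
    """Multiply two compact batches; the innermost loop runs across the batch."""
    rows, inner, cols = len(a), len(b), len(b[0])
    batch = len(a[0][0])
    c = [[[0.0] * batch for _ in range(cols)] for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # This lane-wise loop is what a SIMD instruction would replace.
                for lane in range(batch):
                    c[i][j][lane] += a[i][k][lane] * b[k][j][lane]
    return c


def unpack_compact(compact):
    """Convert a compact batch back to a list of ordinary matrices."""
    batch = len(compact[0][0])
    return [[[cell[lane] for cell in row] for row in compact]
            for lane in range(batch)]
```

In this model every matrix in the batch follows the same sequence of operations, which is why the approach suits large groups of small, same-size matrices rather than a single large one.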

Intel MKL Speeds Up Small Matrix-Matrix Multiplication for Automatic Driving

Certain applications, such as automated driving, require low-latency, small matrix-matrix multiplication in real time. They use specialized libraries that can be customized for small matrix operations. Recompiling and linking those libraries with the highly optimized DGEMM routine in the Intel® Math Kernel Library 2018 can give speedups many times over native libraries.
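
For readers unfamiliar with the operation, DGEMM is the double-precision BLAS routine that computes C = alpha * A * B + beta * C. The sketch below is a plain reference model of that arithmetic, not the MKL routine or its calling convention; `dgemm_reference` is a hypothetical helper name, and the nested-list matrices are an assumption made to keep the example dependency-free.

```python
# Reference model of what a DGEMM call computes: C = alpha * A * B + beta * C.
# dgemm_reference is a hypothetical teaching helper, not the MKL routine.

def dgemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    # Start from the scaled input C, then accumulate the scaled product.
    result = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            result[i][j] += alpha * acc
    return result
```

An optimized library is expected to produce the same values up to floating-point rounding, so a reference like this can serve as a correctness check when swapping in a faster routine.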

Deep Learning Frameworks Get a Performance Benefit from Intel MKL Matrix-Matrix Multiplication

Intel® Math Kernel Library 2017 (Intel® MKL 2017) includes new GEMM kernels that are optimized for various skewed matrix sizes. The new kernels take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and achieve high GEMM performance on multicore and many-core Intel® architectures, particularly for situations arising from deep neural networks.