The Intel HPC Developer Conference brought together developers from around the world to discuss code modernization in high-performance computing. With AI and machine learning in the spotlight, insideHPC was on hand to discuss the latest technologies with thought leaders in the field.
- Accelerating Machine Learning on Intel Platforms. In this video from the Intel HPC Developer Conference, Ananth Sankaranarayanan from Intel describes how the company is optimizing Machine Learning frameworks for Intel platforms. Open source frameworks often are not optimized for a particular chip, but bringing Intel’s developer tools to bear can result in significant speedups. “Availability of big data coupled with fast evolution of better hardware, smarter algorithms and optimized software frameworks are enabling organizations create unique opportunities for machine learning and deep learning analytics for competitive advantage, impactful insights, and business value. Caffe is one of most popular open source frameworks developed by Berkeley Vision Learning Center (BVLC) for deep learning application for image recognition, natural language processing (NLP), automatic speech recognition (ASR), video classification and other domains in artificial intelligence.” Watch the presentation: Optimizing Machine Learning workloads on Intel Platforms.
- Using Machine Learning to Avoid the Unwanted. In this video from the Intel HPC Developer Conference, Justin Gottschlich, PhD from Intel describes how the company is doubling down on Anomaly Detection using Machine Learning and Intel technologies. “As technological trends continue toward systems that require increased scalability and reliability, there is a growing need for accurate and robust anomaly detection and management systems. Systems like massively distributed high performance computing or fleet-wide autonomous vehicle coordination require near flawless anomaly detection and management. Without such systems in place the negative impact could be catastrophic, with impacts ranging from significant monetary losses to the loss of human lives. Unfortunately, today’s state-of-the-art anomaly detection systems do not provide the necessary accuracy or robustness to support such complex systems. In this talk, we present future research directions at Intel Labs using deep learning for anomaly detection and management. We discuss the required machine learning characteristics for such systems, ranging from zero positive learning, automatic feature extraction, and real-time reinforcement learning. We also discuss the general applicability of such anomaly detection systems across multiple domains such as data centers, autonomous vehicles, and high performance computing.” Watch the presentation: Using Machine Learning to Avoid the Unwanted.
- Performance Optimization of Deep Learning Frameworks on Modern Intel Architectures. In this video from the Intel HPC Developer Conference, Elmoustapha Ould-ahmed-vall from Intel describes how the company is doubling down to optimize Machine Learning frameworks for Intel Platforms. Using open source frameworks as a starting point, surprising speedups are possible using Intel technologies. “With the availability of high computing capabilities, deep neural networks have become the popular algorithm of choice for image classification, automatic speech recognition, natural language processing, Advanced Driver Assistance System (ADAS), etc. applications. Intel has made significant contributions to an optimized fork of Berkeley Vision Learning Center (BVLC) Caffe and also making extensive contributions to Tensorflow, Theano, Torch, all in the open source. In his talk, he analyzed the performance characteristics of Caffe and TensorFlow, on Intel Xeon Phi x200. Intel Xeon Phi x200 (code named Knights Landing (KNL)) is the latest Intel Many Integrated Core processor.” Watch the presentation: Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* for Xeon Phi Cluster.
- Data analytics, machine learning, and HPC in today’s changing application environment. In this video from the Intel HPC Developer Conference, Franz Kiraly from Imperial College London and the Alan Turing Institute describes why many companies and organizations are beginning to scope their potential for applying rigorous quantitative methodology and machine learning. “During the current data science boom, many companies and organizations that are not the “usual suspects” (IT/Internet/Silicon Valley etc) are beginning to scope their potential for applying rigorous quantitative methodology and machine learning. This talk will explain how solutions desired by such potential customers can look like, how they may differ from the more “classical” consumers of machine learning and analytics, and the arising challenges that current and future HPC development may have to cope with, with stylized case examples from my own work as a data analytics and machine learning consultant.” Watch the presentation: Data Analytics, Machine Learning and HPC in Today’s Changing Application Environment.
- Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures (LBNL). In this video from the Intel HPC Developer Conference, Prabhat from NERSC describes how high performance computing techniques are being used to scale Machine Learning to over 100,000 compute cores. “Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based O(logn) algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for big data analytics challenges in the scientific domain. In this work, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications: astrophysics, plasma physics, and particle physics, we show that our implementation can construct kd-tree of 189 billion particles in 48 seconds on utilizing ∼50,000 cores. We also demonstrate computation of KNN of 19 billion queries in 12 seconds. We demonstrate almost linear speedup both for shared and distributed memory computers. Our algorithms outperforms earlier implementations by more than order of magnitude; thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems. In addition, we showcase performance and scalability on the recently released Intel Xeon Phi processor showing that our algorithm scales well even on massively parallel architectures.” Watch the presentation: Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures.
- Scaling Machine Learning Software with Allinea Tools. In this video from the Intel HPC Developer Conference, Mark O’Connor from Allinea Software describes how the company’s performance optimization tools can speed up machine learning code. “The majority of deep learning frameworks provide good out-of-the-box performance on a single workstation, but scaling across multiple nodes is still a wild, untamed borderland. This session follows the story of one researcher trying to make use of a significant compute resource to accelerate learning over a large number of CPUs. Along the way we note how to find good multiple-CPU performance with Theano* and TensorFlow*, how to extend a single-machine model with MPI and optimize its performance as we scale out and up on both Intel Xeon and Intel Xeon Phi architectures. Finally, we address the greatest question of our time: how many CPUs does it take to learn Atari games faster than a 7 year-old child?” Watch the presentation: Scaling Deep Learning.
- Pikazo: Deep Neural Network Art on Intel Architecture. In this video from the Intel HPC Developer Conference, Noah Rosenberg and Karl Stiefvater from Pikazo describe the company’s innovative Pikazo App for smartphones. Pikazo was developed in 2015 using neural style transfer algorithms. It is a collaboration between human, machine, and our concept of art. It is a universal art machine that paints any image in the style of any other, producing sometimes-beautiful, sometimes-disturbing, always-surprising artworks. “Pikazo allows novice artists to create impressive imagery via a technique known as neural style transfer. Neural style is a very uncommon problem set for computation, using a detection network to actually generate images. Common methods involve large pre-trained networks, with functional results delivered via feed-forward processes running on GPU systems with relatively low RAM. Our implementation for performing neural style transfer of artistic images requires a novel sampling of the network data as it is being calculated, which requires exceptional amount of compute and RAM availability. We’ll cover our implementation, the difference between CPU and GPU, how to implement training live at scale, and what future applications may be in store.” Watch the presentation: Deep Neural Network Art.
- Supermicro Showcases Machine Learning Solutions on Intel Architecture. In this video from the Intel HPC Developer Conference, Akira Sano from Supermicro describes the company’s Machine Learning Solutions on Intel Architecture. “Our server systems, subsystems and accessories are architecturally designed to provide high levels of reliability, quality and scalability, thereby enabling our customers benefits in the areas of compute performance, density, thermal management and power efficiency to lower their overall total cost of ownership.”