Today Tokyo-based startup XTREME DESIGN announced it has raised $700K of funding in its pre-series A round. Launched in early 2015, the startup’s XTREME DNA software automates the process of configuring, deploying, and monitoring virtual supercomputers on public clouds.
In this slidecast, Jem Davies (VP Engineering and ARM Fellow) gives a brief introduction to Machine Learning and explains how it is used in devices such as smartphones, autos, and drones. “I do think that machine learning altogether is probably going to be one of the biggest shifts in computing that we’ll see in quite a few years. I’m reluctant to put a number on it like — the biggest thing in 25 years or whatever,” said Jem Davies in a recent investor call. “But this is going to be big. It is going to affect all of us. It affects quite a lot of ARM, in fact.”
“In the past, developers would get best results if a loop was unrolled, that is, duplicating the body as many times as needed so that the operations could be performed using full vectors. The number of iterations would reflect the hardware that the code was targeted towards. Since the application may have to run on different hardware in the future, results for today’s generation of hardware may be compromised in the future. In fact, it is better to let modern compilers do the unrolling.”
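The trade-off described above can be sketched in a few lines. This is a minimal illustration, not code from the article: the function names and the unroll factor of 4 are hypothetical, and both functions assume `x` and `y` have the same length. Python itself does not emit vector instructions, so the sketch only shows the shape of a manually unrolled loop and why its factor becomes tied to one hardware generation.

```python
# Illustrative only: Python does not emit vector instructions, so this shows
# the structure of manual unrolling, not a performance gain.

UNROLL = 4  # hypothetical factor, e.g. picked to match a 4-lane vector unit


def scale_add_simple(a, x, y):
    """Plain loop: one element per iteration; any unrolling is left to the compiler."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def scale_add_unrolled(a, x, y):
    """Same computation, manually unrolled by 4 with a cleanup loop."""
    n = len(x)
    out = [0.0] * n
    main = n - n % UNROLL  # largest multiple of UNROLL not exceeding n
    for i in range(0, main, UNROLL):
        # The body is duplicated four times by hand. Changing UNROLL alone is
        # not enough: these lines would have to be rewritten for wider vectors.
        out[i] = a * x[i] + y[i]
        out[i + 1] = a * x[i + 1] + y[i + 1]
        out[i + 2] = a * x[i + 2] + y[i + 2]
        out[i + 3] = a * x[i + 3] + y[i + 3]
    for i in range(main, n):  # leftover elements that do not fill a group of 4
        out[i] = a * x[i] + y[i]
    return out
```

Both functions return the same values. The simple version carries no hardware assumption, which is the property the quote argues for when it recommends leaving unrolling to the compiler.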
The European PRACE initiative has published a Best Practice Guide for GPU Computing. “This Best Practice Guide describes GPUs: it includes information on how to get started with programming GPUs, which cannot be used in isolation but as ‘accelerators’ in conjunction with CPUs, and how to get good performance. Focus is given to NVIDIA GPUs, which are most widespread today.”
Today the Google Cloud Platform announced that it is the first cloud provider to offer the next-generation Intel Xeon processor, codenamed Skylake. “Skylake includes Intel Advanced Vector Extensions (AVX-512), which make it ideal for scientific modeling, genomic research, 3D rendering, data analytics and engineering simulations. When compared to previous generations, Skylake’s AVX-512 doubles the floating-point performance for the heaviest calculations. In our own internal tests, it improved application performance by up to 30%.”
Intel DAAL is a high-performance library specifically optimized for big data analysis on the latest Intel platforms, including Intel Xeon® and Intel Xeon Phi™. It provides the algorithmic building blocks for all stages of data analysis in offline, batch, streaming, and distributed processing environments. It was designed for efficient use across the popular data platforms and APIs in use today, including MPI, Hadoop, Spark, R, MATLAB, Python, C++, and Java.
Today Allinea Software launched the first update to its well-established toolset for debugging, profiling, and optimizing high-performance code since its acquisition by ARM in December 2016. “The V7.0 release provides new integrations for the Allinea Forge debugger and profiler and Allinea Performance Reports and will mean more efficient code development and optimization for users, especially those wishing to take software performance to new levels across Xeon Phi, CUDA and IBM Power platforms,” said Mark O’Connor, ARM Director, Product Management HPC tools.
“By implementing popular Python packages such as NumPy, SciPy, and scikit-learn to call the Intel Math Kernel Library (Intel MKL) and the Intel Data Analytics Acceleration Library (Intel DAAL), Python applications are automatically optimized to take advantage of the latest architectures. These libraries have also been optimized for multithreading through calls to the Intel Threading Building Blocks (Intel TBB) library. This means that existing Python applications will perform significantly better merely by switching to the Intel distribution.”
Applications that can take advantage of the new vectorization capabilities of the Intel Xeon Phi processor will show tremendous performance gains. “When considering vectorization, there are different tools that can assist the developer in determining where to look further. The first is to look at the optimization reports that are generated by the Intel compiler and then to also use the Vector Analyzer that can give specific advice on what to do to get more vectorization from the code.”
In-Memory Computing can accelerate traditional applications by using a memory-first design. Applicable to a wide range of domains, In-Memory Computing and In-Memory Data Grids take advantage of the latest trends in computer systems technology. “In-memory computing is designed to address some of the most critical and real-time task requirements today. These include real-time fraud detection, biometrics and border security, and financial risk analytics. All of these use cases require very low-latency access to very large amounts of data, which results in faster and more accurate decisions.”