Archives for July 2016

Video: Platform Independent Profiling of a QCD Code

“We present a procedure for implementing intermediate profiling of the openQCD code that will enable a global reduction in the cost of profiling and optimizing this code, which is commonly used in the lattice QCD community. Our approach is based on the well-known SimGrid simulator, which allows for fast and accurate performance predictions of codes on HPC architectures. Additionally, accurate estimates of the program’s behavior on future machines, not yet accessible to us, are anticipated.”
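
For readers unfamiliar with SimGrid, its SMPI layer can execute ordinary MPI programs on a simulated cluster described by a platform file. The fragment below is a minimal sketch of the kind of kernel such a simulation would profile: a toy MPI global reduction, not actual openQCD code, and the smpirun platform and hostfile names in the comment are placeholders.

/* toy_reduce.c -- illustrative MPI kernel, not openQCD code.
 * Under SimGrid it could be built with smpicc and launched with
 *   smpirun -platform cluster.xml -hostfile hosts ./toy_reduce
 * (the platform and hostfile names are placeholders). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                   /* local lattice-like volume */
    double *field = malloc(n * sizeof *field);
    for (int i = 0; i < n; i++)
        field[i] = (double)(rank + 1) / (i + 1);

    /* Local work followed by a global sum -- the compute/communication
     * pattern a simulator such as SimGrid can time on a modelled machine. */
    double local = 0.0, global = 0.0;
    for (int i = 0; i < n; i++)
        local += field[i] * field[i];

    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global norm^2 = %f on %d ranks\n", global, size);

    free(field);
    MPI_Finalize();
    return 0;
}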

Enter Your Machine Learning Code in the Cognitive Cup

“OpenPOWER is all about creating a broad ecosystem with opportunities to accelerate your workloads. For the Cognitive Cup, we provide two types of accelerators: GPUs and FPGAs. GPUs are used by the Deep Learning framework to train your neural network. When you want to use the neural network during the “classification” phase, you have a choice of Power CPUs, GPUs and FPGAs.”

Job of the Week: Computer & Information Analyst at University of Cincinnati

The University of Cincinnati is seeking a Computer & Information Analyst in our Job of the Week.

Video: HPE Enhances Software Stack for High Performance Computing

In this video from ISC 2016, Dave Sundstrom from Hewlett Packard Enterprise describes the newly enhanced HPE Software Stack for High Performance Computing. “The HPE Core HPC Software Stack is a complete software set for the creation, optimization, and running of HPC applications. It includes development tools, runtime libraries, a workload scheduler, and cluster management, integrated and validated by Hewlett Packard Enterprise into a single software set. Core HPC Stack uses the included HPC Cluster Setup Tool to simplify and speed the installation of an HPC cluster built with HPE servers.”

Offloading vs Native Execution on Intel Xeon Phi Coprocessors

“Native execution is a good fit for applications that perform operations mapping to parallelism, either in threads or in vectors. However, running natively on the coprocessor is not ideal when the application must do a lot of I/O or runs large parts of the code serially. Offloading has its own issues. Asynchronous allocation, copies, and deallocation of data can be performed, but doing so is complex. Another challenge with offloading is that it requires memory blocking. Overall, it is important to understand the application, the workflow within it, and how to use the Intel Xeon Phi coprocessor most effectively.”
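
To make the offload trade-off concrete, here is a minimal sketch assuming Intel's compiler with its Xeon Phi (MIC) offload pragmas; the array names and sizes are invented for illustration. The in/inout clauses mark the host-to-coprocessor copies whose cost, together with the alloc_if/free_if and signal/wait clauses used for persistent buffers and asynchronous transfers, is the source of the complexity mentioned above.

/* offload_saxpy.c -- illustrative sketch only. Assumes the Intel C/C++
 * compiler with Xeon Phi offload support; with other compilers the
 * pragma is ignored and the loop simply runs on the host. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1 << 24;
    const float a = 3.0f;
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Copy x and y to the coprocessor, run the loop there, copy y back.
     * These explicit transfers are the overhead that makes offloading
     * unattractive for small or I/O-heavy workloads. */
    #pragma offload target(mic:0) in(x:length(n)) inout(y:length(n))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    printf("y[0] = %f\n", y[0]);
    free(x);
    free(y);
    return 0;
}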

Podcast: UT Chancellor William McRaven on What Makes TACC Successful

“It’s great to have these incredible servers and incredible processors, but if you don’t have the people to run them – if you don’t have the people that are passionate about supercomputing – we would never get there from here. Behind all of this magnificent technology are the fantastic faculty, researchers, interns, our corporate partners that are part of this, and the National Science Foundation; there are people behind all of the success of TACC. I think that’s the point we can never forget.”

Supermicro Now Shipping Intel Xeon Phi Systems with Omni-Path

“With our latest innovations incorporating Intel Xeon Phi processors in a performance- and density-optimized Twin architecture and a 100Gbps OPA switch for high-bandwidth connectivity, our customers can accelerate their applications and innovations to address the most complex real-world problems.”

Video: Matching the Speed of SGI UV with Multi-rail LNet for Lustre

Olaf Weber from SGI presented this talk at LUG 2016. “In collaboration with Intel, SGI set about creating multi-rail support for multiple network connections to the Lustre filesystem. With Intel Omni-Path and EDR InfiniBand driving to 200Gb/s (25GB/s) per connection, this capability will make it possible to start moving data between a single SGI UV node and the Lustre file system at over 100GB/s.”

The Industrialization of Deep Learning – Intro

Deep learning is a method of creating artificial intelligence systems that combine computer-based multi-layer neural networks with intensive training techniques and large data sets to enable analysis and predictive decision making. A fundamental aspect of deep learning environments is that they move beyond fixed, explicitly programmed behavior into the realm of extensible, trainable systems. Recent developments in technology and algorithms have enabled deep learning systems not only to equal but to exceed human capabilities in the pace at which they process vast amounts of information.
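
As a toy illustration of the "layered network trained on data" idea (not drawn from any framework or vendor stack mentioned in this series), the sketch below trains a two-input network with one hidden layer on the XOR problem by gradient descent; all names and starting values are invented for the example.

/* tiny_net.c -- a toy multi-layer network trained by gradient descent,
 * meant only to illustrate the idea of a trainable layered model.
 * Build with: cc tiny_net.c -lm */
#include <math.h>
#include <stdio.h>

#define HIDDEN 3

static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

int main(void)
{
    /* XOR: a tiny data set that a single linear layer cannot fit. */
    const double X[4][2] = { {0,0}, {0,1}, {1,0}, {1,1} };
    const double T[4]    = {  0,     1,     1,     0   };

    /* Small, fixed starting weights: input->hidden and hidden->output. */
    double w1[HIDDEN][2] = { {0.5,-0.4}, {0.3,0.8}, {-0.7,0.2} };
    double b1[HIDDEN]    = { 0.1, -0.2, 0.05 };
    double w2[HIDDEN]    = { -0.6, 0.7, 0.4 };
    double b2 = 0.05;
    const double lr = 0.5;

    /* Training: repeated forward pass, error, and weight update (backprop). */
    for (int epoch = 0; epoch < 20000; epoch++) {
        for (int s = 0; s < 4; s++) {
            double h[HIDDEN], y_in = 0.0;
            for (int j = 0; j < HIDDEN; j++) {
                h[j] = sigmoid(w1[j][0]*X[s][0] + w1[j][1]*X[s][1] + b1[j]);
                y_in += w2[j] * h[j];
            }
            double y = sigmoid(y_in + b2);

            double dy = (y - T[s]) * y * (1.0 - y);   /* squared-error gradient */
            for (int j = 0; j < HIDDEN; j++) {
                double dh = dy * w2[j] * h[j] * (1.0 - h[j]);
                w2[j]    -= lr * dy * h[j];
                w1[j][0] -= lr * dh * X[s][0];
                w1[j][1] -= lr * dh * X[s][1];
                b1[j]    -= lr * dh;
            }
            b2 -= lr * dy;
        }
    }

    /* Print the trained network's prediction for each input. */
    for (int s = 0; s < 4; s++) {
        double h[HIDDEN], y_in = 0.0;
        for (int j = 0; j < HIDDEN; j++) {
            h[j] = sigmoid(w1[j][0]*X[s][0] + w1[j][1]*X[s][1] + b1[j]);
            y_in += w2[j] * h[j];
        }
        printf("%.0f XOR %.0f -> %.3f\n", X[s][0], X[s][1], sigmoid(y_in + b2));
    }
    return 0;
}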

FlyElephant 2.0 Improves HPC Collaboration Features

Today the FlyElephant team announced the release of the FlyElephant 2.0 platform for High Performance Computing. Version 2.0 enhancements include an internal expert community, collaboration on projects, public tasks, Docker and Jupyter support, a new file storage system, and support for working with HPC clusters.