While HPC developers worry about squeezing out the ultimate performance while running an application on dedicated cores, Intel TBB tackles a problem that HPC users never worry about: How can you make parallelism work well when you share the cores that you run upon? This is more of a concern if you’re running that application on a many-core laptop or workstation than on a dedicated supercomputer, because who knows what else will be running on those shared cores. Intel Threading Building Blocks reduces the delays caused by other applications by using a task-stealing scheduler. This is the real magic of TBB.
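To make the task-stealing idea concrete, here is a toy simulation in plain Python. It is a sketch under stated assumptions, not TBB's implementation: the function name `run_work_stealing`, the unit-cost tasks, the discrete ticks, and the "steal from the fullest queue" rule are all hypothetical choices made to keep the example deterministic. It follows the usual work-stealing arrangement in which a worker takes the newest task from its own queue and an idle worker steals the oldest task from someone else's.

```python
from collections import deque


def run_work_stealing(task_counts, stalled_worker=None, stall_ticks=0):
    """Simulate a work-stealing scheduler in discrete ticks (toy model).

    task_counts[i] is the number of unit-cost tasks initially queued on worker i.
    stalled_worker, if given, does nothing for the first stall_ticks ticks,
    standing in for a core that another application has taken over.
    Returns (ticks_elapsed, tasks_done_per_worker).
    """
    queues = [deque(range(n)) for n in task_counts]
    done = [0] * len(queues)
    ticks = 0
    while any(queues):  # loop until every queue is empty
        ticks += 1
        for w, q in enumerate(queues):
            if w == stalled_worker and ticks <= stall_ticks:
                continue  # this core is busy with someone else's work
            if q:
                q.pop()  # owner takes its newest task
                done[w] += 1
            else:
                # Idle worker: steal from the fullest queue (assumption made
                # for determinism; real schedulers choose victims differently).
                victim = max(queues, key=len)
                if victim:
                    victim.popleft()  # thief takes the oldest task
                    done[w] += 1
    return ticks, done


if __name__ == "__main__":
    # All 8 tasks start on worker 0; the other three workers steal.
    print(run_work_stealing([8, 0, 0, 0]))
    # Worker 0 is stalled for a long time; its tasks still get done by thieves.
    print(run_work_stealing([8, 0, 0, 0], stalled_worker=0, stall_ticks=100))
```

In the second call, the tasks queued on the stalled worker are finished by the other three workers in a few ticks instead of waiting for the stall to end, which is the effect the paragraph above describes for shared cores.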
In this week’s Sponsored Post, Nicolas Dube of Hewlett Packard Enterprise outlines the future of HPC and the role and challenges of exascale computing in this evolution. The HPE approach to exascale is geared to breaking the dependencies that come with outdated protocols. Exascale computing will allow users to process data, run systems, and solve problems at a totally new scale, which will become increasingly important as the world’s problems grow ever larger and more complex.
Each year the OpenFabrics Alliance (OFA) hosts a workshop devoted to advancing the state of the art in networking. “One secret to the enduring success of the workshop is the OFA’s emphasis on hosting an interactive, community-driven event. To continue that trend, we are once again reaching out to the community to create a rich program that addresses topics important to the networking industry. We’re looking for proposals for workshop sessions.”
“Managing the work on each node can be referred to as Domain parallelism. During the run of the application, the work assigned to each node can be generally isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The tools that are needed for this are MPI for the developer, but can take advantage of frameworks such as Hadoop and Spark (for big data analytics). Managing the work for each core or thread will need one level down of control. This type of work will typically invoke a large number of independent tasks that must then share data between the tasks.”
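The two levels described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses no MPI, Hadoop, or Spark, the "nodes" are simulated one after another in a single process, and the names `split`, `node_work`, and `run` are hypothetical. The outer level gives each node its own domain; the inner level breaks that domain into small independent tasks on a thread pool, which exchange data only when their partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def node_work(domain, tasks_per_node=4):
    """Inner level: one 'node' runs many small tasks over its own domain."""
    pieces = split(domain, tasks_per_node)
    with ThreadPoolExecutor(max_workers=tasks_per_node) as pool:
        partials = list(pool.map(lambda p: sum(x * x for x in p), pieces))
    return sum(partials)  # tasks share data only at this combine step


def run(data, nodes=3):
    """Outer level: one domain per node, then a single reduction at the end.

    The nodes are processed sequentially here; on a real cluster each domain
    would live on a separate machine.
    """
    domains = split(data, nodes)
    return sum(node_work(d) for d in domains)


if __name__ == "__main__":
    print(run(list(range(10))))  # sum of squares of 0..9
```

The only communication between nodes in this sketch is the final sum, which mirrors the point that each node "needs little communication with other nodes to perform the work."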
Remote visualization tools allow employees to dramatically improve productivity by accessing business-critical data and programs regardless of their location. Remote visualization technologies allow users to launch software applications on the server side and display the results locally, letting them leverage the bandwidth and compute power of the cluster while circumventing the latency and security risks of downloading large amounts of data onto their local client.
With modern processors that contain a large number of cores, getting maximum performance requires structuring an application to use as many cores as possible. Explicitly developing a program to do this can take a significant amount of effort. It is important to understand the science and algorithms behind the application, and then use whatever programming techniques are available. “Intel Threading Building Blocks (TBB) can help tremendously in the effort to achieve very high performance for the application.”
Applications such as machine learning and deep learning require incredible compute power, and they are becoming more crucial to daily life every day. These applications help provide artificial intelligence for self-driving cars, climate prediction, and drugs that treat today’s worst diseases, along with solutions to more of our world’s most important challenges. There are many ways to increase compute power, but one of the easiest is to use the most powerful GPUs.
“The Intel Omni-Path Architecture is an example of a networking system that has been designed for the Exascale era. There are many features that will enable this massive scaling of compute resources. Features and functionality are designed in at both the host and the fabric levels. This enables very large scaling when all of the components are designed together. Increased reliability is a result of integrating the CPU and fabric, which will be critical as the number of nodes expands well beyond any system in operation today. In addition, tools and software have been designed to be installed and managed across the very large number of compute nodes that will be necessary to achieve this next level of performance.”
Here’s a recap of SC16 announcements from Intel that are designed to provide even more powerful capabilities to address HPC challenges like energy efficiency, system complexity, and simplified workload customization. In supercomputing, one size certainly does not fit all. Intel’s new and updated technologies take a step forward in addressing these issues, allowing users to focus more on their HPC applications, not the technology behind them.
Libraries that are tuned to the underlying hardware architecture can increase performance tremendously. Higher-level libraries such as the Intel Data Analytics Acceleration Library (Intel DAAL) can assist the developer with highly tuned algorithms for data analysis as well as machine learning. Intel DAAL functions can be called within other, more comprehensive frameworks that deal with the various types of data and storage, increasing the performance and lowering the development time of a wide range of applications.