Video: Profiling Python Workloads with Intel VTune Amplifier

Paulius Velesko from Intel gave this talk at the ALCF Many-Core Developer Sessions. “This talk covers efficient profiling techniques that can help to dramatically improve the performance of code by identifying CPU and memory bottlenecks. Efficient profiling techniques can help dramatically improve the performance of code by identifying CPU and memory bottlenecks. We will demonstrate how to profile a Python application using Intel VTune Amplifier, a full-featured profiling tool.”

Are Platform Configuration Problems Degrading Your Application’s Performance?

The Intel VTune™ Amplifier Platform Profiler on Windows* and Linux* systems shows you critical data about the running platform that help identify common system configuration errors that may be causing performance issues and bottlenecks. Fixing these issues, or modifying the application to work around them, can greatly improve overall performance.

Accelerated Python for Data Science

The Intel Distribution for Python takes advantage of the Intel® Advanced Vector Extensions (Intel® AVX) and multiple cores in the latest Intel architectures. By utilizing the highly optimized Intel MKL BLAS and LAPACK routines, key functions run up to 200 times faster on servers and 10 times faster on desktop systems. This means that existing Python applications will perform significantly better merely by switching to the Intel distribution.

Latest Intel Tools Make Code Modernization Possible

Code modernization means ensuring that an application makes full use of the performance potential of the underlying processors. And that means implementing vectorization, threading, memory caching, and fast algorithms wherever possible. But where do you begin? How do you take your complex, industrial-strength application code to the next performance level?

Learn What to Do Next with Intel VTune Amplifier Application Performance Snapshot

Tuning code has, for a long time, been an art. Knowing what to look for and how to correct inefficiencies in serious numerical computations has not been easy for most programmers. It’s often hard to even know which tool to start with. Which is why the Intel® VTune™ Amplifier Application Performance Snapshot could prove to be a great way to get an instant summary of an application’s performance characteristics and issues.

Intel Parallel Studio XE 2018 Released

Intel has announced the release of Intel® Parallel Studio XE 2018, with updated compilers and developer tools. It is now available for downloading on a 30-day trial basis. ” This week’s formal release of the fully supported product is notable with new features that further enhance the toolset for accelerating HPC applications.”

The Internet of Things and Tuning

“Understanding how the pipeline slots are being utilized can greatly increase the performance of the application. If pipeline slots are blocked for some reason, performance will suffer. Likewise, getting an understanding of the various cache misses can lead to a better organization of the data. This can increase performance while reducing latencies of memory to CPU.”

The OpenMP API Celebrates 20 Years of Success

OpenMP is a good example of how hardware and software vendors, researchers, and academia, volunteering to work together, can successfully design a standard that benefits the entire developer community. Today, most software vendors track OpenMP advances closely and have implemented the latest API features in their compilers and tools. With OpenMP, application portability is assured across the latest multicore systems, including Intel Xeon Phi processors.

Intel® VTune™ Amplifier Turns Raw Profiling Data Into Performance Insights

Discovering where the performance bottlenecks are and knowing what to do about it can be a mysterious and complex art, needing some very sophisticated performance analysis tools for success. That’s where Intel® VTune™ Amplifier XE 2017, part of Intel Parallel Studio XE, comes in.