

2nd Generation Intel® Xeon® Scalable Processors Demonstrate Amazing HPC Performance

In this guest article, our friends at Intel discuss how benchmarks show key workloads averaging 31% better on the Intel Xeon Platinum 9282 than on the AMD EPYC “Rome” 7742.1

Intel analysis provides strong evidence that the 2nd Generation Intel Xeon Scalable Processor (Cascade Lake “CLX”) architecture delivers dramatic performance on real-world workloads. An impressive array of benchmarks shows 2S systems built with Intel’s 56-core processors (Intel Xeon Platinum 9282) solidly ahead of systems built with AMD’s 64-core processors (AMD EPYC 7742). In this top-of-the-line match-up, the benchmarks show Intel holding a substantial lead on many HPC workloads.1

112 x86-64 Intel cores beat 128 x86-64 AMD cores by an average 31% on key workloads.1

Intel Xeon Platinum 9200 processors are optimized for both density and performance. Intel Xeon Platinum 9200 processors feature up to 56 cores, 12 memory channels (at 2933MT/s), and high-speed interconnect capabilities via 80 PCIe Gen3 lanes per node.
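The headline memory numbers translate into theoretical peak bandwidth with simple arithmetic. A minimal sketch, assuming standard 64-bit DDR4 channels (8 bytes per transfer), which the article does not state explicitly:

```python
# Rough peak DRAM bandwidth per socket for a 12-channel, 2933 MT/s
# configuration, assuming 64-bit (8-byte) DDR4 channels.
channels = 12
transfers_per_sec = 2933e6   # 2933 MT/s
bytes_per_transfer = 8       # one 64-bit channel

peak_gb_per_sec = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(f"Theoretical peak: {peak_gb_per_sec:.1f} GB/s per socket")
```

This works out to roughly 281.6 GB/s per socket; sustained bandwidth measured by benchmarks such as STREAM is always somewhat lower than this theoretical ceiling.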

Intel Xeon Platinum 9200 processors included in the Intel Server System S9200WK deliver amazing performance in a density optimized solution ideal for HPC and AI.

The architects of the 2nd Generation Intel Xeon Scalable Processor can be congratulated on its performance against the competition. But that is only part of the story here. Intel has its sights on far more than just processor performance.

Continued System Performance Leadership Requires Breaking Down Barriers

Despite this healthy lead of 31%, Intel is clearly not singularly focused on processor performance; it is focused on system performance like never before. This has led the company to tackle some serious barriers to modern system performance.

Breaking Down DRAM Barriers

The hunger for more memory is seemingly unending, as science, engineering, and data analytics applications continue to get more compute and data intensive. Larger systems and larger memories translate into an ability for scientists, engineers, and analysts to do better work.

The problems with DRAM are easy to enumerate: DRAM is expensive, has failed to keep up with processor performance improvements, and has suffered from a decade-long slowdown in scaling. DRAM has been the obvious choice, but that appears to be changing.

Intel has made a significant advancement toward reducing this barrier to system performance. Intel Optane™ DC Persistent Memory is an innovative memory technology that delivers affordable large capacity, performance, and persistence (non-volatility). Affordability is not just a result of a lower cash outlay for the memory, it is also a byproduct of lower power consumption. This allows for system deployments that would not be possible if the same machine simply used that much more DRAM.


Most 2nd Generation Intel Xeon Scalable Processors support Intel’s persistent memory, an interesting ingredient for superior system performance, especially for anything involving Big Data, including analytics. Intel Optane DC Persistent Memory is Intel’s bid to revolutionize the memory-storage hierarchy. By affordably increasing memory size, it enables massive data sets to be stored closer to the CPU for faster time to insight, allows larger working sets to deliver higher-resolution simulations, enhances performance for latency-sensitive workloads, and permits more frequent local checkpointing.
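In its App Direct mode, persistent memory is typically exposed as a DAX-mounted filesystem that applications access with plain loads and stores through a memory mapping, rather than read()/write() system calls. A minimal sketch of that programming model, using an ordinary temporary file as a stand-in (on real hardware the file would live on a persistent-memory mount such as a hypothetical /mnt/pmem):

```python
import mmap
import os
import tempfile

# Stand-in for a file on a DAX-mounted persistent-memory filesystem.
path = os.path.join(tempfile.gettempdir(), "pmem_demo.bin")
size = 4096

# Create and size the backing file.
with open(path, "wb") as f:
    f.truncate(size)

# Map it and update it with ordinary memory stores.
with open(path, "r+b") as f:
    buf = mmap.mmap(f.fileno(), size)
    buf[0:5] = b"hello"   # load/store access, no read()/write() syscalls
    buf.flush()           # on pmem, pushes stores toward persistence
    buf.close()

# The data survives the mapping being torn down.
with open(path, "rb") as f:
    print(f.read(5))
```

Production code would use a library such as Intel's PMDK for cache-line flushing and failure atomicity; this sketch only illustrates the byte-addressable, persistent access pattern.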

System Performance Includes Addressing How HPC, AI, and Big Data Are Intertwined

Systems built with the 2nd Gen Intel Xeon Scalable processor not only offer performance for a broad range of HPC workloads, they also offer compelling solutions for integrating HPC, data analytics and AI in a single system.

In addition to alternatives to DRAM, 2nd Generation Intel Xeon Scalable Processors include specific acceleration support for Deep Learning in the form of Intel DL Boost.2

Recent research has shown that AI-based models may be able to significantly boost simulation performance.3 Deep neural networks (DNNs) for medical imaging place high demands on memory capacity and performance, and have been shown to benefit when processors are used to power the DNN.4 In both examples, vector processing (AVX-512, including Intel DL Boost) and the memory subsystem (capacity, bandwidth) are critical to high performance on HPC and AI workloads.
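Intel DL Boost’s INT8 inference path relies on AVX-512 VNNI instructions, and deep-learning frameworks pick these code paths up automatically when the CPU advertises them. A minimal sketch of checking for the relevant flags on Linux (the flag names follow the Linux /proc/cpuinfo convention; on other operating systems or older kernels this check simply reports nothing):

```python
# Report whether the CPU advertises the AVX-512 foundation and VNNI
# flags that Intel DL Boost's INT8 inference path uses.
def cpu_flags():
    """Return the set of CPU feature flags from /proc/cpuinfo (Linux)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or /proc unavailable
    return set()

flags = cpu_flags()
for feature in ("avx512f", "avx512_vnni"):
    print(f"{feature}: {'yes' if feature in flags else 'no'}")
```

Frameworks such as the Intel Optimization for Caffe used in the footnoted benchmark perform an equivalent dispatch internally, so no application changes are needed to benefit.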

Intel’s Performance-Barrier Busting Attitude is Unleashed

Based on the 31% performance leadership, Intel’s 2nd Gen Intel Xeon Scalable processors may represent the best single-platform solution for addressing the needs of HPC, Big Data, and AI workloads.

Intel has grown its ability to affect system performance with a broad portfolio of platform support that includes Intel Optane DC Persistent Memory, Intel Optane SSDs, Intel interconnect products, Intel FPGA solutions, Software Defined Visualization (SDVis), the Intel Parallel Studio XE 2019 software developer toolkit, and ecosystem support and optimizations. These all exist today, and most combine beautifully with Intel’s 2nd Gen Intel Xeon Scalable processors for real-world solutions. Future Intel initiatives should help break down more barriers to system performance, including Intel’s Xe GPU project and Intel’s oneAPI project for heterogeneous computing. Given the results we already see from systems featuring 2nd Gen Intel Xeon Scalable processors, how much will we benefit when Intel breaks down more barriers?

Find out more

Benchmark results are available to dig into, showing the benefits of coupling outstanding processing performance with outstanding memory bandwidth, along with strong gains from processor technologies for HPC, high-performance analytics, and AI applications.

Learn how Intel Xeon Platinum 9200 processors will benefit your organization.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

  1. For configuration details, visit http://www.intel.com/2019xeonconfigs/ (Intel Xeon Scalable processors – claim #31). For additional detail visit https://www.intel.com/content/www/us/en/high-performance-computing/performance-for-hpc-platforms.html
  2. Up to 30X AI performance with Intel® Deep Learning Boost (Intel DL Boost) compared to the Intel® Xeon® Platinum 8180 processor (July 2017). Tested by Intel as of 2/26/2019. Platform: Dragon rock 2-socket Intel® Xeon® Platinum 9282 (56 cores per socket), HT ON, turbo ON, total memory 768 GB (24 slots / 32 GB / 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, CentOS* 7, kernel 3.10.0-957.5.1.el7.x86_64. Deep learning framework: Intel® Optimization for Caffe*, version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL-DNN version v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, no datalayer, DummyData: 3x224x224, 56 instances / 2 sockets, datatype: INT8. vs. Tested by Intel as of July 11, 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to "performance" via the intel_pstate driver, 384GB DDR4-2666 ECC RAM, CentOS* Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64, SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with environment variables KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56; CPU frequency set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: https://github.com/intel/caffe/, revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with the "caffe time --forward_only" command, training measured with the "caffe time" command. For "ConvNet" topologies, a dummy dataset was used; for other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel C++ compiler ver. 17.0.2 20170213, Intel® Math Kernel Library (Intel® MKL) small libraries version 2018.0.20170425. Caffe run with "numactl -l".
  3. CERN Project Sees Orders-of-Magnitude Speedup with AI Approach, https://www.hpcwire.com/2018/08/14/cern-incorporates-ai-into-physics-based-simulations/
  4. Using Deep Neural Network Acceleration for Image Analysis in Drug Discovery, https://newsroom.intel.com/news/using-deep-neural-network-acceleration-image-analysis-drug-discovery

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available security updates. No product or component can be absolutely secure.

Refer to https://software.intel.com/en-us/articles/optimization-notice/ for more information regarding performance and optimization choices in Intel software products.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. © Intel Corporation.

Comments

  1. I’m not sure what you are comparing here; what are the AMD memory configurations, etc.?

  2. Michael Mayer says:

    31% faster, but almost twice as much power consumption when comparing TDP (400 vs. 225 watts).
