New Class of Intel Xeon Scalable Processors Breaks Through Performance Bottlenecks

This guest article from Intel explores how its Cascade Lake advanced performance processors have the potential to accelerate AI, big data and HPC workloads. 

Today’s advanced applications require faster and increasingly powerful hardware and storage technologies to make sense of the data deluge. (Photo: Intel)

Estimates suggest 90% of the world’s total data volume was created in the last two years. [1] However, less than one percent of that vast data lake is analyzed [2] for accelerated business results, better customer experiences, advanced scientific research, or other societal benefits. Unlocking the bigger-picture meaning from raw data volumes is no easy task. As a result, many important insights remain hidden within the untapped data that quietly floods data centers around the globe each day.

Today’s advanced applications require faster and increasingly powerful hardware and storage technologies to make sense of the data deluge. Intel seeks to address this critical trend with a new class of future Intel® Xeon® Scalable processors (code-named “Cascade Lake”). These advanced performance CPUs will continue Intel’s 20 years of Intel Xeon processor innovation. Designed with the demands of HPC and technical computing, big data, and artificial intelligence (AI) in mind, the upcoming Cascade Lake advanced performance processors will extend the current Intel Xeon Scalable processor portfolio to address even more demanding workloads.

As today’s applications and their working data sets grow in both complexity and size, aging hardware struggles to keep up. Modern processors must excel in workload-optimized functionality, faster data movement, and more efficient data storage. Cascade Lake advanced performance CPUs will offer businesses, government agencies, and scientific institutions technology to facilitate their IT transformation and derive deeper, more useful insight from their data assets, and to do it more quickly than before.

Several Intel innovations make Cascade Lake advanced performance processors even more powerful than the current-generation Intel Xeon Scalable processors, enabling robust compute capability and increased memory bandwidth for demanding workloads. Each processor will offer 48 cores and 12 DDR4 memory channels (more channels than any other CPU), delivering unprecedented DDR memory bandwidth. As such, Cascade Lake advanced performance offers twice the memory channels of today’s Intel Xeon Scalable processor (Skylake) and 50 percent more memory channels than EPYC 7601 processors, resulting in a 1.3X STREAM Triad performance advantage. [4]
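The channel arithmetic above can be checked with a short sketch. The channel counts for Skylake (6) and EPYC 7601 (8) are inferred from the "twice" and "50 percent more" statements in the paragraph above. The DDR4-2666 transfer rate is an assumption borrowed from the test systems in the footnotes; the article does not state the memory speed of the Cascade Lake advanced performance platform. The function names are illustrative only.

```python
# Memory channels per socket. The Cascade Lake figure is stated in the
# article; the other two are inferred from its "twice" and "50 percent
# more" comparisons.
CHANNELS = {
    "Cascade Lake advanced performance": 12,
    "Intel Xeon Scalable (Skylake)": 6,
    "AMD EPYC 7601": 8,
}

BYTES_PER_TRANSFER = 8  # one DDR4 channel has a 64-bit (8-byte) data bus


def channel_ratio(a, b):
    """Ratio of memory channels per socket between platforms a and b."""
    return CHANNELS[a] / CHANNELS[b]


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth per socket in GB/s (10^9 bytes/s)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    ASSUMED_MTS = 2666  # assumption: DDR4-2666 on every platform
    for name, channels in CHANNELS.items():
        peak = peak_bandwidth_gbs(channels, ASSUMED_MTS)
        print(f"{name}: {channels} channels, ~{peak:.0f} GB/s theoretical peak")
```

Under these assumptions the channel ratio against EPYC 7601 is 1.5, while the STREAM Triad advantage quoted above is 1.3X. That gap is a reminder that sustained bandwidth depends on more than channel count, so the theoretical peak should be read as an upper bound rather than a prediction.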

The added headroom enables greater performance for memory bandwidth-sensitive workloads and technical computing scenarios like computational fluid dynamics or weather modeling, plus embedded acceleration for AI. To extend the outstanding deep learning inference capabilities of Intel Xeon Scalable processors, this new Cascade Lake processor will also include Intel® DL Boost, enabling a 17X [5] inference speed-up over the “Skylake” generation as measured at its launch in July 2017.
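For readers unfamiliar with what makes a workload memory bandwidth-sensitive, the STREAM Triad kernel is a convenient illustration: it performs only two floating-point operations per element while moving three 8-byte values. The following is a minimal pure-Python sketch of that kernel, not the official STREAM benchmark. The helper names are hypothetical, and interpreter overhead dominates the timing, so the reported rate says nothing about any processor's real memory bandwidth.

```python
import time
from array import array

BYTES_PER_DOUBLE = 8


def triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes_moved(n):
    """Bytes counted per Triad pass: two reads and one write per element."""
    return 3 * BYTES_PER_DOUBLE * n


def run_triad(n=100_000, scalar=3.0):
    """Run one Triad pass and return the result array and a rate in MB/s."""
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    # The rate below mostly measures the Python interpreter, not the hardware.
    rate_mb_s = triad_bytes_moved(n) / elapsed / 1e6
    return a, rate_mb_s


if __name__ == "__main__":
    _, rate = run_triad()
    print(f"Triad rate (interpreter-bound): {rate:.1f} MB/s")
```

A compiled, multi-threaded implementation with arrays far larger than the caches is what benchmark results such as those cited in this article are based on; the sketch only shows the access pattern.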

Based on preliminary performance testing at Intel using LINPACK and STREAM Triad [4], Intel’s Cascade Lake advanced performance processors demonstrate their prowess versus competitive alternatives. For high-performance LINPACK, Intel projects that the 2-socket Cascade Lake advanced performance implementation can deliver 3.4X more compute than the highest-performance 2S AMD EPYC 7601. [3]
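To put the LINPACK claim in absolute terms, the sketch below multiplies the measured 2-socket EPYC 7601 score from footnote 3 (1095 GFLOPS) by the stated 3.4X factor. It assumes "3.4X more compute" means a score 3.4 times the baseline; the result is an implied projection derived from the article's own numbers, not a measured figure. The names are illustrative.

```python
EPYC_7601_2S_HPL_GFLOPS = 1095.0  # measured score reported in footnote 3
CLAIMED_SPEEDUP = 3.4             # "3.4X" factor stated in the article


def implied_score(baseline_gflops, speedup):
    """Score implied by applying a speed-up factor to a baseline score."""
    return baseline_gflops * speedup


if __name__ == "__main__":
    projected = implied_score(EPYC_7601_2S_HPL_GFLOPS, CLAIMED_SPEEDUP)
    # Assumption: the factor is a simple ratio of High Performance Linpack scores.
    print(f"Implied 2-socket HPL projection: ~{projected:.0f} GFLOPS")
```

Under that reading, the implied 2-socket projection is roughly 3.7 TFLOPS.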

Talk with Intel or your system manufacturer to find out how the upcoming Cascade Lake advanced performance processors can accelerate your organization’s most demanding workloads like HPC and AI.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see here. 

Footnotes

  1. “How Much Data Do We Create Every Day? The Mind-Blowing Stats That Everyone Should Read,” by Forbes.
  2. “Only 5% of Data is Currently Analyzed,” by DATAVERSITY.
  3. LINPACK: AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 18.0.0.128, AMD BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score = 1095GFs, tested by Intel as of July 31, 2018, compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
  4. STREAM Triad: 1-node, 2-socket AMD EPYC 7601, tested by AMD as of June 2017, compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
  5. DL Inference: Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from ResNet-50, and ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”. Tested by Intel as of July 11th, 2017, compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/7/2018.