In this video, Samplify CEO Alan Evans presents: APAX: Lowering the Cost of Big Science, Big Data, and Cloud Computing.
“Multi-core CPUs are hitting the memory wall,” said Al Wegener, CTO and founder of Samplify. “With each new process node, the number of processor cores on a die can double with Moore’s Law, but the throughput of memory, I/O, and storage fails to keep up with this growth. Hence, the performance of multi-core applications is increasingly memory, I/O, and storage bound. APAX is the only solution that accelerates the throughput of DDRx, SAS/SATA, SSD, PCIe, Ethernet, and InfiniBand by up to six times.”
Samplify will demonstrate the APAX profiler and hardware IP at the SC12 conference in booth #4151.
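Bandwidth compressors like APAX work by trading a small amount of numerical precision for a large reduction in bits moved. As a rough illustration of the general idea only (a toy block-floating-point sketch, not Samplify's actual algorithm), each block of samples can share one scale factor while individual samples keep just a narrow mantissa:

```python
# Toy sketch of the idea behind numerical bandwidth compressors
# (NOT Samplify's APAX algorithm): block-floating-point encoding, where
# each block of samples shares one scale and stores narrow integer mantissas.

def bfp_encode(samples, block=4, mantissa_bits=8):
    """Encode samples as (shared_scale, integer mantissas) per block."""
    out = []
    for i in range(0, len(samples), block):
        blk = samples[i:i + block]
        peak = max(abs(s) for s in blk) or 1.0
        # Choose a scale so the block's peak fits in mantissa_bits signed bits.
        scale = (2 ** (mantissa_bits - 1) - 1) / peak
        out.append((scale, [round(s * scale) for s in blk]))
    return out

def bfp_decode(encoded):
    """Reconstruct approximate samples from the (scale, mantissas) blocks."""
    return [m / scale for scale, mants in encoded for m in mants]

data = [0.1, -0.5, 0.25, 0.9, 10.0, -3.0, 7.5, 1.0]
dec = bfp_decode(bfp_encode(data))
# Reconstruction error is bounded by the mantissa width and block peak.
assert all(abs(a - b) < 0.05 for a, b in zip(data, dec))
```

With 8-bit mantissas, each sample costs roughly a quarter of a 32-bit float, which is the kind of trade that lets a compressor multiply effective DDR or interconnect bandwidth.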
Over at TechWeek Europe, Max Smolaks writes that Dell and Intel have announced plans to open two Product Centers of Competence at the University of Cambridge to serve as testbeds for Xeon and Xeon Phi processors. The aim is to prepare the scientific community for the launch of the first generation of the Xeon Phi family of products, so this coprocessor can be used immediately as a production tool.
This initiative is expected to last for at least two years, and help improve combined threading and vectorisation – some of the biggest challenges in parallel applications. Dell says that while relatively few applications today are highly parallel, in the future they could address a wide range of important issues “ranging from climate change simulations and genetic analysis, to investment portfolio risk management and the search for new sources of energy.”
In this podcast, the Radio Free HPC team talk about Intel’s introduction of their new Xeon Phi co-processor at ISC12, and what customers will need to do to take advantage of it. Henry Newman says something profound. We touch on the merits of an all pastry diet and the state of programming today. Rich throws Dan/Henry a curveball. Dan introduces a new sponsor – Glade Data Center Edition, which promises to give data centers an aroma makeover. And finally, we find out that Henry likes black bread.
Over at BSN, Theo Valich writes that Jay Boisseau’s presentation at the recent Intel Developer Forum sheds light on TACC’s plans to deploy the Stampede supercomputer powered by Xeon Phi co-processors.
The system originally targeted 10 PFLOPS, but it seems they might miss the mark by a few dozen or a few hundred TFLOPS. According to the information given, Stampede deploys 2 PFLOPS of compute power through Sandy Bridge-EP based Xeon CPUs and no less than 7 PFLOPS using Xeon Phi “coprocessors.” Even though TACC did not disclose how many thousands of Dell servers are being deployed, we know that the supercomputer has 272 TB of DDR3 memory and 14 PB of total storage. The Xeon processor is the E5-2680 (Sandy Bridge-EP), while TACC is using special pre-production versions of the Xeon Phi in order to make the launch.
In this video, Veljko Milutinovic and Oliver Pell of Maxeler present: DataFlow Computing for Exascale HPC. Recorded at the HPC Advisory Council Spain Workshop 2012.
Data movement and energy are becoming the biggest constraints in HPC. DataFlow computers have been shown to provide order of magnitude improvements in space and power consumption by focusing first on optimizing data movement in a computer system, then utilizing massive parallelism between thousands of tiny dataflow cores. In this talk, we will introduce dataflow computing and outline the advantages of the approach as well as what kinds of operations map most effectively to the dataflow paradigm. We will discuss several case studies of scientific applications running on different heterogeneous control-flow / dataflow machines, including clusters of heterogeneous compute nodes where each node has control flow CPUs and dataflow engines, and loosely-coupled systems where control-flow nodes utilize dataflow engines as network resources over InfiniBand.
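The contrast with control flow can be sketched in a few lines. In this toy Python analogy (not Maxeler's actual toolchain, which compiles kernels to FPGA dataflow engines), each kernel is a tiny streaming stage that fires as data arrives, and the program is just the wiring between stages:

```python
# Minimal sketch of the dataflow idea (an analogy only, not Maxeler's tools):
# instead of a control-flow loop fetching instructions, the computation is a
# graph of small kernels that data streams through; each stage fires on input.

def stream(values):          # data source
    yield from values

def scale(inputs, k):        # one tiny dataflow kernel
    for x in inputs:
        yield x * k

def accumulate(inputs):      # another kernel, chained downstream
    total = 0
    for x in inputs:
        total += x
        yield total

# Wire the graph: source -> scale -> accumulate. Data moves; control doesn't.
pipeline = accumulate(scale(stream([1, 2, 3, 4]), k=10))
print(list(pipeline))  # -> [10, 30, 60, 100]
```

On a real dataflow engine every stage runs concurrently in silicon, so a new result emerges each cycle once the pipeline fills, which is where the space and power advantages the talk describes come from.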
In this video, PC Perspective explains how the Heterogeneous Systems Architecture works. With members including AMD, ARM, and Texas Instruments, the HSA Foundation was founded in June 2012 to enable the industry specification, advancement, and promotion of the architecture, and to bring HSA-enabled platforms and software solutions to market, from mobile and embedded all the way up to HPC and Cloud Computing.
Joel Hruska over at ExtremeTech takes a deep dive on Intel’s pending Xeon Phi co-processor based on the MIC architecture. While Intel hasn’t revealed all the details quite yet, Hruska does a good job of setting the stage for Intel’s battle royale with Nvidia’s Kepler coming up this Fall.
“The coming war between Intel and Nvidia for the supercomputing market will have a real impact on consumer products, even if it takes several years for the research to trickle down. The HPC industry is struggling to deal with problems of power efficiency, interconnect scaling, storage speed, processor utilization, and communication latency. The mobile phone and tablet markets, meanwhile, are fighting with the very same problems, with the added headache of battery life thrown in. Advances at the top of the market will increasingly shape the bottom (and vice versa). The battle to be the company whose hardware powers both spheres of influence is about to kick off in earnest.”
In this video, Jim Ang from Sandia National Laboratories describes Arthur, an experimental Intel MIC system based on Appro’s Xtreme-X supercomputer architecture.
The Xtreme-X was initially deployed at Sandia in November 2011, and the supercomputer boasts 42 nodes based on the Mellanox QDR InfiniBand interconnect platform, configured at multi-rack scale. The project will include three phases, with step one involving the Intel® Xeon® 5600 processor and Knights Ferry software to establish a baseline system. Step two is to upgrade the system to the Intel® Xeon® processor E5 family, and the final step will transition the project to Intel Knights Corner co-processors sometime this year.
In this video, James Reinders from Intel describes the company’s pending Xeon Phi co-processors and how they provide programmers with easy access to parallelism while preserving compatibility.
Last November, we demonstrated our first silicon of the Intel Xeon Phi coprocessor, code named “Knights Corner”. It produced an astounding teraflop of performance in a processor the size of your thumb, putting the industry on notice of the potential of many-core architectures and providing a clear path to the Petascale and Exascale era. This is the same amount of performance as the number 1 supercomputer on the TOP500 list in 1997, dubbed ASCI Red. ASCI Red used thousands of processors and filled a room with cabinets to produce the same amount of performance. Knights Corner quickly earned the nickname “Supercomputer on a Chip”.
“vSMP Foundation for the Intel Xeon Phi coprocessor based platform will virtualize both the Intel Xeon processor and Intel Xeon Phi coprocessor cores, as well as host and Intel Xeon Phi coprocessor memory, to act like a single Symmetric Multiprocessing (SMP) system. Users will have easy access to all of the computing and memory resources, greatly simplifying their development and production environments,” stated Shai Fultheim, founder and CEO of ScaleMP.
In this video, James Reinders discusses Allinea Software’s DDT debugger support for the newly announced Intel Xeon Phi (formerly known as Intel MIC) accelerator products. Recorded at ISC’12 in Hamburg.
“We are pleased that Allinea Software has developed support for the Intel MIC Architecture and will be a valued resource for users of supercomputer systems using this exciting architecture,” said James Reinders, Director of Parallel Programming Evangelism at Intel. “The first products from the new Intel® Xeon® Phi™ product family, based on Intel MIC Architecture, bring a new capability to supercomputer performance and power efficiency without sacrificing programmability by staying true to the flexibility of today’s Intel Xeon processor based systems. Allinea Software highlights this by extending their popular solutions in a manner that is completely familiar to users on today’s Intel Xeon processor based systems.”
As at ISC’11 last year (and SC11), I think there will be a strong fight for attention in the key area of manycore/GPU devices – and a matching search for evidence of real progress. So far the loudest voice has been Nvidia and CUDA, especially following Nvidia’s successful GTC event recently. However, interest in Intel’s MIC (Knights Corner) is strong and growing – MIC has often been a big discussion topic in workshops, conferences, and meetings over the last year. As the MIC product launch gets closer, people will be making obvious comparisons with Nvidia’s Kepler, announced at GTC.
The full name for our software stack is the “Intel® Many Integrated Core (MIC) Platform Software Stack.” Users often call it MPSS for short. It is dependent on the 2.6.34 Linux kernel, and it has been tested to work with specific versions of 64-bit Red Hat Enterprise 6.0, 6.1, and 6.2, as well as SuSE Linux Enterprise Server (SLES) 11 SP1. The readme.txt file has more information on how to build and install the stack.
The HPC Guru points us to this paper from Virginia Tech and Argonne entitled: MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems.
Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data movement frameworks, thus providing applications with no direct mechanism to perform end-to-end data movement. We introduce MPI-ACC, an integrated and extensible framework that allows end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. MPI-ACC’s runtime system enables several key optimizations, including pipelining of data transfers and balancing of communication based on accelerator and node architecture. We demonstrate the extensible design of MPI-ACC by using the popular CUDA and OpenCL accelerator programming interfaces. We examine the impact of MPI-ACC on communication performance and evaluate application-level benefits on a large-scale epidemiology simulation.
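One of the runtime optimizations the abstract names, pipelining of data transfers, is easy to see in miniature. The sketch below is plain Python standing in for the concept (not MPI-ACC's actual internals), with toy copy and send callables in place of real device-to-host copies and network sends: splitting the buffer into chunks lets the send of one chunk overlap the copy of the next.

```python
# Illustrative sketch of transfer pipelining (NOT MPI-ACC itself): a two-stage
# transfer (device->host "copy", then host->network "send") split into chunks,
# so stage 2 of chunk i can overlap stage 1 of chunk i+1.
import threading
import queue

def pipelined_transfer(buffer, chunk_size, copy, send):
    """Overlap the copy stage and the send stage across chunks."""
    staged = queue.Queue(maxsize=2)  # small staging area, as in a real pipeline

    def producer():
        for start in range(0, len(buffer), chunk_size):
            staged.put(copy(buffer[start:start + chunk_size]))
        staged.put(None)  # end-of-stream marker

    t = threading.Thread(target=producer)
    t.start()
    sent = []
    while (chunk := staged.get()) is not None:  # consumer: the "send" stage
        sent.append(send(chunk))
    t.join()
    return sent

# Toy stages: "copy" uppercases, "send" reports the chunk length.
chunks = pipelined_transfer("abcdefgh", 3, str.upper, len)
print(chunks)  # -> [3, 3, 2]
```

Without chunking, the whole buffer must finish the copy stage before the first byte hits the wire; with it, the two stages run concurrently, which is the latency-hiding effect MPI-ACC's runtime exploits.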
In this slidecast, Convey CEO Bruce Toal presents: HC-2: Next Generation Hybrid Core.
Convey’s innovative hybrid-core architecture pairs classic Intel processors with a coprocessor comprised of FPGAs. Particular algorithms—DNA sequence assembly, for example—are optimized and translated into code that’s loadable onto the FPGAs at runtime, greatly accelerating performance-critical applications. The new Convey HC-2 systems increase application performance 2-3 times over previous generations of Convey servers and orders of magnitude over commodity servers.