Entries filed under “Compute”

News related to the processors used in HPC gear.

DSSD is Andy Bechtolsheim’s Secret Chip Startup for Big Data

Over at GigaOm, GigaStacey writes that the solution for better and faster storage may lie in DSSD, a stealthy chip startup backed by Andy Bechtolsheim. Founded in 2010 by Sun alumni Jeff Bonwick and Bill Moore, DSSD is trying to build a chip that would improve the performance and reliability of flash memory for high performance computing, newer data analytics, and networking.

My sources tell me the startup is building a new type of chip (they said it’s really a module, not a chip) that combines a small amount of processing power with a lot of densely packed memory. The module runs a pared-down version of Linux designed for storing information on flash memory, and is aimed at big data and other workloads where reading and writing information to disk bogs down the application. This fits with the expertise of the team, but this is a problem that others are trying to solve as well with faster and cheaper SSDs and targeted software to optimize the flow of bits to a database. But the proposal here appears to be about designing an operating system that takes advantage of the differences between flash memory and hard drives to boost I/O.

Read the Full Story.

Also posted in Computing Research, HPC, HPC Hardware, inside Startups, inside-BigData, Storage

SGI’s Eng Lim Goh Presents: From Extreme Scale Computing to Big Data

In this video from the 2013 National HPCC Conference, Dr. Eng Lim Goh from SGI presents: From Extreme Scale Computing to Big Data.

“The internet, sensors and high performance computing are some of the top Big Data producers. Recently, there has been increased focus on extracting more value out of this generated data. Analysis of Big Data sets may be simplified as ‘looking for a needle in a haystack’ on one end of a spectrum to ‘looking for relationships between hay in a stack’ on the other. We will discuss the architectural platforms and tools suitable for different parts of this spectrum.”


Also posted in Events, HPC, HPC Hardware, inside-BigData, National HPCC Conference, Storage, Video

Total Goes Petascale with SGI ICE X Supercomputer

This week SGI announced that Total has selected the SGI ICE X technology for its new 2.3 Petaflop Pangea supercomputer. In what is described as the largest commercial HPC system in the world, Pangea will give Total’s in-house engineers and geologists an extremely powerful tool to enable the application of analytical and numerical models that support the development of three-dimensional visualizations of underground geological formations, key to identifying potential deposits of oil and gas and to determining optimal extraction methods.

“Total is committed to leveraging technological innovation and high performance computing to provide the best response to growing global energy demand,” said Philippe Malzac, CIO Exploration and Production for Total. “The efficiency of the SGI ICE X system, which represents high computational power using a minimal amount of energy, gives Total the smallest footprint and lowest TCO possible. This was a key factor in our selection of SGI ICE X for the Pangea system.”

To maximize energy efficiency, Total selected an innovative water-cooled SGI ICE X solution based on its M-Cell design. M-Cells utilize closed-loop airflow and warm-water cooling to create embedded hot-aisle containment, thereby lowering overall cooling requirements and significantly reducing overall energy consumption as compared to traditional HPC designs. The 2.3-petaflop system is built from Intel Xeon E5-2670 processors, totaling 110,592 cores and 442 terabytes of memory. The data management solution for seven petabytes of storage includes SGI InfiniteStorage 17000 disk arrays, SGI DMF tiered storage virtualization, and a Lustre file system integrated by SGI professional services.
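As a rough sanity check, the 2.3-petaflop figure can be reproduced from the core count. This is a back-of-the-envelope sketch, assuming the E5-2670's nominal 2.6 GHz clock and 8 double-precision FLOPs per cycle per core with AVX (our assumptions, not figures from the announcement); it also divides the 442 TB of memory across the cores.

```python
# Back-of-the-envelope theoretical peak for Pangea, under stated assumptions.
CORES = 110_592            # from the announcement
CLOCK_HZ = 2.6e9           # assumed: nominal E5-2670 base clock
DP_FLOPS_PER_CYCLE = 8     # assumed: one 4-wide AVX add + one 4-wide AVX multiply per cycle
MEMORY_BYTES = 442e12      # 442 TB, decimal units assumed

def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak = cores x clock rate x FLOPs issued per cycle."""
    return cores * clock_hz * flops_per_cycle

peak = peak_flops(CORES, CLOCK_HZ, DP_FLOPS_PER_CYCLE)
mem_per_core_gb = MEMORY_BYTES / CORES / 1e9

print(f"peak: {peak / 1e15:.2f} PFLOP/s")            # 2.30
print(f"memory per core: {mem_per_core_gb:.1f} GB")  # 4.0
```

Theoretical peak is an upper bound; sustained Linpack results on any real system come in lower.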

Read the Full Story.

Also posted in HPC, HPC Hardware, New Installations

Video: Overview of AMD in HPC

In this video from the HPC Advisory Council Switzerland Conference, Roberto Dognini presents an Overview of AMD in HPC.

Download the Slides (PDF).

Also posted in Events, HPC, HPC Advisory Council Workshop, HPC Hardware, Video

Blue Waters Ready to Handle Floods of Data

Big Data requires big computing, and the University of Illinois at Urbana-Champaign is doing its part with the launch of Blue Waters, one of the world’s fastest supercomputers.

U of I held an open house a couple of weeks ago, inviting one and all to visit its National Petascale Computing Facility and kick the tires on the $200 million machine built by Cray and funded by the National Science Foundation.

This is a petaflop machine designed to handle the challenging Big Data requirements associated with a wide range of problems – everything from unraveling complex biological systems to simulating the evolution of the cosmos.

“This is where you go to get answers to questions about how the world works,” says Bill Gropp, a computer science professor and one of four U of I researchers who oversaw the five-year development of the machine, according to a story in Crain’s Chicago Business. The article goes on to say, “Blue Waters will keep the university in the lead on large-scale computing as researchers from around the country apply to the National Science Foundation to use the machine to crunch data for medical research, astrophysics, aerodynamics, weather forecasting, national security and other uses.”

This is not your everyday supercomputer. The Blue Waters system is a Cray XE/XK hybrid machine made up of AMD Opteron 6276 “Interlagos” processors (with a nominal clock speed of at least 2.3 GHz) and NVIDIA GK110 Kepler accelerators, all connected by the Cray Gemini torus interconnect.

Blue Waters is capable of a sustained speed of over one petaflop, allowing it to perform more than one quadrillion calculations per second. The water-cooled system is housed in 276 black cabinets topped by silvery coolant pipes.

In addition to being really fast, Blue Waters has more than enough memory to handle Big Data requirements – 1.5 petabytes of total system memory and 300 petabytes of long-term storage.

In the Crain’s article, Gropp is quoted as saying, “We want people to ask, ‘What could you do if you could put massive amounts of data on a system and access it in microseconds?’”

The short answer is, “More than you can ever imagine.”

Read the Full Story.

Also posted in HPC, HPC Hardware, Storage, Video

GTC 2013: ARM + GPU = GPU’riffic, says Barcelona SCC

In this special guest feature, Dan Olds from Gabriel Consulting writes that the Barcelona Supercomputing Center is making a big bet on ARM processing for HPC.

Over the last few years, we’ve seen a steadily growing buzz surrounding the use of ARM chips in PCs, servers, and supercomputers. Here at GTC 2013, that buzz is even more pronounced due to NVIDIA’s upcoming Project Denver, and advances in their GPU technology that result in even less dependency on having a fast and powerful (read: Xeon) processor feeding the GPU number-crunching beasts. Our pal Rik Myslewski penned a great article on the ARM chatter at GTC 2013.

While most everyone has been debating and speculating about what it would be like to combine ARM processors and GPU accelerators, one organization has put together some hardware in order to separate the theoretical from the real. The Barcelona Supercomputing Center (from the Barcelona in Spain, not the other one) is building clusters to explore the potential advantages that might arise from combining low-power ARM processors with fast number-crunching GPUs.

Their first attempt, Tibidabo, was a proof of concept to determine whether it’s possible to build an all-ARM-based cluster. Could they really put together a cluster based on cell phone processors? And, if they could build it, could they find or adapt enough software for it to do useful work?

They were able to construct a two-rack cluster containing 32 blades, 256 nodes, and a total of 512 Tegra 2 ARM cores. They were able to port 11 scientific apps over to ARM with little difficulty, although they did need to fiddle around with the memory hierarchy to optimize some of the apps.

The performance wasn’t all that great. The total system turned out 512 GFLOPs while consuming 3.4 kW, yielding roughly 0.15 GFLOPs/watt. For context, the best systems on the most recent Green500 list come in around 2.4 or 2.5 GFLOPs/watt, while the systems at the end of the list are rated at 0.033 GFLOPs/watt.
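Dividing the quoted figures is a quick way to check the efficiency number: 512 GFLOPs over 3,400 W works out to about 0.15 GFLOPs/watt. This minimal sketch does that arithmetic and compares the result with the Green500 figures above; the article does not say whether the 512 GFLOPs is a peak or a measured value, so treat the ratio as approximate.

```python
def gflops_per_watt(gflops, watts):
    """Energy efficiency: GFLOP/s delivered per watt consumed."""
    return gflops / watts

tibidabo = gflops_per_watt(512, 3_400)  # 512 GFLOPs at 3.4 kW, as quoted
green500_best = 2.5                     # GFLOPs/watt, upper figure quoted for the top of the list
green500_worst = 0.033                  # GFLOPs/watt, quoted for the bottom of the list

print(f"Tibidabo: {tibidabo:.3f} GFLOPs/watt")         # 0.151
print(f"gap to best: {green500_best / tibidabo:.0f}x")  # 17
```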

So they went back to the drawing board and, using NVIDIA’s CARMA development box, clustered 16 of them together as a learning experience they called Pedraforca v1. This system did much better than the ARM-only Tibidabo on energy efficiency, yielding 0.78 GFLOPs/watt on DGEMM and 5.04 GFLOPs/watt on SGEMM (double- and single-precision matrix multiply), so they were making progress.

Limitations in the platform (PCIe max of 400 MB/s plus inability to overlap computation and data transfers) meant it couldn’t be scaled up very well. However, it did lead them to a new breakthrough in their thinking for their next system, which they’ve dubbed Pedraforca v2.
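Why those two limitations compound can be shown with a simple timing model. The sketch below assumes a GEMM offload that copies two N×N single-precision inputs to the GPU and one result back over a link capped at the quoted 400 MB/s (taken as 400 × 10^6 bytes/s); the compute time is a hypothetical placeholder, not a CARMA measurement. Without overlap each step pays transfer plus compute; with overlap (for example, double buffering) it pays only the larger of the two.

```python
LINK_BYTES_PER_S = 400e6  # the 400 MB/s PCIe ceiling quoted for CARMA (decimal MB assumed)
FLOAT_BYTES = 4           # single precision

def gemm_transfer_seconds(n, link_bytes_per_s=LINK_BYTES_PER_S):
    """Copy A and B (n x n) to the GPU and C back: 3 * n^2 values over the link."""
    return 3 * n * n * FLOAT_BYTES / link_bytes_per_s

def step_seconds(transfer_s, compute_s, overlap):
    """Serial execution adds the phases; overlapped execution hides the shorter one."""
    return max(transfer_s, compute_s) if overlap else transfer_s + compute_s

xfer = gemm_transfer_seconds(4096)
compute = 0.3  # hypothetical GPU compute time, for illustration only
print(f"transfer:   {xfer:.3f} s")                                # 0.503
print(f"no overlap: {step_seconds(xfer, compute, False):.3f} s")  # 0.803
print(f"overlap:    {step_seconds(xfer, compute, True):.3f} s")   # 0.503
```

In this toy example the link, not the GPU, sets the pace even with perfect overlap, and without overlap the two costs simply add.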

They’ve decided the key to building a highly efficient system isn’t to build an accelerated cluster but to build a cluster of accelerators. While there isn’t much difference in the words, there’s a world of difference between the meanings. With Pedraforca v2, they will be decoupling the CPUs from the GPUs, meaning that the CPU-to-GPU ratio can be changed to fit the workload. They will also be using direct GPU-to-GPU data transfers via Mellanox ConnectX-3 InfiniBand interconnects.

This will take a huge amount of latency out of the system and, accordingly, reduce the amount of work the CPU needs to do to orchestrate GPU communications. The prototype system will have 64 nodes, each using a quad-core Tegra 3 CPU that will slide into a 4x PCIe slot on a Mini-ITX carrier. In this configuration, the CPU will only be managing boot and MPI communications, plus minimal traffic cop duty for the GPUs. The point is that you don’t need a hugely fast and powerful processor to fulfill these requirements.

However, Pedraforca v2 will have some processing power in the form of Kepler-based NVIDIA K20 GPUs that can deliver a peak of 1,170 double-precision GFLOP/s through a PCIe Gen3 slot. The GPUs will be able to communicate with each other at 40 Gb/s via the aforementioned Mellanox-fueled InfiniBand interconnect.

Both presenters pointed out that this isn’t a general purpose HPC system – it is intended as a host for apps that are GPU-optimized. While they didn’t discuss any FLOPs/watt estimates or performance predictions, it’s safe to say that this system should be an eye opener when it comes to energy efficiency and even cost per FLOP. It’s definitely a project worth watching.

Also posted in GPUs, Green HPC, HPC, HPC Hardware

Video: Dell’s Modular HPC – Exascale Block by Block

In this video from the HPC Advisory Council Switzerland Workshop, Kris Buggenhout from Dell presents: Modular HPC, Exascale Block by Block.

Download the Slides (PDF).

Also posted in Events, Exascale, HPC, HPC Advisory Council Workshop, HPC Hardware, Storage, Video

Record Simulations Conducted on Lawrence Livermore Supercomputer

Over at Lawrence Livermore, Breanna Bishop writes that researchers at LLNL have performed record simulations using all 1,572,864 cores of the Sequoia supercomputer. The first supercomputer to exceed one million computational cores, Sequoia is also No. 2 on the TOP500 with 16.3 petaflops of performance.

OSIRIS simulation on Sequoia of the interaction of a fast-ignition-scale laser with a dense DT plasma.

The simulations were performed by Frederico Fiuza, a physicist and Lawrence Fellow at LLNL. Designed to study the interaction of ultra-powerful lasers with dense plasmas in a proposed method to produce fusion energy, the project is part of the U.S. Department of Energy’s Office of Fusion Energy Science Program.

Using the OSIRIS code, Fiuza demonstrated excellent scaling in parallel performance to the full 1.6 million cores of Sequoia. When the number of cores was increased for a relatively small problem of fixed size, what computer scientists call “strong scaling,” OSIRIS obtained 75 percent efficiency on the full machine. When the total problem size was increased along with the core count, what is called “weak scaling,” it achieved 97 percent efficiency.

“This means that a simulation that would take an entire year to perform on a medium-size cluster of 4,000 cores can be performed in a single day. Alternatively, problems 400 times greater in size can be simulated in the same amount of time,” Fiuza said. “The combination of this unique supercomputer and this highly efficient and scalable code is allowing for transformative research.”
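The two efficiency figures use different baselines. A common way to define them is sketched below: strong-scaling efficiency divides the measured speedup on a fixed problem by the ideal speedup from adding cores, while weak-scaling efficiency holds the work per core fixed and compares run times directly. These are textbook definitions, not necessarily the exact formulas LLNL used, and the timings in the usage lines are hypothetical; only the core counts come from the article.

```python
def strong_scaling_efficiency(t_ref, n_ref, t, n):
    """Fixed total problem: measured speedup t_ref/t divided by ideal speedup n/n_ref."""
    return (t_ref / t) / (n / n_ref)

def weak_scaling_efficiency(t_ref, t):
    """Fixed work per core: ideally run time stays flat as cores and problem grow together."""
    return t_ref / t

# Hypothetical timings, for illustration only.
print(strong_scaling_efficiency(t_ref=100.0, n_ref=4, t=30.0, n=16))  # about 0.833
print(weak_scaling_efficiency(t_ref=10.0, t=12.5))                     # 0.8

# Core-count ratio behind the "400 times" comparison in the quote.
SEQUOIA_CORES = 1_572_864
CLUSTER_CORES = 4_000
print(SEQUOIA_CORES / CLUSTER_CORES)  # 393.216
```

With about 393 times the cores of the 4,000-core cluster, a code that weak-scales perfectly could presumably handle a problem roughly 393 times larger in the same time, which rounds to the quote’s figure of 400.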

Read the Full Story.

Also posted in Computing Research, HPC, HPC Hardware

Video: CPU Alternatives for Future High-performance Systems

In this video from the HPC Advisory Council Switzerland Conference, Nikola Puzovic from the Barcelona Supercomputing Center presents: CPU Alternatives for Future High-performance Systems.

Energy efficiency is already a primary concern for the design of any computer system, and it is unanimously recognized that future Exascale systems will be strongly constrained by their power consumption. This is why the Mont-Blanc project, which was launched on 1st October 2011, has set itself the following objective: to design a new type of computer architecture capable of setting future global High Performance Computing (HPC) standards that will deliver Exascale performance while using 15 to 30 times less energy. This project is coordinated by the Barcelona Supercomputing Center (BSC) and has a budget of over 14 million Euros, including over 8 million Euros funded by the European Commission.

Download the slides (PDF).

Also posted in Events, Green HPC, HPC, HPC Advisory Council Workshop, HPC Hardware, Video

Video: Who can Beat x86? How to Reduce your Power Footprint and More

In this video from the HPC Advisory Council Switzerland Conference, Piero Altoè from E4 Computer Engineering presents: Who can Beat x86? How to reduce your power footprint and more. The company’s new ARKA blades offer extreme power efficiency with ARM-based processors. Download the slides (PDF).

Also posted in Events, Green HPC, HPC, HPC Advisory Council Workshop, HPC Hardware

Video: Experiences from the Deployment of TACC’s Stampede System

In this video from the HPC Advisory Council Switzerland Conference, Karl Schulz from the Texas Advanced Computing Center presents: Experiences from the Deployment of TACC’s Stampede System.

Stampede is one of the largest computing systems in the world for open science research. Stampede system components are connected via a fat-tree, FDR InfiniBand interconnect. One hundred and sixty compute racks house compute nodes with dual, eight-core sockets, and feature the new Intel Xeon Phi coprocessors. Additional racks house login, I/O, big-memory, and general hardware management nodes. Each compute node is provisioned with local storage. A high-speed Lustre file system is backed by 76 I/O servers.

Download the slides (PDF).

Also posted in Co-processors, Events, HPC, HPC Advisory Council Workshop, HPC Hardware, Network, New Installations, Video

Benchmarking Intel Xeon Phi vs. Sandy Bridge

Intel has been careful to label the Xeon Phi as a coprocessor, something that always pairs with a Xeon CPU. But how does their performance compare on real applications? Over at the Xcelerit Blog, Paul Sutton benchmarks both devices using an optimized parallel version of the Monte-Carlo LIBOR swaption portfolio pricer.

It is executed once on the host CPUs (the Sandy Bridge processors), and again on the Xeon Phi co-processor in offload mode. The execution time of the full application is measured, including data transfers, random number generation, and reduction. All these steps are running on the target processor.

As we can see, from about 100K paths onwards, the Intel Xeon Phi becomes faster than the Sandy Bridge processors, reaching nearly 3x at 1M paths. With lower numbers of paths, the Sandy Bridge outperforms the Phi. This can be explained by the added data transfers and the comparatively low level of parallelism for a low number of paths (considering both vectorization and multi-threading). The setup time for the random number generator also becomes more dominant on the Xeon Phi when there is relatively little computation performed.
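The crossover described above is what a simple fixed-overhead cost model predicts. In this sketch each device’s run time is a fixed setup cost (standing in for transfers and RNG initialization) plus the number of paths divided by a throughput. All four parameters are invented for illustration and are not Xcelerit’s measurements.

```python
def run_seconds(paths, setup_s, paths_per_s):
    """Fixed setup cost plus time proportional to the number of Monte Carlo paths."""
    return setup_s + paths / paths_per_s

def crossover_paths(setup_a, rate_a, setup_b, rate_b):
    """Path count at which device b (higher setup, higher throughput) catches device a."""
    if rate_b <= rate_a:
        raise ValueError("device b must have the higher throughput")
    return (setup_b - setup_a) / (1 / rate_a - 1 / rate_b)

# Hypothetical parameters, for illustration only.
HOST = dict(setup_s=0.0, paths_per_s=1e6)
COPROC = dict(setup_s=0.1, paths_per_s=3e6)

n_star = crossover_paths(HOST["setup_s"], HOST["paths_per_s"],
                         COPROC["setup_s"], COPROC["paths_per_s"])
print(f"crossover at about {n_star:,.0f} paths")  # 150,000

for n in (10_000, 1_000_000, 100_000_000):
    speedup = run_seconds(n, **HOST) / run_seconds(n, **COPROC)
    print(f"{n:>11,} paths: coprocessor speedup {speedup:.2f}x")
```

As the path count grows, the setup term fades and the speedup approaches the throughput ratio (3x with these made-up numbers); at small path counts the setup term dominates and the host wins, which is the qualitative pattern the benchmark describes.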

Read the Full Story.


Also posted in Co-processors, HPC, HPC Hardware

Martin Thompson on the CPU Cache Flushing Fallacy

Over at the Mechanical Sympathy blog, Martin Thompson writes that the “CPU Cache Flushing Fallacy” can cost you huge performance hits when coding your algorithms.

Next time you are developing an important algorithm, try pondering that a cache-miss is a lost opportunity to have executed ~500 CPU instructions! This is for a single-socket system; on a multi-socket system you can effectively double the lost opportunity as memory requests cross socket interconnects.

For all you programmers out there with a need for speed, I’d say this one is a “must read.” Check out the Full Story.
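To see why access patterns decide how often you pay that price, here is a toy model (not taken from Thompson’s article): a fully associative LRU cache with 64-byte lines and an assumed capacity of 64 lines, fed the addresses of a 256×256 array of 8-byte doubles stored row-major. Real caches are set-associative and have hardware prefetchers, so treat the miss counts as illustrative only.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 8   # one double-precision value

def count_misses(addresses, capacity_lines):
    """Simulate a fully associative LRU cache and count misses."""
    cache = OrderedDict()  # keys are line numbers, ordered from least to most recently used
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses

def row_major(rows, cols):
    """Walk the array in storage order: consecutive addresses."""
    for i in range(rows):
        for j in range(cols):
            yield (i * cols + j) * ELEM_BYTES

def col_major(rows, cols):
    """Walk down columns: each step jumps a full row ahead in memory."""
    for j in range(cols):
        for i in range(rows):
            yield (i * cols + j) * ELEM_BYTES

rows = cols = 256
row_misses = count_misses(row_major(rows, cols), capacity_lines=64)
col_misses = count_misses(col_major(rows, cols), capacity_lines=64)
print(row_misses, col_misses)  # 8192 65536
```

In this model the column-order walk misses on every access, eight times as often as the row-order walk, because it cycles through 256 distinct lines while the cache holds only 64. At roughly 500 lost instructions per miss, that gap is why traversal order matters.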

Also posted in HPC, HPC Hardware, HPC Software

Convey Machine Chosen for Genome Analysis

The Genome Analysis Center (TGAC), one of seven institutes that receive funding from the UK’s Biotechnology and Biological Sciences Research Council (BBSRC), has deployed two Convey HC-1ex hybrid-core systems for advanced genomics research.

TGAC, based in the UK, is an aggressive adopter of advanced sequencing and IT technology. The two Convey HC-1ex systems are the latest addition to TGAC’s powerful computing infrastructure. By installing hybrid-core Convey HC-1ex systems, TGAC expanded its cluster and ccNUMA-based HPC environment to include leading-edge heterogeneous computing capabilities.

“We need to analyse data quickly and precisely, which takes time on clusters,” explained Mario Caccamo, deputy director of TGAC. “We offloaded some of our sequence alignment demand to the Convey hybrid-core systems, because they can handle the alignment algorithms much more efficiently. Using the Convey systems, we are seeing up to 15 times acceleration on our computationally intense BWA runs.”

TGAC was part of an international team that recently demonstrated next-generation sequencing could be used effectively to fine map genes in polyploid wheat. TGAC will leverage Convey’s architecture to accelerate computationally challenging jobs, such as resequencing alignment for wheat and other polyploid species.

“The initial performance jump is a major improvement,” continued Caccamo. “We expect to achieve even better performance as we gain experience using the Convey platform.”

This story appears here as part of a cross-publishing agreement with Scientific Computing World.

Also posted in Co-processors, HPC, HPC Hardware

Henry Newman on Choosing CPUs for Storage Servers

Over at CIO Magazine, Henry Newman from Instrumental writes about the tradeoffs to consider when selecting the right CPU technology for your storage servers.

For at least this year, the two server CPU choices remain Intel and AMD. ARM might solve some of the computational parts of some of the problems, but in 2013, ARM won’t have enough I/O bandwidth with 10 Gigabit Ethernet ports and storage to make it a viable alternative. This might change for 2014, but it’s too soon to predict as development of PCIe buses with enough performance capability is complex.

Read the Full Story.

Also posted in HPC, HPC Hardware, Storage


insideHPC.com is a production of insideHPC, LLC. © 2006-2013