Search Results for: “comparison”

Bitcoin Network Aggregates More Cycles than the TOP500

Over at The Genesis Block, “Phillip Archer” writes that the bitcoin network is now eight times more powerful than the TOP500 supercomputers combined.

While a network's aggregated compute cycles are a far cry from a supercomputer, the comparison does show the remarkable growth of the bitcoin network.

Interestingly, the same estimate can be run in reverse to gauge how well other supercomputers and distributed computing projects would be able to mine bitcoins. Their speed is measured in FLOPS, but they can also perform the integer operations used in hashing. What would happen if the top 10 supercomputers all switched to bitcoin mining? How much would that affect the network? Let's reverse the equation and say that they would get 1 hash for every 12.7k FLOP. The fastest computer, Sequoia, would then amount to about 1.6% of the bitcoin network. The top 10's combined speed is 48 petaFLOPS, roughly equivalent to 5% of the bitcoin network. In fact, the top 500 supercomputers combined amount to only 12% of the bitcoin network, consistent with the eight-fold figure quoted above.
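The arithmetic in this paragraph can be reproduced in a few lines. This is a minimal Python sketch, assuming the article's ratio of 1 hash per 12.7k FLOP and Sequoia's 16.32 petaflops figure; the network hash rate is not stated anywhere in the text, so the sketch back-solves it from the quoted 1.6% Sequoia share. All names are illustrative.

```python
# Back-of-the-envelope FLOPS -> bitcoin hash-rate conversion.
# Inputs are the article's figures; NETWORK_HASHRATE is back-solved from
# Sequoia's quoted 1.6% share (an assumption, not a measured value).

FLOP_PER_HASH = 12.7e3      # "1 hash for every 12.7k FLOP"
SEQUOIA_FLOPS = 16.32e15    # Sequoia, 16.32 petaflops
SEQUOIA_SHARE = 0.016       # "about 1.6% of the bitcoin network"


def flops_to_hashrate(flops: float) -> float:
    """Equivalent hash rate (hashes per second) for a given FLOPS figure."""
    return flops / FLOP_PER_HASH


# Implied size of the whole network, in hashes per second.
NETWORK_HASHRATE = flops_to_hashrate(SEQUOIA_FLOPS) / SEQUOIA_SHARE


def network_share(flops: float) -> float:
    """Fraction of the implied bitcoin network that `flops` would represent."""
    return flops_to_hashrate(flops) / NETWORK_HASHRATE


if __name__ == "__main__":
    print(f"implied network: {NETWORK_HASHRATE / 1e12:.1f} TH/s")  # ~80.3 TH/s
    print(f"top 10, 48 PFLOPS: {network_share(48e15):.1%}")         # ~4.7%
```

Because the FLOP-per-hash ratio cancels out of `network_share`, the percentages depend only on each machine's FLOPS relative to Sequoia's; the ratio matters only for the absolute hash-rate estimate.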

According to Wikipedia, Bitcoin is accepted in trade by merchants and individuals in many parts of the world. The processing of bitcoin transactions is secured by servers called Bitcoin miners, which communicate over an internet-based network and confirm transactions by adding them to a ledger that is updated and archived periodically. In addition to archiving transactions, each new ledger update creates some newly minted bitcoins.

Posted in HPC, TOP500

Green Graph 500 Launches to Boost Energy Efficient Big Data Computing

In this special guest feature, Torsten Hoefler from ETH Zurich writes that the new Green Graph500 aims to boost energy-efficient Big Data Computing.

“Big Data” can be analyzed in various ways. The most successful and prevalent programming model, MapReduce, is compelling because of its flexibility to adapt to hardware performance variations and faults. However, even though MapReduce covers the vast majority of use cases, it has its limits for graph computations. Complex graph algorithms become more important as our analysis capabilities grow. For example, problems such as finding hubs in social network graphs are routinely answered today. The underlying algorithm, betweenness centrality, uses a graph traversal similar to breadth-first search or shortest-path search. Systems such as Google’s Pregel, Apache’s Giraph, the (Parallel) Boost Graph Library, and Stanford’s GPS are just some examples of emerging frameworks that handle large-scale graph computations. To compare architectures, and possibly programming frameworks, efficiently, the Graph 500 benchmark aims to establish a database of performance results for a standardized breadth-first search on various platforms.
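For readers unfamiliar with the kernel that Graph 500 times, here is a minimal Python sketch of a breadth-first search that returns a parent map (a BFS tree), plus the traversed-edges-per-second (TEPS) rate that Graph 500 reports. It is an illustration only, not the Graph 500 reference code; it assumes an undirected graph stored as a symmetric adjacency dict without self-loops, and the function names are hypothetical.

```python
import time
from collections import deque


def bfs_parents(adj: dict[int, list[int]], root: int) -> dict[int, int]:
    """Breadth-first search from root; returns {vertex: parent}, root is its own parent."""
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in parent:      # first visit fixes v's parent
                parent[v] = u
                frontier.append(v)
    return parent


def edges_in_component(adj: dict[int, list[int]], parent: dict[int, int]) -> int:
    """Undirected edges inside the traversed component.

    Each undirected edge appears in two adjacency lists, so halve the degree sum.
    """
    return sum(len(adj.get(v, [])) for v in parent) // 2


def teps(adj: dict[int, list[int]], root: int) -> tuple[dict[int, int], float]:
    """Run one timed BFS and return (parent map, traversed edges per second)."""
    start = time.perf_counter()
    parent = bfs_parents(adj, root)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return parent, edges_in_component(adj, parent) / elapsed


if __name__ == "__main__":
    # Two components: a 4-cycle 0-1-3-2-0 and a separate edge 4-5.
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: [5], 5: [4]}
    tree, rate = teps(graph, 0)
    print(tree)                              # {0: 0, 1: 0, 2: 0, 3: 1}
    print(edges_in_component(graph, tree))   # 4
```

Only edges reachable from the root count toward the rate, which is why the separate 4-5 edge does not contribute when searching from vertex 0.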

As energy is becoming a bigger concern than hardware purchasing costs in large-scale data centers and supercomputing centers, it becomes mandatory to not only consider the performance of such computations but also their exact energy consumption. In fact, if the current cost trends continue, then energy consumption will soon be more important than absolute performance. Such discussions are highly relevant for operators of large data centers such as Google, Amazon, and Yahoo, as well as large supercomputing centers operated by the DOE (e.g., LLNL, Sandia, LANL, ORNL) and the NSF (e.g., NCSA, SDSC, PSC). We are thus looking forward to interesting future developments targeting exascale as well as Big Data architectures and programming frameworks.

We introduce the Green Graph 500 list, which fulfills a variety of purposes. First and foremost, it aims to establish the practice of competing not only for the highest performance but also for the highest energy efficiency, directly benefiting society. It also sets out to collect historical data about developments that may allow us to predict future trends, much as the TOP500 list has done in the past (who doesn’t like to put up a TOP500 slide to project FLOP rates for the next 10 years?). The list will also allow us to compare the energy efficiency of a specific computer for different tasks, e.g., dense linear algebra (a problem mainly limited by memory size and CPU peak floating-point performance) versus graph search (a problem mainly limited by memory access rates and global system bandwidth). Together, those two metrics may help guide the design of more efficient balanced systems as well as special-purpose systems for one of those tasks.

Finally, the new Green Graph 500 list is not meant to compete with any of the existing lists. It is complementary, filling an important gap in the field. In fact, the rules are designed to be similar to the established Green 500 rules (similar, not identical, for example with regard to the network) so that comparisons can easily be made in the future. It also integrates directly with the Graph 500 list and submission system to guarantee one-to-one comparisons (a submission record may appear in both the Green Graph 500 and the Graph 500, even though the lists are ranked by different indices).

The Green Graph 500 list is soliciting submissions from everyone through the Graph 500 submission system. To submit to the list, simply start a normal Graph 500 submission and select “Submit to Green Graph 500” or “Submit to both lists”. The only additional data you need for a Green Graph 500 submission is the actual power draw of your system during the benchmark.
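Since power draw is the only extra input, the efficiency figure is essentially performance divided by power. A minimal sketch, assuming the list ranks by traversed edges per second per watt (reported here in MTEPS/W); the function name and the sample numbers are hypothetical, not taken from any submission.

```python
def mteps_per_watt(teps: float, avg_power_watts: float) -> float:
    """Energy efficiency in millions of traversed edges per second per watt."""
    if avg_power_watts <= 0:
        raise ValueError("power draw must be positive")
    return teps / avg_power_watts / 1e6


if __name__ == "__main__":
    # Hypothetical system: 1.2 billion TEPS at an average draw of 400 W.
    print(mteps_per_watt(1.2e9, 400.0))  # 3.0
```

Since a watt is a joule per second, TEPS per watt is the same quantity as traversed edges per joule, so a run's edge count and its measured energy are enough to compute it.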

Another small difference between the Graph 500 and its Green peer is the measurement methodology. Since most power meters are not accurate enough to measure the rather short actual BFS run (not including the post-check, etc.), we offer a slightly modified version of the reference benchmark that runs the BFS in a tight loop long enough for an energy meter with low time resolution to measure the exact energy consumption. This benchmark also reports a Graph 500 number valid for submission. For runs with a custom implementation, this needs to be ensured manually (4-5 lines of C code suffice). Submissions open together with the official Graph 500 submission.
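The tight-loop idea can be sketched as follows. This hypothetical Python version (the article mentions C; the names here are invented and do not come from the Graph 500 code) repeats a kernel until a minimum wall-clock window has passed, then divides the meter's energy reading across the iterations, assuming power stays roughly flat during the loop.

```python
import time
from typing import Callable


def run_for_at_least(kernel: Callable[[], object], min_seconds: float) -> tuple[int, float]:
    """Call kernel() back to back until min_seconds have elapsed.

    Returns (iterations, elapsed_seconds); the window must be long enough
    for a coarse power meter to integrate energy accurately.
    """
    iterations = 0
    start = time.perf_counter()
    while True:
        kernel()
        iterations += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations, elapsed


def energy_per_run(avg_power_watts: float, elapsed: float, iterations: int) -> float:
    """Joules attributed to one kernel run, assuming a flat power draw."""
    return avg_power_watts * elapsed / iterations


if __name__ == "__main__":
    n, secs = run_for_at_least(lambda: sum(range(10_000)), 0.05)
    print(n, round(secs, 3))
    # A meter averaging 250 W over 60 s and 1,500 BFS runs -> 10.0 J per run
    print(energy_per_run(250.0, 60.0, 1500))
```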

As a sneak peek, we prepared a sample list from March 2013's energy submissions (which may not have followed all the official rules; thus, the list is not official).

The Green Graph 500 list is maintained by Torsten Hoefler from ETH Zurich in collaboration with the Graph 500 executive committee. For questions or comments please contact [email protected]

Posted in Green HPC, HPC, inside-BigData

Atipa to Build 3.4 Petaflop Super for DOE Environmental Molecular Sciences Lab

The Department of Energy’s Environmental Molecular Sciences Laboratory has ordered up a 3.4-petaflop supercomputer from Atipa Technologies, the HPC division of Microtech Computers. The new system will replace the Chinook supercomputer which aids energy, environment and basic science missions important to DOE.

The 42-rack machine will boast a total of 195,840 cores, roughly 23,000 of them in conventional Intel Xeon processors, tied to 184,000 gigabytes of memory. The 1,440 compute nodes will also carry an undisclosed number of Xeon Phi coprocessor cards alongside the Xeons, which works out to about 120 extra cores per node. A shared parallel filesystem will offer 2.7 petabytes of usable storage over an FDR InfiniBand network. Each node will have 128 GB of memory. What sets the new supercomputer apart, Atipa said, is the amount of memory devoted to each CPU, which lets the models that scientists run operate more efficiently. For comparison, the recently completed “Stampede” supercomputer at the University of Texas also relies on just over 184,000 gigabytes of memory, with 204,900 cores split between a number of 8-core Intel Xeon E5-2680 microprocessors.
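The node-level figures can be cross-checked with simple arithmetic. This sketch assumes the article's “23,000” Xeon figure is a rounded count of cores (16 per node across 1,440 nodes, i.e. 23,040); under that assumption, the remaining cores would sit on the Xeon Phi cards.

```python
NODES = 1_440
TOTAL_CORES = 195_840
XEON_CORES_PER_NODE = 16          # assumption: 23,040 Xeon cores / 1,440 nodes
MEMORY_PER_NODE_GB = 128

xeon_cores = NODES * XEON_CORES_PER_NODE          # 23,040, the article's "23,000"
total_memory_gb = NODES * MEMORY_PER_NODE_GB      # 184,320, the article's "184,000 gigabytes"
coprocessor_cores = TOTAL_CORES - xeon_cores      # cores left over for the Xeon Phi cards
coprocessor_cores_per_node = coprocessor_cores // NODES

print(f"Xeon cores:             {xeon_cores:,}")
print(f"total memory (GB):      {total_memory_gb:,}")
print(f"coprocessor cores:      {coprocessor_cores:,}")
print(f"coprocessor cores/node: {coprocessor_cores_per_node}")
```

The leftover cores divide evenly into 120 per node, matching the 120 quoted in the paragraph, and 184,320 GB rounds to the quoted 184,000 GB, so the published numbers hang together under the cores assumption.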

Posted in Business of HPC, Co-processors, Compute, HPC, HPC Hardware

Adaptive Computing Enhances Moab HPC Suite with Version 7.2 at SC12

In this video from SC12, Brady Kimball from Adaptive Computing describes enhancements to the Moab Compute Manager 7.2 suite including:

  • Support for Intel Xeon Phi coprocessors
  • Dual Domain Scheduling for Cray systems
  • Streamlined RPM experience
  • Allocation Updates
  • Enhanced Viewpoint GUI for HPC

Posted in Co-processors, Compute, Events, HPC, HPC Hardware, HPC Software, SC12, System Management, Video

Energy Efficiency Focus in the SC12 Technical Program

by Natalie Bates, Co-chair Energy Efficient HPC Working Group (EE HPC WG)

Energy efficiency will again be a hot topic at SC12, with at least 38 Technical Program sessions focused on energy efficiency.  A complete list of these sessions organized both chronologically and by topic can be found on the Energy Efficient HPC Working Group website.  SC12, the annual International Conference for High Performance Computing, Networking, Storage and Analysis, will be held Nov. 10-16 in Salt Lake City, Utah. For more information, see the SC12 website.

BROAD SCOPE SESSIONS

“The Third Annual Workshop on Energy Efficient High Performance Computing – Redefining System Architecture and Data Centers” promises to be interesting to a broad audience.  Some of the featured speakers include: Peter Kogge, University of Notre Dame, who will look at the historical trends of power, energy and supercomputing; John Shalf, Lawrence Berkeley National Laboratory, whose talk will focus on the energy requirements for applications; and Herbert Huber, Leibniz Supercomputing Center, and Steve Hammond, National Renewable Energy Laboratory, who will speak about energy efficient data centers.

There are four other technical programs that will cover the topic of energy efficiency at a high level.  Kirk Cameron, Virginia Tech, is on the slate to give two talks, both with clever and enticing titles containing phrases about a “Growing Power Struggle” and “Energy Oddities.”  Prohibitive energy costs motivated Thomas Ludwig, German Climate Computing Center, to consider the costs and benefits of “HPC-Based Science in the Exascale Era.”  Finally, there is a “Cool Supercomputing” Birds of a Feather (BoF) session organized by Pacific Northwest National Laboratory that covers tools and techniques for optimizing energy consumption at all levels.

“Setting Trends for Energy Efficiency” is a BoF representing a collaborative effort by the Top500, Green500, the Energy Efficient HPC Working Group and The Green Grid to standardize the power measurement methodology used when running system workloads for architectural comparison, such as High Performance Linpack.  This is one of seven sessions that cover energy efficiency measures and metrics.  The Green500, Top500 and now the Graph500 have their own BoFs and will report power consumption and energy efficiency as well as performance for their lists.  The High Performance Group at the Standard Performance Evaluation Corporation (SPEC) has also organized a BoF that will discuss a new OpenMP benchmark suite with an optional energy metric that scales to 512 threads.  From the home of the Green500 at Virginia Tech, Balaji Subramaniam will present his doctoral showcase on metrics for energy efficiency.  Finally, an Intel team will present a paper on tuning the Graph500 traversal that includes both performance and energy efficiency results.

SESSIONS FOCUSSED ON SYSTEM HARDWARE

Thirteen of the sessions explore system hardware energy efficiency.  Of these thirteen, seven focus on alternative processors like GPUs and ARM, continuing the trend towards aggregating low-power processors and using accelerators. There are three BoFs that explore alternative processors, and all three are organized by Europeans. The Partnership for Advanced Computing in Europe (PRACE) explores a set of prototypes to test and evaluate promising new technologies for future multi-Petaflop/s systems, including GPUs, ARM processors, DSPs and FPGAs.  The Barcelona Supercomputing Center is heading up an ARM-based exascale demonstration and will review its research results and plans at two BoFs: “Energy Efficient HPC” and “Exascale Research – The European Approach.”  Besides these BoFs, there is a session as part of Broader Engagement where Calxeda, an ARM-based server provider, will present its products and roadmaps.  NEC is presenting an exhibitor forum on “Hybrid Solutions with a Vector-Architecture for Efficiency.”  There is also a paper on “Multi-Core DSP” and a poster on modeling “Power-Performance Efficiency” for GPUs.

A new topic for SC this year is a focus on memory technologies, which was presaged by a keynote at the International Supercomputing Conference held in Hamburg, Germany, last June, when Dr. Byungse So, Samsung Senior Vice President, gave a talk on “Advanced Memory Technology – #1 Factor for Energy Efficient HPC”.  Two papers, RAMZzz and Mage, both explore novel memory system designs.  Samsung and Micron, respectively, are presenting exhibitor forums on “How Memory and SSDs can Optimize Data Center Operations” and “Hybrid Memory Cube (HMC)”.

Whereas interest in memory is on the rise, the focus on liquid cooling has waned, with only two sessions this year compared to six last year at SC’11.  Eurotech will present an exhibitor forum on “Differences Between Cold and Hot Water Cooling on CPU and Hybrid Supercomputers” and Green Revolution Cooling will present on “100% Server Heat Recapture in Data Centers is Now a Reality.”

DATA CENTER SESSIONS

Kimberly Cupps, Lawrence Livermore National Laboratory, will present on “The Sequoia System and Facilities Integration Story”.  It appears that she will be giving the same talk at two different sessions: on Monday during Broader Engagement and on Tuesday as an Invited Speaker.  Also, the M+W Group will present an exhibitor forum on “Reducing First Costs and Improving Future Flexibility in the Construction of High Performance Computing Facilities.”

APPLICATION TUNING AND JOB SCHEDULING

There are nine sessions that describe research on tuning applications for energy efficiency and various aspects of energy efficient job scheduling.  Seven of the nine sessions are doctoral showcases, papers or posters.  There is a BoF on “Power and Energy Measurement Modeling”.  In this BoF, members of the research community and industry will present the current state of the art and limitations in measuring and modeling power and energy consumption and their effect on HPC application performance. An open discussion about future directions for such work will follow, with the intention of creating a “wish list” of feature requests to HPC vendors.  Another BoF of interest is the SLURM User Group Meeting; SLURM is an open source job scheduler.  Also, Charles Lively, ORNL, will give a talk during Broader Engagement on “Heading Towards Exascale – Techniques to Improve Application Performance and Energy Consumption Using Application-Level Tools”.

Following is a list of the titles for the doctoral showcases:

Following is a list of the titles for the papers:

Following is a list of the titles for the posters:

OTHER SESSIONS

Two other sessions that will cover energy efficiency include an all day workshop on “High Performance Computing, Networking and Analytics for the Power Grid” and a poster on “Pay as You Go in the Cloud: One Watt at a Time.”

Although this is a list of sessions with a specific focus on energy efficiency, many more sessions will include energy efficiency as part of a broader focus.

The Missing Amazon Glacier Cost-Estimator

As reported here, storage pundits have been skeptical of claims by Amazon that its new AWS Glacier cloud archiving service is a tape killer. At a penny per gigabyte per month, the press release had some journalists eating the dog food, speculating that Glacier could make tape silos obsolete.

But wait. Was Amazon leaving out crucial pieces of the comparison puzzle? Now J. Brandt Buckley has posted an Amazon Glacier Cost-Estimator Calculator to help elucidate the relationship between cost, data retention periods, and recovery scenarios.
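The storage side of such an estimator is simple arithmetic. This minimal sketch uses only the penny-per-gigabyte-per-month price quoted in the post; retrieval is modeled as a hypothetical flat per-gigabyte fee, which is a placeholder and not Amazon's actual retrieval pricing formula, so real figures from the AWS price list would need to be plugged in before relying on it.

```python
def glacier_cost_usd(
    stored_gb: float,
    months: float,
    retrieved_gb: float = 0.0,
    retrieval_fee_per_gb: float = 0.0,    # hypothetical placeholder, not AWS's real formula
    storage_price_per_gb_month: float = 0.01,
) -> float:
    """Rough total cost: storage over the retention period plus retrievals."""
    storage = stored_gb * storage_price_per_gb_month * months
    retrieval = retrieved_gb * retrieval_fee_per_gb
    return storage + retrieval


if __name__ == "__main__":
    # 100 TB kept for three years, no restores: roughly $36,000.
    print(round(glacier_cost_usd(100_000, 36), 2))
    # Same archive plus one hypothetical 10 TB restore at an assumed $0.05/GB.
    print(round(glacier_cost_usd(100_000, 36, 10_000, 0.05), 2))
```

Even this toy model shows why the comparison puzzle matters: storage cost grows linearly with retention time, while restore charges depend on how much data comes back and how often.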

Posted in Cloud HPC, HPC, HPC Hardware, Storage

Video: Graphics in the Cloud

In this video from SIGGRAPH 2012, Nvidia’s Ian Williams presents on the VGX Hypervisor, the company’s move to bring graphics to the Cloud.

The new NVIDIA VGX technology allows for true hardware virtualization of the GPU, enabling a true PC and Workstation experience in a virtual desktop environment. This session will cover a comparison of graphics virtualization technologies available in the industry (both SW and HW methods) as well as accelerated remoting solutions.


Posted in Cloud HPC, Events, GPUs, HPC, HPC Hardware, Video, Visualization

Call for Papers: Performance Modeling, Benchmarking & Simulation of HPC Systems Workshop

The Performance Modeling, Benchmarking & Simulation of HPC Systems Workshop at SC12 has issued its Call for Papers. The event will take place Nov. 12 in Salt Lake City.

This workshop is concerned with the comparison of high-performance computing systems through performance modeling, benchmarking or through the use of tools such as simulators. We are particularly interested in research which reports the ability to measure and make tradeoffs in software/hardware co-design to improve sustained application performance. We are also keen to capture the assessment of future systems, for example through work that ensures continued application scalability through peta- and exa-scale systems.

Papers are due Sept. 9, 2012.


Posted in Events, Exascale, HPC, SC12

Infographic: Sequoia – The Fastest Supercomputer on the Planet

The HPC 4 Energy Blog brings us this infographic on the #1-ranked Sequoia supercomputer.

Lawrence Livermore National Laboratory’s Sequoia supercomputer, an IBM BlueGene/Q system, was ranked as the world’s fastest supercomputer on June 18, 2012. Sequoia boasts 16.32 petaflops using 1,572,864 cores, but how fast can it complete calculations? This infographic puts its speed into perspective, demonstrating the potential of American HPC resources to save organizations time and money.
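Two quick ways to put the quoted figures in perspective, using only the 16.32 petaflops and 1,572,864-core numbers from the paragraph: throughput per core, and how long a quintillion (10^18) floating-point operations would take at that sustained rate. The variable names are illustrative.

```python
SEQUOIA_FLOPS = 16.32e15
SEQUOIA_CORES = 1_572_864

flops_per_core = SEQUOIA_FLOPS / SEQUOIA_CORES    # ~1.04e10, i.e. ~10.4 gigaflops per core
seconds_for_1e18_ops = 1e18 / SEQUOIA_FLOPS       # ~61 seconds

print(f"{flops_per_core / 1e9:.1f} gigaflops per core")
print(f"{seconds_for_1e18_ops:.0f} s for 10^18 operations")
```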

While the comparison of supercomputer flops vs. handheld calculators is pretty tired, I think it helps the layman understand how immense 16.3 Petaflops is compared to machine capabilities of just a few years ago. Download the infographic.


Posted in HPC

DOE Doles Out Cash to AMD, Whamcloud for Exascale Research

By Timothy Prickett Morgan

The US Department of Energy used its massive budget to push supercomputers to gigaflops, teraflops, and petaflops in the prior three decades, and it is being tasked to put the pedal to the exaflops metal before the end of this decade.

To get there, the DOE has to fund primary research at IT vendors who might otherwise not get around to it until it suited their own commercial needs. It also has to foster collaboration among vendors who would otherwise rather not share ideas, because no one vendor is going to be able to solve the exascale problem by itself.

The main vehicle for funding exascale computing is called the Extreme-Scale Computing Research and Development program, which is being funded by both halves of the DOE. That would be the Office of Science, which funds scientific research in the nuke labs, and the National Nuclear Security Administration, which runs simulations to make sure the US military’s nuclear warheads work since Uncle Sam can’t set one off thanks to the Nuclear Test Ban Treaty. There is talk that the supers at the DOE labs aren’t just making sure existing nukes work, but also helping to redesign them.

The first phase of the DOE’s exascale system funding is called FastForward, which is being administered by Lawrence Livermore National Laboratory in conjunction with the six other primary DOE nuke labs (some of which dislike being called nuke labs even though they do nuclear physics research).

Those other DOE labs, along with LLNL, are the name brands in high performance computing in the United States: Argonne National Laboratory, Lawrence Berkeley National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, and Sandia National Laboratories.

The FastForward exascale research program issued its request for proposals on March 29, and asked that they be submitted by May 11. The program seeks to fund basic research in exascale computing as it relates to three areas: Memory, processors, and storage and I/O.

It has an explicit goal of soliciting cooperation across multiple companies, much like the US Defense Advanced Research Projects Agency’s Ubiquitous High Performance Computing program. In a way, the UHPC program at DARPA is the trailblazer for the FastForward program at DOE.

DARPA always first to fight
The UHPC program was announced in March 2010 with the goal of creating an HPC system that by 2018 can do 50 gigaflops per watt (BlueGene/Q, the current top performer and most efficient super in the world, can do a little more than 2 gigaflops per watt) and can pack 10 petabytes of storage and around 3 petaflops of number crunching into a server rack slightly larger than standard, within a 57 kilowatt power budget.

Building an exascale system would seem easier, by comparison, since there is, in theory, no limit on the size of the machine or its power budget. But in reality, there are big-time power limits on exascale supers because no one is going to build a 20 megawatt nuclear or coal power station to keep one fed and cooled.
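The power arithmetic behind these targets is a single division. This sketch uses the article's figures (3 petaflops in a 57 kilowatt rack for UHPC, roughly 2 gigaflops per watt for BlueGene/Q, and a 20 megawatt exascale ceiling); the helper names are illustrative.

```python
def gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in gigaflops per watt."""
    return flops / watts / 1e9


def watts_needed(flops: float, gflops_per_w: float) -> float:
    """Power required to sustain `flops` at a given efficiency."""
    return flops / (gflops_per_w * 1e9)


EXAFLOPS = 1e18

print(f"UHPC rack: {gflops_per_watt(3e15, 57e3):.1f} GF/W")               # ~52.6
print(f"Exaflops in 20 MW: {gflops_per_watt(EXAFLOPS, 20e6):.1f} GF/W")   # 50.0
print(f"Exaflops at 2 GF/W: {watts_needed(EXAFLOPS, 2.0) / 1e6:.0f} MW")  # 500
```

In other words, the UHPC rack target and a 20 megawatt exaflops machine demand essentially the same efficiency, roughly 25 times what BlueGene/Q delivers today.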

In August 2010, two teams were awarded UHPC ExtremeScale contracts with a total of $74m: one led by Nvidia and the other by Intel. Nvidia got a $25m grant and has teamed up with Cray, Oak Ridge National Lab, and six universities. Intel teamed up with three universities, SGI, Lockheed Martin, Cray, Reservoir Labs, and ET International to take down a $49m grant.

In three related UHPC grants, Sandia National Lab has teamed up with LexisNexis and two universities, MIT has its own grant, and so, apparently, does Georgia Tech. Total funding for the UHPC effort is said to be on the order of $100m, but DARPA has never confirmed that figure.

Three steps to DOE-sponsored exascale computing
With the FastForward program, the DOE is setting a cap of $20m on any proposals to try to encourage focused work on specific problems, and said at the get-go that what it was looking for was more like two $10m proposals in each of the three areas of primary research.

It is not clear how many awards have been made yet – the vendors are not notified of who was bidding and who won, but rather that they won. At the moment, AMD has been awarded a FastForward contract for processor and memory research and Whamcloud has one contract for storage and I/O research. There could be – and probably will be – others getting grants. Uncle Sam likes to hedge its HPC bets.

Once the primary research on possible exascale technologies is completed over the next two years, DOE will be looking at funding vendors to put together prototypes – this is tentatively called the system design phase – and then, by 2020, to build full exascale systems based on those prototypes – known as the system build phase at the moment. DOE will no doubt come up with other names later on.

According to the statement of work (PDF) for the FastForward contract, the issues that vendors face on the exascale challenge are daunting.

On a petaflops-class system today, it costs somewhere between $5m and $10m to power and cool the machine. Extrapolating to an exascale machine using current technology, even with efficiency improvements, you would be in for $2.5bn a year just to power an exascale beast, and you would need something on the order of 1,000 megawatts to power it up. That’s 50 nuclear reactors, more or less. The DOE has set a target of 20 megawatts as the top juice consumption for an exascale system.

Using DDR3 main memory today, a 2 petaflops machine with 2PB of main memory burns about 1.25 megawatts, and assuming that we can get to DDR5 main memory by 2020, we’re talking about needing 260 megawatts just for the memory subsystem in an exascale box. Even if you cut the memory-to-flops ratio by a factor of five, which many people don’t think is a good idea, you are still above 50 megawatts just for the memory subsystems across a cluster.
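The memory-power figures can be laid out side by side. The 1.25 MW DDR3 baseline and the 260 MW DDR5 projection are the article's; the 625 MW line is a naive linear extrapolation added here for contrast (an assumption, not a DOE figure), and the last line simply applies the factor-of-five cut in memory per flop.

```python
BASELINE_FLOPS = 2e15            # 2 petaflops machine with 2PB of DDR3
BASELINE_MEMORY_WATTS = 1.25e6   # ~1.25 MW for its memory
EXAFLOPS = 1e18
DDR5_EXASCALE_WATTS = 260e6      # article's projection for DDR5 at exascale

scale = EXAFLOPS / BASELINE_FLOPS                 # 500x more flops (and memory, at 1 byte/flop)
naive_ddr3_watts = BASELINE_MEMORY_WATTS * scale  # assumption: DDR3 power scales linearly
reduced_ratio_watts = DDR5_EXASCALE_WATTS / 5     # cut memory-to-flops ratio by 5x

print(f"naive DDR3 extrapolation: {naive_ddr3_watts / 1e6:.0f} MW")
print(f"DDR5 projection:          {DDR5_EXASCALE_WATTS / 1e6:.0f} MW")
print(f"DDR5, 1/5 the memory:     {reduced_ratio_watts / 1e6:.0f} MW")
```

Even the reduced configuration, at 52 MW, is well over the entire 20 megawatt budget DOE has set for the whole system.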

In addition to power consumption, memory components are not getting cheaper as quickly as CPU components are, and memory bandwidth is not keeping up with the ever-increasing core counts on processors, so memory latencies are increasing.

There are resiliency issues with all of the components in an exascale system, which will have large numbers of components frying all the time. And then you are going to have billions of compute elements, and there has to be a hierarchy of memory and interconnects to keep them all fed and communicating with each other as simulations run.

Worse still, programming these petaflops machines is a complete bitch, and an exaflops system will be in the range of old battle-axe mother-in-law. Beyond that, you are programming against Death.

On the processor front, during the FastForward phase, the DOE is looking to better measure and control the power use in processors and integration with memory, network, and optics from the CPU or hybrid CPU-GPU chip, as the case may be. On its wish list, the DOE wants automatic rollback after faults or synchronization errors and better fault detection and correction.

Boosting the movement of data onto and off of the chip is also key, as is handling collective operations across compute elements, and software-controlled placement of data on the chip and its memory hierarchy is also penciled in. Putting network interfaces on the processor is a requirement, and so is boosting the concurrency across many cores and many threads on the cores.

The compute elements of the FastForward portion of the project have to provide 50 gigaflops per watt at scale, the same level of performance per watt that DARPA is looking for in its ExtremeScale UHPC project. The system has to have a mean time between application failures of six days or longer.

This may not sound so great until you realize that the system will have trillions of components, and that today, with petaflops-class machines, the figure is on the order of one to five days; without checkpointing or other resilience mechanisms, it will drop to about six hours by 2020.
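Why component count dominates can be shown with the simplest reliability model: if N identical parts fail independently at constant (exponential) rates and any single failure kills the application, the system's mean time between failures is the per-part MTBF divided by N. This is a textbook simplification rather than DOE's method, treating each node as one part is an assumption, and the 100,000-node figure is the low end of the range quoted later in the article.

```python
HOURS_PER_YEAR = 24 * 365


def system_mtbf_hours(part_mtbf_hours: float, n_parts: int) -> float:
    """MTBF of a system that fails when any of n identical, independent parts fails."""
    return part_mtbf_hours / n_parts


def required_part_mtbf_hours(target_system_mtbf_hours: float, n_parts: int) -> float:
    """Per-part MTBF needed to reach a target system MTBF under the same model."""
    return target_system_mtbf_hours * n_parts


if __name__ == "__main__":
    target = 6 * 24        # six days, in hours
    nodes = 100_000        # low end of the projected node count
    need = required_part_mtbf_hours(target, nodes)
    print(f"each node needs an MTBF of {need:,.0f} h (~{need / HOURS_PER_YEAR:,.0f} years)")
```

Under this model, every node would need to run for well over a millennium between failures on average, which is why checkpointing and other resilience mechanisms are unavoidable at this scale.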

DOE would like to have compute nodes with more than 10 teraflops of double-precision number-crunching performance, 4TB/sec of aggregate memory bandwidth and more than 100GB of main memory; something on the order of 32GB to 640GB is preferred. Total bandwidth between a node and the interconnect that lashes them together should be in excess of 400GB/sec.
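Two ratios fall out of these node targets. A sketch using only the article's numbers: an exaflops divided by 10-teraflops nodes gives the node count, and memory bandwidth divided by flops gives bytes per flop (the 10TB/sec figure is the one DOE “really wants,” mentioned later in the article). Variable names are illustrative.

```python
EXAFLOPS = 1e18
NODE_FLOPS = 10e12        # "more than 10 teraflops" per node
BASELINE_BW = 4e12        # 4TB/sec aggregate memory bandwidth per node
PREFERRED_BW = 10e12      # the 10TB/sec DOE "really wants"

nodes = EXAFLOPS / NODE_FLOPS              # upper bound, since nodes may exceed 10 TF
bytes_per_flop_baseline = BASELINE_BW / NODE_FLOPS
bytes_per_flop_preferred = PREFERRED_BW / NODE_FLOPS

print(f"nodes for one exaflops: {nodes:,.0f}")             # 100,000
print(f"bytes/flop at 4TB/s:  {bytes_per_flop_baseline}")  # 0.4
print(f"bytes/flop at 10TB/s: {bytes_per_flop_preferred}") # 1.0
```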

The burden of memory
On the memory front, DRAM failure rates are higher than expected and density improvements in memory chips are not coming fast enough. So DOE wants researchers to explore the use of in-memory processing – literally putting tiny compute elements in the memory to do vector math or scatter/gather operations – as well as the integration of various forms of non-volatile storage into exascale systems.

The nuke labs are thinking that 500GB of NVRAM of some sort per socket will do the trick. While 4TB/sec of bandwidth is a baseline, DOE really wants 10TB/sec.

Parallel storage subsystems generally hold up better than compute nodes on today's supercomputers, with the DOE estimating that the mean time between application failures due to a storage issue is around 20 days. Without any substantial changes to storage architectures, that will drop to 14 days by 2020. Disk capacity is increasing at a decent clip, but disk performance is not. Solid state drives are fast, but they ain’t cheap.

If availability is not as big an issue for exabyte-class storage, then scale surely is. That exascale system in 2020 will have between 100,000 and 1 million nodes and somewhere between 100 million and 1 billion computing elements, with between 30PB and 60PB of memory, across which some sort of concurrency will have to be provided to run applications.

This behemoth will require from 600PB to 3,000PB of disk capacity. In effect, the disk array for an exascale compute farm will be an exascale system in its own right, with peak I/O burst rates on the order of 200TB/sec and metadata transaction rates on the order of 100 million per second.

For the FastForward storage research projects, DOE wants a storage system that can keep the fully running exascale system fed, without crashing, for 30 days or more, and the mean time between unrecoverable data loss should be 120 days or higher – and do so with the storage array crammed to 80 per cent of capacity and performing full memory dumps from the system every hour.
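The hourly full-memory dump requirement can be sanity-checked against the projected memory size and burst rate. This sketch assumes the dump runs at the full 200TB/sec peak burst rate, which is optimistic, and uses the article's 30PB to 60PB memory range; the function name is illustrative.

```python
PEAK_BURST_BYTES_PER_S = 200e12    # ~200TB/sec peak I/O burst rate
SECONDS_PER_HOUR = 3600


def dump_seconds(memory_bytes: float, bandwidth: float = PEAK_BURST_BYTES_PER_S) -> float:
    """Time to write all of memory at a sustained bandwidth."""
    return memory_bytes / bandwidth


for memory_pb in (30, 60):
    secs = dump_seconds(memory_pb * 1e15)
    share = secs / SECONDS_PER_HOUR
    print(f"{memory_pb}PB: {secs:.0f} s per dump, {share:.1%} of every hour")
```

So even at peak rates, hourly dumps would tie up the storage system for roughly 4 to 8 per cent of the time (150 to 300 seconds per hour), which helps explain why the burst-rate requirement is so high.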

Data integrity algorithms for storage can impose no more than 10 per cent overhead on the metadata servers at the heart of the storage array. Metadata insert rates are expected to be on the order of 1 million to 100 million per second, and lookup and retrievals are expected to be on the order of 100,000 to 10 million per second out of the metadata servers.

During peak system writing and reading operations, the metadata servers can’t take more than a 25 per cent performance hit, and DOE would really like it to be 10 per cent.

No big deal, right?

So, good luck, AMD, Whamcloud, and friends.

The winners
AMD received research grants under the FastForward portion of the DOE Extreme-Scale Computing program for both processing and memory research, and according to Alan Lee, corporate vice president for advanced research and development at the chip maker, that is because the two are interrelated.

Lee was not able to elaborate much on the research plans AMD has put together, but he did confirm to El Reg that AMD would be focusing on research to push its hybrid CPU-GPU processors, what the company calls its Accelerated Processing Units or APUs. On the memory side, AMD is looking at different types of memory, different structures and hierarchies of memory, and different relationships between these memories and the APUs, and this will, of course, necessarily involve system interconnect work.

“Moving data around to feed the beast is critical for exascale,” explained Lee, adding that the SeaMicro acquisition earlier this year was not done for this DOE work, but the interconnect expertise that AMD gained through that acquisition would be put to good use.

AMD researchers have already identified a subset of key memory technologies that they think will be applicable to exascale-class systems, and this is what the research will focus on. AMD is not throwing the whole kitchen sink of possible volatile and non-volatile memories into the mix.

Lee was not at liberty to say what memory technologies AMD was looking at, since that would be helping its inevitable competition. AMD has received a grant of $3m for the memory research and $12.6m for the processor research. It is interesting that AMD was able to bag these contracts all by its lonesome after the DOE specifically said that it wanted multiple companies cooperating on the work.

On the storage front, Whamcloud, the company that was formed in July 2010 to support and extend the open source Lustre file system, is the leading contractor and is soliciting help from a bunch of others.

Whamcloud is managing the project and lending its Lustre file system expertise and is relying on HDF Group for application I/O expertise, EMC for system I/O and I/O aggregation skills, and Cray for scale-out testing of the storage systems. This exascale storage system will have a mix of flash and disk drives.

The word on the street is that Whamcloud received around $8m for its FastForward grant. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.


Posted in Computing Research, HPC

Andrew Jones: A Preview of ISC’12


Andrew Jones from NAG in the UK gives us a preview of what to expect at ISC’12 in Hamburg.

As at ISC’11 last year (and SC11), I think there will be a strong fight for attention in the key area of manycore/GPU devices – and a matching search for evidence of real progress. So far the loudest voice has been NVidia and CUDA, especially following NVidia’s successful GTC event recently. However, interest in Intel’s MIC (Knights Corner) is strong and growing – MIC has often been a big discussion topic in workshops, conferences and meetings over the last year. As the MIC product launch gets closer, people will be making obvious comparisons with NVidia’s Kepler announced at the GTC.




Posted in Accelerators, Events, GPUs, HPC, HPC Hardware, ISC12

Supercomputing: From Candlelit Dinners to the House of Lords


Exascale computers will be here by 2019, according to Hans Meuer, chair of the ISC’12 Supercomputing Conference – although it is currently unclear what technologies they will employ.

In an invited talk in one of the committee rooms of the House of Lords on 18 April, Professor Meuer gave British Peers, and luminaries of the UK computer community, a tour-de-force presentation on the development of supercomputing from the Cray 1 in the 1970s to the advent of exascale.

He emphasised that the demands on high-performance computing are changing and that data crunching is becoming as important a topic as number crunching. However, he said, the conventional tools for assessing the performance of supercomputers – in particular the Linpack benchmark upon which the Top500 listing is based – may not necessarily be the most appropriate measures in such data analysis applications. He stressed that alternative metrics, including Jack Dongarra’s HPC Challenge benchmarks and the Graph500 initiative, were important in assessing machines for specific purposes.

The value of the Top500 benchmark is that it has been applied consistently over a period of nearly 20 years (celebrations of the 20th anniversary will take place in Salt Lake City in November this year). When plotted on a logarithmic scale, the increase in supercomputing power over that period has been a remarkably straight line and he saw no reason to doubt that the trend would continue into the future.

The consistency of the growth in compute power over the period is all the more remarkable as the underlying technologies have changed significantly in that period, he pointed out. ‘For me, the first real supercomputer was the Cray 1 vector supercomputer in 1976,’ he said. But the technology changed to massively parallel architectures, more conventional processor chips and, recently, to include GPU type chips.

Professor Meuer recalled that the Cray 2 was the most powerful supercomputer in the world in 1986. The price tag of $22M was so high that when one was purchased for Stuttgart, the deal was allegedly signed only after ‘a candlelit dinner’ between the Minister-President of Baden-Württemberg and the then CEO of Cray Research, John Rollwagen. For comparison, Professor Meuer said, the Apple iPad 2 in 2011 had two-thirds of the processing power of the Cray 2 at a price tag of only $500 – a reduction in price by a factor of 44,000.
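The factor of 44,000 is the ratio of the two price tags. Because the iPad 2 delivers only about two-thirds of the Cray 2’s processing power, the improvement in price per unit of performance is a little smaller. A quick sketch of that arithmetic, using only the figures quoted above:

```python
# Figures quoted in the talk (US dollars).
CRAY2_PRICE = 22_000_000
IPAD2_PRICE = 500

# How many times cheaper the iPad 2 is than the Cray 2.
price_factor = CRAY2_PRICE / IPAD2_PRICE

# The iPad 2 has roughly two-thirds of the Cray 2's processing power,
# so cost per unit of performance falls by a smaller factor.
RELATIVE_PERFORMANCE = 2 / 3
price_performance_factor = price_factor * RELATIVE_PERFORMANCE

print(f"price reduction factor: {price_factor:,.0f}")
print(f"price/performance improvement: {price_performance_factor:,.0f}x")
```

Either way the gap is more than four orders of magnitude; the performance adjustment trims the headline figure by about a third.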

He raised the radical question as to whether we need new computer architectures to cope with ‘Big Data’. In traditional computational sciences, he said, the problems fit into memory; the methods require high precision arithmetic; and the computation is based on static data. Recently, interest has grown in data intensive sciences where the problems do not fit into memory; variable precision or integer based arithmetic is required; and the computations are based on dynamic data structures. Such problems arise as a result of experiments such as the Large Hadron Collider at CERN, the European Laboratory for Particle Physics, where the task is analysis (data mining) of raw data from the high throughput instruments.

Looking to the future, Professor Meuer reminded his audience of the perennial problem that to increase the number of transistors per chip, the transistors must become smaller and smaller and so the manufacturing process must be able to define ever-smaller feature sizes year after year. He conceded that the ultimate limits of conventional silicon technology would be reached within the next few decades. Perhaps, he speculated, it would soon be time to turn to more exotic technologies, such as quantum computing. He concluded by citing Mark B. Ketchen, manager of the physics of information group at IBM’s Thomas J. Watson Research Centre in Yorktown Heights, New York, on quantum computing: ‘In the past, people have said, “maybe it’s 50 years away, it’s a dream, maybe it’ll happen sometime”. I used to think it was 50. Now I’m thinking like it’s 15 or a little more. It’s within reach. It’s within our lifetime. It’s going to happen.’

This story originally appeared on HPC Projects. It appears here as part of a cross-publishing agreement with Scientific Computing World.

 


Posted in Exascale, HPC

OSU Grad Student Explores Strengths, Challenges of Cray, IBM, and Nvidia


When it comes to benchmarks, your performance mileage may vary. Now an Ohio State University researcher has established some side-by-side performance comparisons that survey the wide range of parallel system architectures offered in the supercomputer market.

“We explore the parallelization of the subset-sum problem on three contemporary but very different architectures, a 128-processor Cray massively multithreaded machine, a 16-processor IBM shared memory machine, and a 240-core NVIDIA graphics processing unit,” said Bokhari. “These experiments highlighted the strengths and weaknesses of these architectures in the context of a well-defined combinatorial problem.”
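The excerpt does not say how Bokhari parallelized the problem. As background only, here is the textbook dynamic-programming formulation of subset sum, written as a plain sequential sketch (not the researcher’s code). Each pass over a new item builds a row of a reachability table from the previous row, and every entry in that row can be computed independently, which is what makes the problem a natural fit for machines with many threads or cores.

```python
def subset_sum(values, target):
    """Return True if some subset of the non-negative integers in `values`
    sums exactly to `target` (classic dynamic-programming formulation)."""
    if target < 0:
        return False
    # reachable[s] is True when some subset of the items seen so far sums to s.
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for v in values:
        # Build the next row from the previous one. Each index s depends only
        # on the previous row, so all indices could be updated in parallel.
        prev = reachable[:]
        for s in range(v, target + 1):
            if prev[s - v]:
                reachable[s] = True
    return reachable[target]


print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```

The table has (number of items) × (target + 1) cells, so the work and memory traffic grow with the target value, one reason the problem stresses memory systems differently on each architecture.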



Posted in Compute, GPUs, HPC, HPC Hardware

PC Battle Royale: Who Will Win the Exascale Supercomputer’s Heart?


By Dan Olds, Gabriel Consulting

My article comparing supercomputer performance and price/performance to common computers generated quite a few comments. For those who didn’t see the initial story, the Fujitsu K computer is a 10 petaflop monster that’s currently the fastest computer in the world. It’s roughly 4x faster than the second place Tianhe-1A Chinese system that topped the chart at the end of 2010.

Most of the comments on the home computer vs supercomputer article were the typical mix of humour, flames and thoughtful asides, but one in particular caught my eye. It was from “buzza,” who mused:

The K machine is mighty pricey, and it would (be) interesting to see how that cost breaks down into CPU vs I/O development. The K machine has a very elaborate interconnect. This must surely take a lot of the credit for the machine’s sustained performance being so close to the theoretical peak performance. The cost break down might illustrate where investment pays off best.

The K computer delivers incredible performance but also an equally incredible price tag: $1.25bn to build and $10m a year to operate.

For comparison purposes, the IBM Roadrunner (the first 1 PFLOP system) cost about $100m back in 2008. So from Roadrunner to K computer, we saw both performance and costs move up an order of magnitude. Fair enough.

Much of the cost behind the K computer was in designing the system innards, primarily the proprietary interconnect and surrounding bits. It was the same thing with Roadrunner; much of the development time/money was spent working out how to get Opteron and PowerXCell accelerators (closely related to the Cell BE chip in PlayStation consoles) to work well together in the same system.

Both of these systems, aside from being the first to cross performance hurdles, are departures from the conventional systems that populate most of the Top500 list and most HPC data centres.

Supercomputers used to be highly customised systems that were essentially built from scratch and shared few, if any, common components (other than copper and electrons). The hardware and operating systems were unique to a particular vendor and even machine type.

All of this changed in the 1990s when increasing HPC demand combined with a number of other factors (including the rise of Linux and the falling cost of commodity parts) to bring about what is now the typical supercomputer: a collection of individually inexpensive common parts that are lashed together to build a massive single cluster or MPP system.

In a lot of ways, this was a grassroots effort fueled by customer desire to get more FLOP/s per dollar, aided by their willingness to roll their own system software and reengineer their apps.

Over time, as the commodity movement picked up steam, it was embraced by existing and new vendors. Building supercomputers out of commonly available parts opened up the industry to lots of new players who were able to build competitive systems (in price and price/performance terms) by “simply” combining commodity parts together.

They’re able to stay on the performance curve by taking advantage of steady gains from processors (à la Moore’s Law) and networking/interconnect technologies. This isn’t to say that building a commodity-based supercomputer is now simple – but it’s a lot easier than having to design and build all of the major components yourself.

The K computer wasn’t built in this mold. It uses SPARC processors, not Intel or AMD procs. While SPARC is a widely used processor, it hasn’t been widely used in HPC since the early 2000s. The K computer team built their own highly sophisticated 6D torus (like there’s a non-sophisticated 6D torus, right?) to connect nodes together, eschewing the typical Infiniband or network-based interconnect. It’s also unique in what it doesn’t use: accelerators (either GPUs or FPGAs). The K computer relies on lots and lots of traditional CPUs, with more than 700,000 cores total.
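To make the “6D torus” idea concrete: in a d-dimensional torus each node has a coordinate in every dimension, and the links wrap around at the edges, so a node normally has two direct links per dimension, or 2d in total. The sketch below enumerates those neighbours for an arbitrary shape. The dimension sizes in the example are hypothetical, chosen for illustration, and are not the K computer’s actual layout.

```python
from math import prod


def torus_neighbors(coord, shape):
    """Return the distinct nodes directly linked to `coord` in a torus whose
    extent along each dimension is given by `shape`. Every dimension wraps
    around, so the last position links back to position 0."""
    neighbors = set()
    for dim, size in enumerate(shape):
        for step in (-1, 1):
            n = list(coord)
            n[dim] = (n[dim] + step) % size  # wraparound link
            if tuple(n) != tuple(coord):      # a size-1 dimension has no link
                neighbors.add(tuple(n))
    return neighbors


# Hypothetical 6D shape for illustration only (not the K computer's layout).
shape = (4, 4, 4, 3, 3, 3)
origin = (0,) * len(shape)

print(len(torus_neighbors(origin, shape)))  # 12: two links in each of 6 dimensions
print(prod(shape))                          # total number of nodes
```

Using a set means that when a dimension has only two positions, the +1 and -1 links (which reach the same node) are counted once.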

K isn’t the only throwback system on the Top500. In addition to the aforementioned Roadrunner (which still comes in at #10), there are plenty of top systems that aren’t fueled by x86 processors, including the 14th fastest system in the world: the 800 TF Sunway Blue Light system, which relies on 16-core ShenWei RISC processors.

In terms of system count, almost 90 per cent of the systems on the current Top500 list are based on x86 processors from AMD or Intel. But the 13 per cent of systems that aren’t x86-based pack quite a punch, accounting for 27 per cent of the total performance (as measured by a sum of Rmax ratings).
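Those two shares say something about the average machine in each camp. Dividing each group’s share of total Rmax by its share of systems, and treating the remaining 87 per cent of systems and 73 per cent of performance as x86, the average non-x86 system comes out at roughly two and a half times the Rmax of the average x86 system. This is a back-of-the-envelope sketch; it says nothing about how performance is spread within each group.

```python
# Shares quoted in the text for the Top500 list.
non_x86_system_share = 0.13
non_x86_perf_share = 0.27

x86_system_share = 1 - non_x86_system_share  # 0.87
x86_perf_share = 1 - non_x86_perf_share      # 0.73

# Average Rmax per system in each group, relative to the list-wide average.
non_x86_avg = non_x86_perf_share / non_x86_system_share
x86_avg = x86_perf_share / x86_system_share

ratio = non_x86_avg / x86_avg
print(f"average non-x86 system vs average x86 system: {ratio:.2f}x")
```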

Many industry watchers, myself included at times, figured that the commodity model would swamp the custom system model sooner or later. While that’s mostly happened, the K computer (along with Roadrunner, Blue Light, and, arguably, the CPU/GPU hybrids like Tianhe) have fought against that tide and made quite a splash, at least in the deepest part of the deep computing pool. (And I’ve just hit my personal best for reuse of the same metaphor – a high-water mark for me!)

So what does this bode for the future? Will commodity rule the roost, or will we see a new crop of custom systems using exotic combinations of at least semi-proprietary parts? I think that the move to exascale is going to require different approaches and technologies – we’re not going to get there by just shrinking and cranking up the frequency of existing parts. This is a situation where turning up the amps to ‘10’ won’t get it done; exascale is going to demand ‘11’. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.



Posted in Accelerators, Compute, Exascale, GPUs, HPC, HPC Hardware

Supercomputers vs. Your Computer – A Bang for the Buck Battle


By Dan Olds of Gabriel Consulting

A couple of weeks ago I posted a blog here (Exascale by 2018: Crazy…or possible?) that looked at how long it took the industry to hit noteworthy HPC milestones. Chatter in the comments section (aside from the guy who assailed me for a typo, and for not explicitly calling out ‘per second’ denotations) discussed what these massive systems do and why they’re necessary.

But Reg readers’ comments, plus others that I received via Twitter, raised some interesting questions that I’m going to attempt to answer – or at least sort of answer. The first is: just how much did these systems cost new?

When these systems came out, they were the biggest and baddest supercomputers in the world. But the price tag that the vendor attaches to a system in a press release and the actual price paid by the customer may have little or no relationship to each other or what the system cost to develop and build.

The price also varies depending on when in the product lifecycle you purchase the system. Buying the first one doesn’t mean that you’re necessarily paying the top price. If you’re the kind of customer who might buy boatloads of them, you would probably get a break. It also helps if you’re on the understanding side when it comes to performance qualification and bug fixes. Plus the right customer can validate a design, and that’s worth something to vendors.

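[Table: early-life prices of record-setting supercomputers, with a final column adjusted to 2010 dollars (image not reproduced)]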

In the table above, I did my best to find representative early-life prices for each system. It was easier to find prices for the later systems than for the CDC and Cray boxes. I found ranges of prices for the CDC and Cray-2 systems, so I took the average of those figures.

The final column adjusts those prices to 2010 dollars to level the playing field. Even though the cost of computing has gone down incredibly (as we’ll see below), the cost of BIG computing – the cost of the fastest system in the world – has increased considerably from the $50m CDC 6600 to the $101m IBM Roadrunner. The K computer is a bit of a special case. The $1.25bn figure supposedly represents the cost of design, development and the actual gear – but I don’t know if it’s an apples-to-apples comparison to the others.

The second theme among readers’ comments was: how do these levels of performance (and associated prices) relate to the systems that we use day in and day out? This required some more Jethro Bodine ciphering time; I figured I’d benchmark some of the systems in our offices and see how they came out.

I wanted to use Linpack, so I first needed to find a distribution that works on our Windows 7 systems here. Yeah, yeah, I know that I should set up a dual boot with Linux and then run a ‘real’ Linpack in order to get better numbers, but I do have a regular day job.

Intel has a downloadable Linpack benchmark, which I put on three of our office systems. After perusing the documentation, I ran through some trial runs with different problem sizes in order to establish a performance range. What I found is that, on our systems at least, using the largest ‘typical’ problem set of 40,000 equations seemed to pull out the best Linpack average and peak results.
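For a sense of what a 40,000-equation run involves: Linpack solves a dense n × n system Ax = b by LU factorisation, so the work grows with n³ and the memory with n². The sketch below assumes the operation count used by the HPL reference implementation, (2/3)n³ + (3/2)n², and 8-byte double-precision matrix entries. Intel’s packaged benchmark may count slightly differently, but the n³ term dominates either way. The helper names and the 100 GFLOP/s rate are hypothetical, for illustration only.

```python
def linpack_flops(n):
    """Approximate floating-point operations to solve a dense n x n system
    (HPL-style count: LU factorisation plus the triangular solves)."""
    return (2 / 3) * n**3 + (3 / 2) * n**2


def matrix_bytes(n, bytes_per_entry=8):
    """Memory needed to hold the n x n matrix in double precision."""
    return n * n * bytes_per_entry


def gflops(n, seconds):
    """Linpack rate in GFLOP/s for a problem of size n solved in `seconds`."""
    return linpack_flops(n) / seconds / 1e9


n = 40_000
print(f"operations: {linpack_flops(n):.3e}")
print(f"matrix size: {matrix_bytes(n) / 1e9:.1f} GB")
# Run time on a hypothetical machine sustaining 100 GFLOP/s.
print(f"run time at 100 GFLOP/s: {linpack_flops(n) / 100e9:.0f} s")
```

At roughly 12.8 GB for the matrix alone, a 40,000-equation problem only makes sense on machines with plenty of RAM, which helps explain why the largest problem size tends to be the one that extracts the best numbers.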

Our pal Jack Dongarra, one of the founders of the Top500 list, ran Linpack on an Apple iPad 2 and reported that the tablet hit between 1.5 and 1.65 GFLOP/s, which is higher than the Cray-2 back in 1985.

In the New York Times story, he also discussed the possibility of clustering iPads into a competitive supercomputer. He didn’t seem to feel that it would be a good price performer when compared to existing supercomputers, something that my research below confirms.

A bang-for-buck comparison

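[Table: Linpack performance, price and cost per MFLOP/s for supercomputers, the Apple iPad and office systems, sorted by performance (image not reproduced)]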

The table above is sorted in performance order. The yellowish rows represent the Apple iPad and some systems I have hanging off my office/home network. The astounding part of the table is the final column – the cost per MFLOP/s. What jumps out is that my wife’s generic business desktop computer kicks ass from a price/performance perspective. At less than 5 cents per MFLOP/s, it’s well ahead of everything else on the list.

The Lenovo W510 was billed as a “Mobile Workstation” when I bought it, and it’s the fastest laptop I’ve ever used. It’s also the heaviest and the most power-hungry, but with an Intel Extreme Edition i7 mobile processor and a discrete NVIDIA Quadro graphics processor, it can handle the video processing I throw at it when I’m on the road.

A Bugatti of my own

The Hydra-1 system is something I’ve been working on for months, and it’s finally ready to come into service. I’m getting old enough to realize that I’ll never own a Bugatti Veyron ($2.4m) or even the much less expensive Ferrari Enzo ($670,000), but I can have the fastest computer in my state (until someone proves me wrong – plus it’s a small state).

I built Hydra-1 over the past several months, and it’s quite the screamer. A very helpful HPC vendor hooked me up with two Intel Xeon 5690s which, together with 24GB of RAM, turned in a Linpack of 122.68 GFLOP/s at stock clock frequencies. Hydra-1 has some serious liquid cooling built in, so I have plenty of room for over-clocking, which might take the Linpack to 140 GFLOP/s or better. I think I’m way under the Linpack theoretical max for the system, but I’m running the benchmark under Windows 7 without any tuning, so I’ll take what I can get.
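To put “way under the Linpack theoretical max” in numbers: theoretical peak is sockets × cores per socket × clock × floating-point operations per core per cycle. The sketch assumes the Xeon X5690’s published figures of 6 cores at a 3.46 GHz base clock and 4 double-precision FLOPs per core per cycle (the SSE rate for that processor generation); those figures are not from the article. On those assumptions, 122.68 GFLOP/s is roughly three-quarters of peak.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical double-precision peak in GFLOP/s."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


# Assumed Xeon X5690 figures (not from the article): 6 cores, 3.46 GHz base
# clock, 4 double-precision FLOPs per core per cycle with SSE.
peak = peak_gflops(sockets=2, cores_per_socket=6, clock_ghz=3.46, flops_per_cycle=4)
measured = 122.68  # Hydra-1 Linpack result at stock clocks, from the text

print(f"theoretical peak: {peak:.2f} GFLOP/s")
print(f"Linpack efficiency: {measured / peak:.0%}")
```

Turbo Boost and over-clocking both raise the effective clock, so the real ceiling moves with them; the base-clock figure is only a reference point.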

The liquid cooling almost drove me out of my mind, but will definitely pay off on performance and comfort fronts. I’ve written some blogs about the whole process that I’ll submit to The Reg soon.

The iPad does okay, but not nearly as well as the typical systems in the chart. And the iPad numbers don’t include a 3G wireless plan or a cover – and the cheapest poly cover at the Apple store costs $39, or about 6 per cent of the total system price. And if you go with leather? At $69, that’ll drive your price per MFLOP up by 10 per cent.
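The two cover percentages hang together. Working backwards from the $39 poly cover being about 6 per cent of the total, the implied iPad price is roughly $650 (an inference from the percentage, not a price stated in the article), and a $69 leather cover on that base adds a little over 10 per cent. Since the Linpack result doesn’t change, cost per MFLOP/s rises by the same fraction as the price.

```python
POLY_COVER = 39
LEATHER_COVER = 69
POLY_SHARE = 0.06  # "about 6 per cent of the total system price"

# Price implied by the 6 per cent figure (an inference, not a quoted price).
implied_ipad_price = POLY_COVER / POLY_SHARE

# Performance is unchanged, so cost per MFLOP/s scales with total price.
leather_increase = LEATHER_COVER / implied_ipad_price

print(f"implied iPad price: ${implied_ipad_price:.0f}")
print(f"leather cover adds {leather_increase:.1%} to cost per MFLOP/s")
```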

The biggest surprise on our chart above? Look at Roadrunner and the K computer. Their cost per MFLOP/s is a mere 10 cents and 13 cents respectively. I ran the numbers again and again just to make sure I wasn’t dropping a zero (or adding one). But the result remains the same – the fastest and most modern supercomputers cost much less per MFLOP/s than their predecessors.
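Cost per MFLOP/s is simply price divided by Linpack performance expressed in MFLOP/s. Using the article’s own round numbers (Roadrunner at about $101m for roughly 1 PFLOP/s, and the K computer at $1.25bn for roughly 10 PFLOP/s), the sketch lands on about 10 and 12.5 cents, in line with the 10 and 13 cents quoted. The chart’s exact performance inputs aren’t given, so these round values are assumptions.

```python
MFLOPS_PER_PFLOPS = 1e9  # 1 PFLOP/s = 10^15 FLOP/s = 10^9 MFLOP/s


def cost_per_mflops(price_usd, pflops):
    """Dollars per MFLOP/s of Linpack performance."""
    return price_usd / (pflops * MFLOPS_PER_PFLOPS)


# Round figures from the text; the chart's exact inputs may differ.
roadrunner = cost_per_mflops(101e6, 1.0)
k_computer = cost_per_mflops(1.25e9, 10.0)

print(f"Roadrunner: {roadrunner * 100:.1f} cents per MFLOP/s")
print(f"K computer: {k_computer * 100:.1f} cents per MFLOP/s")
```

The same division explains why a flagship machine can be both the most expensive system ever built and one of the cheapest per MFLOP/s: the denominator grew about as fast as the price.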

So what have we learned today? First, even though the cost of computing components has dropped radically over time, the cost of building/buying the biggest landmark computers in the world has increased just as radically.

We’ve also seen that today’s home and business computers stack up very well against supercomputers of the past. The iPad 2 pwns the Cray 2 in Linpack, but costs 99.998 per cent less (leather case not included).

We’ve learned that the current chart-topping supercomputer, the K computer, has a competitive cost per MFLOP/s even though it cost something like $1.25bn to design and build.

But my wife’s desktop is the Linpack price/performance king, and now we know that too. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.



Posted in Business of HPC, Compute, HPC, HPC Hardware

insideHPC.com is a production of insideHPC, LLC. © 2006-2013