In this special guest feature, Torsten Hoefler from ETH Zurich writes that the new Green Graph500 aims to boost energy-efficient Big Data Computing.
“Big Data” can be analyzed in various ways. The most successful and prevalent programming model, MapReduce, convinces by its flexibility toadapt to hardware performance variations and faults. However, even though MapReduce covers a huge majority of use-cases, it has its limits for graph computations. Complex graph algorithms become more important as our analysis capabilities grow. For example, problems such as finding hubs in social network graphs are routinely answered today. The underlying algorithm, betweenness centrality, utilizes a graph traversal similar to breadth first search or shortest path search. Systems such as Google’s Pregal, Apache’s Giraph, the (Parallel) Boost Graph Library, and Stanford’s GPS are just some examples for emerging frameworks to handle large-scale graph computations. In order to efficiently compare architectures and possibly programming frameworks, the Graph 500 benchmark strives to establish a database for performance of a standardized breadth first search on various platforms.
As energy is becoming a bigger concern than hardware purchasing costs in large-scale data centers and supercomputing centers, it becomes mandatory to not only consider the performance of such computations but also their exact energy consumption. In fact, if the current cost trends continue, then energy consumption will soon be more important than absolute performance. Such discussions are highly relevant for operators of large data centers such as Google, Amazon, and Yahoo, as well as large supercomputing centers operated by the DOE (e.g., LLNL, Sandia,LANL, ORNL) and the NSF (e.g., NCSA, SDSC, PSC). We are thus looking forward to interesting future developments targeting exascale as well as Big Data architectures and programming frameworks.
We introduce the Green Graph 500 list which fulfills a variety of purposes. First and foremost it is to establish the practice to compete not only for the highest performance but also for the highest energy efficiency, directly benefiting society. It is also set out to collect historical data about developments that may allow us to predict future trends very similar to what the top 500 list has achieved in the past(who doesn’t like to put up a top 500 slide to project out FLOP rate for the next 10 years?). The list will also allow us to compare the energy efficiency of a specific computer for certain tasks, e.g.,dense linear algebra (a problem mainly limited by memory size and CPU peak floating point performance) versus graph search (a problem mainly limited by memory access rates and global system bandwidth). Those two metrics together may serve as a measure to generate more efficient balanced systems as well as special-purpose systems for one of those tasks.
Finally, the new Green Graph 500 list is not meant to compete with any of the existing lists. It is indeed complementary, filling an important gap in the field. In fact, the rules are designed to be similar to the established Green 500 rules (similar, not identical, for example with regards to the network) so that comparisons can easily be made in the future. It also directly integrates with the Graph 500 list and submission system to guarantee one-to-one comparisons (a submission record may be in the Green Graph 500 as well as the Graph 500 even though the lists are ranked by different indices).
The Green Graph 500 list is soliciting submissions from everyone through the Graph 500 submission system. To submit to the list, simply start a normal Graph 500submission and select “Submit to Green Graph 500″ or “Submit to both lists”. The only additional data you need for a Green Graph 500submission is the actual power draw of your system during the benchmark.
Another small difference between Graph 500 and it’s Green peer is the measurement methodology. Since most power meters are not accurate enough to measure the rather short actual BFS run (not including the post-check etc.), we offer a slightly modified version of the reference benchmark which allows to run the BFS in a tight loop long enough for a low-time resolution energy meter to measure the exact energy consumption. This benchmark will also report a Graph 500 number valid for submission. For runs with a custom implementation, this would need to be ensured manually (4-5 lines of C Code suffice for this). The submission opens together with the official Graph 500 submission.
As a sneak peek, we prepared a sample list from March 2013′s energy submissions (which may not have followed all the official rules, thus, the list is not official).
The Green Graph 500 list is maintained by Torsten Hoefler from ETH Zurich in collaboration with the Graph 500 executive committee. For questions or comments please contact [email protected]