Search Results for: “pue”

Open Compute Project is Open Source for Greener Datacenters

HP’s Marc Hamilton has written an interesting post on Facebook’s new Open Compute Project, a year-long quest to build one of the world’s most power-efficient datacenters right here in Oregon.

“Nearly every HPC data center could benefit from design concepts in Open Compute’s data center specifications for electrical, mechanical, and battery backup systems. Unfortunately, building your own Open Compute server from the specs is probably a lot easier than replicating the data center electrical, mechanical, and battery systems to achieve the 1.07 PUE achieved by Facebook. You can, however, get many of the same data center efficiencies from an HP POD (Performance Optimized Datacenter).”

Posted in Green HPC, HPC

SGI Puts Modular Data Centers on ICE

By Timothy Prickett Morgan

In the past, Silicon Graphics didn’t sell containerized data centers — but the company that swallowed it and took its name nearly two years ago, Rackable Systems, did. And the new SGI is now taking its third stab at the idea.

The new ICE Cube Air modular data centers, which SGI is showing off at the Gartner Data Center Conference in Las Vegas, have a few twists on the idea. For example, SGI is putting the door on the side of the container instead of the back, where it normally is, which allows the row between racks to be wider. You can see what I mean in this photo:

Ice Cube Air containerized data center

By making this simple change, the modules now take on more human proportions instead of looking like the guts of a submarine — and giving nerds claustrophobia.

The ICE Cube Air containers, which SGI calls modular data center units, come in three sizes. The smallest is the 8-foot container shown, which has four industry standard IT racks. Up to four of the containers can be linked together to support up to sixteen racks of IT gear. Each module can have up to 148 kilowatts of input power; up to 35 kilowatts per rack are allowed.

SGI has 51U-high racks that fit into the unit, with a roll-in maximum height of 89.25 inches. When you do the math, that works out to a total of 19,584 x64 cores using twelve-core AMD Opteron 6100 processors in two-socket, 1U servers in all of the sixteen racks. If you just put SGI’s InfiniteStorage arrays in those racks, you are talking about 28.7PB of total storage in the four eight-footer ICE Cube Air containers.

The medium container is a 20-footer that can hold up to 10 racks and can be clustered into four units for a total of 40 racks. This behemoth has a maximum input power of 371 kilowatts per module, with the same 35 kilowatts per rack as a max. So you’re talking about a maximum of 12,240 cores or 17.9PB of storage per 20-footer — and with four interlinked, that’s 48,960 cores or 71.7PB of disk capacity.

The largest size uses a pair of 20-footer containers as a building module, and then interlinks up to four of these modules together to double up the capacity of those super-tall racks to 97,920 cores or 143.4PB of disk capacity in the whole shebang.
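The core counts quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes, as the article implies, that every one of the 51 rack units holds a 1U server with two twelve-core Opteron 6100 sockets; the constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the core counts quoted in the article.
# Assumption: each 51U rack is filled entirely with 1U, two-socket servers.
RACK_UNITS = 51          # 51U-high racks
SOCKETS_PER_SERVER = 2   # two-socket 1U servers
CORES_PER_SOCKET = 12    # twelve-core AMD Opteron 6100


def cores_for_racks(racks: int) -> int:
    """Return the total x64 core count for a given number of full racks."""
    return racks * RACK_UNITS * SOCKETS_PER_SERVER * CORES_PER_SOCKET


if __name__ == "__main__":
    configurations = [
        ("four linked 8-foot containers", 16),
        ("one 20-foot container", 10),
        ("four linked 20-foot containers", 40),
        ("four linked double-20-foot modules", 80),
    ]
    for label, racks in configurations:
        print(f"{label}: {racks} racks, {cores_for_racks(racks):,} cores")
```

Each configuration differs only in its rack count, so the per-rack figure (1,224 cores) is the one number worth remembering.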

The ICE Cube Air containers have a power-usage effectiveness (PUE) rating of 1.06, which is a little better than the Merlin modular data center that Capgemini announced in Swindon, England, in October, or the chicken-coop data centers that Yahoo! opened a few months earlier in Lockport, New York.

PUE measures how efficient a data center design is, and is the ratio of the total power used by the data center divided by the total power supplied to the computing equipment. The typical data center has a PUE of somewhere between 2 and 2.5, depending on whom you ask. Google says its best data centers run at around 1.10, and the company averages somewhere between 1.15 and 1.25. The ICE Cube Air containers have highly efficient fans and a three-stage evaporative cooling system that can be fed with a garden hose. You can also hook up chillers and only need to pump through about two gallons of chilled water per minute to cool an 8-foot module.
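As a minimal sketch of the ratio just described, the snippet below computes PUE and the overhead power it implies. The 100 kW IT load is a made-up figure chosen for easy arithmetic, and the function names are hypothetical.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total power drawn by the facility / power delivered to IT gear."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Power spent on cooling, conversion losses, etc. for a given IT load."""
    return it_equipment_kw * (pue_value - 1.0)


if __name__ == "__main__":
    it_load = 100.0  # hypothetical IT load in kilowatts
    for label, value in [("ICE Cube Air", 1.06), ("typical data center", 2.0)]:
        print(f"{label}: PUE {value} -> {overhead_kw(it_load, value):.0f} kW overhead")
```

Read this way, a PUE of 1.06 means roughly 6 kW of overhead for every 100 kW of computing, against about 100 kW of overhead at a PUE of 2.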

The new ICE Cube Air containers are less mobile than the ICE Cube 20-footer and 40-footer containerized data centers that SGI rolled out in May. And unlike those prior containers, the ICE Cube Air units have an entry price that SGI will talk about, which is $99,000. The ICE Cube Air containers are available now — just in time for holiday gift-giving. ®

This article was originally published at The Register.

Posted in Compute, HPC, HPC Hardware

Red Sky Super Showcases Strength of Sun HPC Designs

Sun Microsystems may be gone, but their large-scale HPC installations continue to impress users with their scaling and efficiency. Drew Robb writes about the Red Sky supercomputers at Sandia National Labs.

A good metric for data center efficiency is Power Usage Effectiveness (PUE). You divide the amount of power entering a data center by the power used to run the computer infrastructure within it to arrive at a ratio. The closer you are to 1, the better. The facility has achieved a PUE of 1.27 even when additional enterprise computing equipment beyond the Red Sky/Red Mesa supercomputer is factored in. “It is outstanding to achieve a PUE of 1.27 with a 43,000-core machine,” said John Zepper, senior manager of computer systems at Sandia National Laboratories.

In this time-lapse video, we see the installation of Red Sky, Sandia’s Sun Constellation System supercomputer, which ranked number 10 on the TOP500 at its debut.

Posted in HPC Hardware, Video

The Rich Report: The 16 Terabyte PC – SGI Bets on Exascale

It has been over a year since SGI’s merger with Rackable Systems. The two companies came from different camps, so I was curious to learn about where they are today and where they’re headed in the HPC space. So I caught up with the company’s Chief Technology Officer, Eng Lim Goh, to discuss the company’s new products and their plans for Exascale computing.

insideHPC: How long have you been at SGI?

Dr. Eng Lim Goh

Dr. Eng Lim Goh: Over 20 years now. I started as a systems engineer in Singapore working on the GT workstation.

insideHPC: As CTO, what does a typical day look like for you?

Dr. Eng Lim Goh: These days I spend about 50-60 percent of my time outside with customers. That’s particularly important now given the fact that we are a new company, with Rackable having acquired us and then renaming the company “SGI.” So I’m going out communicating not only about the new company, but also about the new line including products in the Internet Cloud space, which are less familiar to our HPC customers. And I’m also going out to our Cloud customers who are not familiar with our HPC line and storage lines.

So, it’s a lot of work to bring the community up to speed on both sides–the Cloud side and the HPC side, and that has been going on for a year now. And I think we have come to more of a run-rate-like scenario now.

insideHPC: That’s interesting. I remember when I first read about the acquisition. I wasn’t familiar with Rackable, so I looked at a corporate overview video that highlighted all their key customers. The list was a who’s-who of heavy-hitter Internet companies like Amazon, Facebook, and Yahoo, and I thought, my gosh, SGI has become the new “Dot in Dot Com” just like Sun was ten or twelve years ago.

Dr. Eng Lim Goh: That’s very complimentary of you to say. In fact our latest win was with Amazon.com with their EC2 and S3 cloud. They’re one of the biggest cloud providers today and we supply the majority of systems to that enterprise.

insideHPC: That brings up my next question. You have these distinct customer segments: the Cloud/Internet providers and the typical big HPC clusters. They’re both filling up rooms with x86 racks, but how do their needs differ?

Dr. Eng Lim Goh: The differences are as follows. On the Internet/Cloud side, they have the same 500 racks of computer systems in their datacenter, but they run tens of thousands of different applications like map reduce, memcacheDB, and Hadoop that are highly distributed. And then on the other extreme, in the HPC world, you may have 256 racks and you may even be thinking of running just one application across all of that. I’m just talking extremes here, of course. There are overlaps, but given these extremes, you see that the needs are different.

On the HPC side, interruption to services on any node in the entire facility can affect productivity. For example, you may have checkpoint restart, but it still takes time to do the checkpoint and then restart. That is, unless the user has intentionally gone into the code to more seamlessly tolerate a node failure while an MPI program is running. So a node failure can be more interruptive to the HPC world as opposed to the cloud side. On the cloud side, their usage makes them inherently and highly tolerant of node failures. And as such the focus is different.

Now let’s look at some of the similarities like power. There is one area where we have been learning a lot from the Cloud side to bring over to the HPC side. Their Internet datacenters are on the order of 10 to 20 or even 50 Megawatts. While in the HPC space, if you talk about a 50 MW datacenter it is considered extreme. So in this sense, I’d say the Cloud world actually scales bigger.

insideHPC: So they are facing a lot of the same challenges in terms of power and cooling. What did Rackable bring to the table in this area?

Dr. Eng Lim Goh: With regards to power and cooling on the Cloud side, one of the key requirements Rackable addressed was efficiency. In the early days, when datacenters were on the order of a Megawatt, customers had power efficiency specifications at the tray level. And then more recently they were set at the rack level. So if they were ordering 400 racks like one of our cloud customers, they stopped specifying at the chassis level and started specifying at the rack level.

So that gave us the opportunity to optimize at the rack level: removing power supplies in every chassis and doing AC to DC conversion in the infrastructure at the rack level. Later, with our CloudRack design, we removed fans at the chassis level as well. In fact, some Internet datacenters are demanding that those racks are able to run extremely warm, as high as 40 degrees Centigrade, in order to reduce energy consumption on the cooling side.

So then as they move to even larger scales, with Cloud datacenters that run tens of Megawatts, they are moving to the next level up of granularity and specifying efficiencies at the container level. At that level, we essentially have a modular datacenter, and this is where they started to specify a PUE requirement for each container that we ship. Today the standard requirements are on the order of 1.2 PUE, with more recent acquisitions demanding even more efficiency than that.

So on the Internet/Cloud side, yes, the expertise brought by Rackable was to be able to scale with the customer’s requirements as they went from 1 Megawatt to tens of Megawatts and keep up with these datacenters’ demands for higher and higher efficiencies.

insideHPC: You mentioned container-based datacenters. I came from Sun where we never seemed to make hay with our Project Blackbox. How well is SGI doing with its ICE Cube container datacenters?

Dr. Eng Lim Goh: We have shipped containers to a number of customers and we also have a couple of Cloud providers who are evaluating ICE Cubes for wider deployment.

insideHPC: Are the HPC customers interested in containers, or are they still on the fence?

Dr. Eng Lim Goh: This is where I think the combination of the two companies, Rackable and SGI, has strong leverage because the HPC world is coming up to where the Internet datacenters are in terms of scale. So when we’re talking about Exascale computing here, and they are specifying 20 MW for a future Exascale system, this is something that the Rackable side is familiar with in terms of power. So for HPC, we are actually drawing a lot on our expertise of delivering to Internet datacenters at that scale and at that requirement for efficiency.

For example, say there is someday an HPC datacenter requiring an extreme PUE number of say 1.1, in addition to meeting other Exascale requirements. So we have drawn from the Cloud datacenter side, where they already have such requirements for an air-cooled container that just takes in outside air through a filter to cool your systems. We have one such system now that has passed the experimental stage and is ready for deployment. And in many places in the world, if we build a system that can tolerate, say, 25 degrees Centigrade, you can get free cooling most of the year. However, for those places averaging higher than 25 degrees C, this wet-cooling system essentially uses a garden hose (I’m simplifying it) type connection to wet the filter just like a swamp cooler. Depending on humidity levels, you can get a five to ten degree Centigrade cooling result.

insideHPC: So that brings up another issue. When you have that kind of scale going on, system management must be a huge undertaking.

Dr. Eng Lim Goh: Absolutely. We have hierarchical systems management tools with a user interface to manage all the way from the compute side to the interconnect side and then all the way to facility power consumption. And of course, at the container level, we have a modular control system that handles temperature, humidity, pressure, and outside air. And that modular system feeds upward to the hierarchical systems management tools.

insideHPC: Since we’re talking about big scale, I think we should dive into the new Ultra Violet product, SGI Altix UV, that you announced at SC09. Is that product shipping now?

Dr. Eng Lim Goh: We began shipping the Altix UV a few weeks ago. We now have a number of orders, so there is a lot of interest in the system.

In terms of its use, there are two areas in which the Altix UV is of great interest. On the one hand, you have customers who are interested in big, scale-up nodes. You know, with today’s Nehalem EX you can get two, four, and eight socket systems. If you think in that way, the Altix UV scales beyond that eight socket limit all the way to 256 sockets and 16 Terabytes of memory. So that’s one way to look at the Altix UV. The 16 Terabyte memory limit is because the Nehalem core only has 44 bits for physical address space.

So that’s one of the ways of looking at Altix UV. And the reason people buy that, for example, is heavy analytics where they load in 10 Terabyte datasets and then use the 256 sockets, which equates to up to 2000+ cores, to work on that dataset.
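The two limits Dr. Goh mentions follow directly from powers of two. The short sketch below works them out, assuming byte-addressable memory and the eight-core Nehalem EX sockets mentioned later in the interview; the variable names are illustrative.

```python
# Memory limit implied by a 44-bit physical address space.
PHYSICAL_ADDRESS_BITS = 44
BYTES_PER_TIB = 2 ** 40  # one binary terabyte (tebibyte)

addressable_bytes = 2 ** PHYSICAL_ADDRESS_BITS
max_memory_tib = addressable_bytes // BYTES_PER_TIB

# Core count for a fully populated system.
SOCKETS = 256
CORES_PER_SOCKET = 8  # assumption: eight-core Nehalem EX parts throughout
total_cores = SOCKETS * CORES_PER_SOCKET

if __name__ == "__main__":
    print(f"2^{PHYSICAL_ADDRESS_BITS} bytes = {max_memory_tib} TiB")
    print(f"{SOCKETS} sockets x {CORES_PER_SOCKET} cores = {total_cores} cores")
```

Note that the 16 Terabyte figure is in binary units (2^44 bytes), and that 2,048 cores is what the "2000+" in the interview rounds down from.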

insideHPC: And that’s a single system image for all those cores?

Dr. Eng Lim Goh: Yes. It runs as a Single System Image on the Linux operating system, either SuSe or Red Hat, and we are in the process of testing Windows on it right now. So when you get Windows running on it, it’s really going to be a very big PC. It will look just like a PC. We have engineers that are compiling code on their laptops and the binary just works on this system. The difference is that their laptops have two Gigabytes of memory and the Altix UV has up to 16 Terabytes of memory and 2000+ physical cores.

So this is going to be a really big PC. Imagine trying to load a 1.5 Terabyte Excel spreadsheet and then working with it all in memory. That’s one way of using the Altix UV.

insideHPC: Did you develop a new chip to do the communications?

Dr. Eng Lim Goh: Yes. We are leveraging the ASIC chip that we developed. You can call it a node controller, but we call it the Altix UV Hub (HUV). Every hub sits below two Nehalem EX (8-core) sockets. And this Hub essentially talks to every other Hub in every node in the system and fuses the memory in those nodes into one collective. So when the Linux operating system or Windows operating system comes in, it thinks that this is one big node. That’s how it works.

So all the cache coherency is done by that chip in hardware. Even the tracking of who is sharing what in the shared memory system, it’s all registered in hardware on that chip, and that chip carries its own private memory to keep track of all these vectors.

insideHPC: So how does this kind of Big Node change the way scientists can approach their problems?

Dr. Eng Lim Goh: This is a brilliant question. Although the Altix UV is a great tool for large-scale analytics, we are starting to see a lot of interest from the scientists and engineers. There are many scenarios, but let me describe to you one scenario.

If you take typical scientists: the chemists, physicists, or biologists, they do research in the labs and write programs on their laptops to experiment with ideas. So they work with these ideas on their laptop, small scale, but what do they do today when they need to scale up their problems? Today what they have to do is either MPI encode it themselves, or try to get computational scientists in from a supercomputer center or university to code it for them and run it in parallel. And this transition takes weeks, if not months.

So what we envision is that the scientist will plug into the Altix UV instead of just waiting. The Altix UV will plug into the middle here by giving the scientists a bigger PC; it does not replace the MPI work.

Let’s look at a very common example. If you take a cube model with 1000 grid points in the X direction and 1000 grid points in the Y and Z directions, and then you march this cube 1000 time steps, that would be a 1 trillion-point (Terapoint) dataset. Now if every grid-point was a double-precision number, this will result in an 8 Terabyte dataset.
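The arithmetic behind that example is short enough to write out. This sketch assumes one double-precision value per grid point, that every time step is kept, and decimal terabytes (10^12 bytes); the names are illustrative.

```python
# Size of the cube model described in the interview.
GRID_POINTS_PER_AXIS = 1000   # X, Y and Z each have 1000 grid points
TIME_STEPS = 1000             # the cube is marched 1000 time steps
BYTES_PER_DOUBLE = 8          # one double-precision number per point

total_points = GRID_POINTS_PER_AXIS ** 3 * TIME_STEPS
dataset_bytes = total_points * BYTES_PER_DOUBLE
dataset_terabytes = dataset_bytes / 10 ** 12

if __name__ == "__main__":
    print(f"points: {total_points:,}")
    print(f"dataset: {dataset_terabytes:.0f} TB")
```

A billion grid points per step times a thousand steps gives the trillion points quoted, and eight bytes each gives the 8 Terabytes.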

At this size, you will typically go to MPI. However, with UV you now have an alternative. We can supply a 10 Terabyte PC to run problems like these. My suspicion is that they will still eventually move to MPI as they run more rigorous simulations. So rather than replace MPI, Altix UV gives them a more seamless research bridge as scientists scale their simulations.

insideHPC: What other ways might they use Altix UV?

Dr. Eng Lim Goh: There is another way to use the Altix UV. We envision using it as a front end to an Exascale system. Imagine your Exascale, albeit tight, cluster with tens or hundreds of Petabytes of distributed memory and you’re using Message Passing or some other kind of API to run a large application. Since this system is going to generate massive amounts of data, it would be good to have a head node that could handle that data for your analysis work. You can’t use a PC any more in the Exascale world; you need something bigger.

insideHPC: So there is a lot of talk these days about Exascale in the next eight or ten years. Where do you see SGI playing a role in that space?

Dr. Eng Lim Goh: I think our role in Exascale will be two-fold. The first will be to use this Big PC concept, with 16 Terabytes going to 64 Terabytes in 2012, and use it as the front end to an Exascale system. We would like the next generations of Altix UV to be the front end of every Exascale system that’s out there. Because if you are already spending tens of millions or hundreds of millions of dollars to build an Exascale system, it’s worth spending a little more so that you can get better use and be more productive with the output of that Exascale system.

Another role for SGI is developing the Exascale system itself. And this is where we are looking at providing a partitioned version of the Altix UV to be the key Exascale system.

So let’s look at Exascale systems now: If you look at what the top research priorities are to achieve Exascale within this decade, you can see that in general those are power/cooling as number one; and how do you get an Exaflop with 20 Megawatts? Number two would be resilience; can the Exascale system stay up long enough to at least do a checkpoint? (laughs) And on these two we are looking closely with microprocessor and accelerator vendors.

But the next two priorities are what we are focusing on ourselves: communications across the systems (essentially the interconnect) and usability. As I’ve described on the usability side, we will be looking at the Altix UV as a big head node.

In the communications area, we believe the interconnect needs to be smarter for an Exascale system to work. Why? Because you cannot get away from global collectives, for example, in an MPI program unless you code specifically for Exascale applications to avoid them. Many of the applications that try to run on an Exascale system this large will have global collectives and will need to do massive communications in the course of running the applications.

insideHPC: So how do you propose to reduce communications overhead in an Exascale system?

Dr. Eng Lim Goh: We sat down and worked out that to cut down that overhead, we need a global address space. With this, memory in every node in the Exascale system is aware (through the node controller) of every other memory in the entire infrastructure. So that when you send a message, a synchronization, or GET PUT to do communications, you do it with little overhead.

But I must emphasize, as many even well-informed HPC people misunderstand, that this global address space is not shared memory. This is the other part of Altix UV that has not been understood well. Let me therefore lay it out.

At the highest level you have shared memory. In the next level down you have global address space and next level down you have distributed memory. Distributed memory is what we all know; each node doesn’t know about its neighboring nodes and what you have to do is send a message across. That’s why it’s called Message Passing.

Shared memory then is all the way up. Every node sees all the memory in every other node and hears all the chatter in every other node. Whether it needs it or not, it will see everything and hear everything. That’s why Linux or Windows can just come in and use the big node.

However, for all the goodness that big shared memory brings by hearing and seeing everything, it cannot scale to a billion threads. It’s just like being in a crowded room and trying to pay attention to all the chatter at once, even though it is not meant for you. You would get highly distracted.

So if you go to the other extreme to a distributed memory, you sit in a house, sound-proof the walls, and shutter the windows. And as such you see nothing and you hear nothing of your neighbors. The only way you can get a communication across is to send a message by writing a letter or email and sending it to a neighbor.

So we decided that a global address space is the best middle ground. In that analogy, global address space sees everything, but does not hear the chattering amongst neighbors. All it wants to do is see everything so that it can do a GET PUT directly, do a SEND RECEIVE directly, or it can do a synchronization expediently. So a hardware-supported, global address space is one way to get the communications overhead lowered in the Exascale world. And this is especially important when you’re talking about a billion threads. Imagine trying to do a global sum on a billion threads. I hope we can code around it, but my suspicion is that there will still be applications needing to do it.
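To give a feel for the scale involved in a global sum, here is a minimal sketch of a pairwise (tree) reduction, one common way to implement such a collective. It is an illustration only, not SGI's hardware mechanism or any MPI library's actual implementation, and the function names are hypothetical.

```python
def tree_sum(values):
    """Sum values by pairwise combination, returning (total, rounds)."""
    vals = list(values)
    if not vals:
        raise ValueError("need at least one value")
    rounds = 0
    while len(vals) > 1:
        # In each round neighbours combine in pairs; an odd one out carries over.
        combined = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            combined.append(vals[-1])
        vals = combined
        rounds += 1
    return vals[0], rounds


def rounds_needed(participants: int) -> int:
    """Rounds a pairwise reduction needs: ceil(log2(participants))."""
    if participants < 1:
        raise ValueError("need at least one participant")
    return (participants - 1).bit_length()


if __name__ == "__main__":
    print(tree_sum(range(1, 11)))
    print(rounds_needed(10 ** 9))
```

Even under this idealized scheme, a billion participants need about 30 dependent rounds of communication, so whatever overhead each exchange carries is paid roughly 30 times over on the critical path of every global sum.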

insideHPC: I can tell by your voice that you have great passion for this subject. It sounds like the next ten years are going to be very exciting for SGI.

Dr. Eng Lim Goh: Thank you and I believe so. We sit here working with the industry looking at the state of the land, saying that we need to go to Exascale. And at the same time people are realizing that ok, we first have to do R&D on power, cooling, and resiliency. Sure, SGI is there with the others working on this first set of problems, but we have also been focused on alleviating many-threaded communications overhead for 10 or 15 years already. Moreover, we now also have what we think is a practical solution to the usability problem of an Exascale system, with our big PC head node concept. So in summary, we believe SGI can be a major contributor there.

The Rich Report is produced by Rich Brueckner at Flex Rex Communications. You can follow Rich on Twitter.

Posted in Cloud HPC, Exascale, HPC, HPC People

Vuduc wins NSF CAREER Award to make HPC better “by any means necessary”

In early June the NSF announced that Georgia Tech’s Richard Vuduc received an NSF CAREER Award for his work in tuning software to run on parallel systems. From the NSF website:

The Faculty Early Career Development (CAREER) Program is a Foundation-wide activity that offers the National Science Foundation’s most prestigious awards in support of junior faculty who exemplify the role of teacher-scholars through outstanding research, excellent education and the integration of education and research within the context of the mission of their organizations. Such activities should build a firm foundation for a lifetime of leadership in integrating education and research.

The name of his proposal, “Autotuning foundations for exascale systems”, attracted my attention and Rich agreed to tell us a little about himself, his work, and this prestigious award.


insideHPC: First, can you tell the readers a little about yourself? What’s the 100 word bio of Rich Vuduc?

Rich Vuduc: I am an assistant professor at Georgia Tech in the School of Computational Science and Engineering, which is (Shameless Plug Alert) one of the country’s few full-fledged academic departments devoted to the systematic study, creation, and application of computer-based models to understand and analyze natural and engineered systems. HPC is a major research and teaching focus in this kind of department, because computational scientists often care a great deal about effective use of parallelism in large systems. My research lab, The HPC Garage, is looking at automating and simplifying the analysis, programming, tuning, and debugging of software for emerging and future parallel machines.

On a more personal note, I am Vietnamese-American and my favorite TV show is “The Wire.” For TV skeptics, The Wire is proof that a TV series can be great art!

insideHPC: Looking at your web pages, it seems like you are, well, more fun than most of the profs I remember. “HPC Garage” for example. Is that a conscious effort on your part to engage more creative people, or just a natural extension of your personality?

Vuduc: Thanks, though I don’t know if “more fun” necessarily means “better research and teaching.”

I went to grad school and did my postdoc in the Bay Area, and am greatly inspired by the famous Hewlett-Packard Garage—so, too, is my lab a small team of creative hands-on tinkerers with limited resources and big dreams of building better, well, instruments and “calculators” for scientific advancement.

insideHPC: Your research area is in tools for getting better performance out of high end systems by software methods rather than human intervention. Can you generally describe your work in this area? Is any of it part of a library readers may be using? How does it fit in the context of other efforts, like ATLAS?

Vuduc: Yes, our goal is to simplify the process of achieving truly high performance, “by any means necessary,” if I may pay small tribute to my radical Berkeley roots. Accomplishing this goal might mean giving parallel programmers an auto-magic toaster that makes slow code fast. However, I would also be happy with more modest achievements, like distilling useful new performance principles or practices; making productive programming models fast; or providing more insight into what architectures work for particular interesting and important classes of applications, and why.

People who recognize my name probably know it from my early work in the area of autotuning on a library called OSKI, the Optimized Sparse Kernel Interface, which was developed while I was a graduate student “bebopper” in Jim Demmel’s and Kathy Yelick’s BeBOP group at Berkeley. (OSKI is also the Cal mascot. Go Bears!) OSKI is like Clint Whaley’s well-known ATLAS library, but is for sparse matrices rather than dense ones. The methodology is different in the sparse case, where one might not only tune the code, but also change the data structure at run-time, depending on the input matrix. Sam Williams (LBNL) greatly extended the OSKI techniques for multicore, and Jee Choi, one of my students, has some cool extensions for GPUs. As for sequential OSKI, I know Mike Heroux at Sandia has an effort to put wrappers around it for Trilinos.
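To illustrate the idea of choosing a sparse data structure at run time based on the input matrix, here is a small pure-Python sketch. It is not OSKI's API or its tuning method: the two formats (coordinate and compressed sparse row), the timing heuristic, and every function name below are assumptions made for illustration.

```python
import time


def dense_to_coo(dense):
    """Coordinate format: one (row, col, value) triple per nonzero."""
    return [(i, j, v) for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0]


def dense_to_csr(dense):
    """Compressed sparse row: values, column indices, and row pointers."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def spmv_coo(coo, x, n_rows):
    """Sparse matrix-vector product y = A x for the coordinate format."""
    y = [0.0] * n_rows
    for i, j, v in coo:
        y[i] += v * x[j]
    return y


def spmv_csr(csr, x):
    """Sparse matrix-vector product y = A x for compressed sparse row."""
    values, cols, row_ptr = csr
    return [sum((values[k] * x[cols[k]]
                 for k in range(row_ptr[r], row_ptr[r + 1])), 0.0)
            for r in range(len(row_ptr) - 1)]


def choose_format(dense, x, trials=5):
    """Time each candidate on this particular matrix and return the faster."""
    candidates = {
        "coo": (dense_to_coo(dense), lambda m: spmv_coo(m, x, len(dense))),
        "csr": (dense_to_csr(dense), lambda m: spmv_csr(m, x)),
    }
    timings = {}
    for name, (matrix, kernel) in candidates.items():
        start = time.perf_counter()
        for _ in range(trials):
            kernel(matrix)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get), timings
```

The point of the sketch is the shape of the decision: the kernel and its data structure are picked after the matrix is known, by measurement, rather than fixed when the code is written.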

These days, my lab is looking at autotuning techniques for a broader variety of interesting irregular and highly-adaptive computations, both in statistical machine learning (jointly with Alex Gray at GT) and for tree-based n-body problems (jointly with George Biros, also at GT).

insideHPC: Thinking specifically about your CAREER award, could you briefly talk about the award, what it is, what it means for you professionally, and what it means for you personally.

Vuduc: The CAREER award is an angel investment! I am extremely grateful that there are people willing to take a chance on my lab’s work and on my teaching (probably a bigger risk, the latter). Receiving the award means I have both the duty and the privilege to do something impactful.

It’s also a nice nod to my senior faculty mentors at GT, David Bader and Richard Fujimoto. Their efforts and advice have not been lost.

insideHPC: Your proposal is called “Autotuning foundations for exascale systems” — can you talk about the work you plan to do?

Vuduc: In perhaps overly basic terms, we hope to simplify programming and tuning on future exascale systems using autotuning techniques.

The proposal has two major research thrusts, one that explores analytical and statistical performance models to guide tuning, and another that explores tuning in emerging dataflow-like programming models. In both cases, we want methods that work on (a) the kinds of sparse, irregular, adaptive computations that I’ve been studying for some time now and that are a particular challenge to scale; and (b) the kinds of systems we can expect to see at exascale, which I am told will have “absurdly heterogeneous manycore nodes.” Both thrusts build on collaborations with Kath Knobe and C.-K. Luk, both at Intel. If we are successful, we will contribute to a goal that folks like David Bailey (LBNL) and Robert van de Geijn (UT Austin) sometimes refer to as one of developing a “science” of performance programming and engineering. That’s what “foundations” refers to.

Like all CAREER proposals, there is also an integral educational thrust tied to the research. In my case, the gist is to design and implement a year-long lab practicum, called The HPC Garage Practicum, that is a true interdisciplinary team-based competition, aimed at early-stage graduate students. The competition is to develop the most scalable code that answers real-world scientific questions; think of the famed Gordon Bell and X-Prize competitions. The basic inspiration arose in conversations with Pablo Laguna, Deirdre Shoemaker, and George Biros at GT. The approach is in the style of the GT School of CSE’s mission, to train the next generation of computational scientists in interdisciplinary teamwork.

By the way, if any corporations would like to donate prizes for the winning teams in this effort, we are soliciting.

insideHPC: Is this work a “scaling up” of the earlier work you’ve done, or are there specific things that you’ll need to change to address the challenge of running on exascale class machines?

Vuduc: We are scaling up, but not just the platform; we are also working in larger “algorithmic contexts.” I mean that whereas my earlier work focused on relatively compact kernels, my lab these days is looking at autotuning progressively more complex multiple-kernel solvers, with an eye toward large applications. This requires working more closely with domain scientists and compiler people, like my former postdoc mentor, Dan Quinlan at LLNL. The work my students, Aparna Chandramowlishwaran and Aashay Shringarpure, have done for the fast multipole method on multicore- and GPU-based distributed memory systems is a great first example.

insideHPC: How do you go about designing software for a class of machines that not only hasn’t been built, but for which there isn’t even a design consensus yet?

Vuduc: It’s always a difficult problem, but a “classical” approach is to change the program representation, as suggested, for instance, by Jeff Bilmes (UW) and Krste Asanovic (UCB) in their PHiPAC project. In particular, rather than writing a specific program, you write a program generator that can produce many different versions of the program. The generator might encode the generation of entirely different algorithms. Perhaps the most aggressive and successful examples of this approach today are the SPIRAL (Markus Pueschel at CMU) and FLAME (van de Geijn) projects. It’s not easy to do this but is in my view a promising way forward.
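
The generator idea can be illustrated with a toy sketch. This is not how PHiPAC, SPIRAL, or FLAME actually work; it only assumes a tiny hypothetical generator (`generate_dot`) that emits several unrolled versions of a dot product and a hypothetical `autotune` helper that times each version and keeps the fastest on the current machine.

```python
import timeit


def generate_dot(unroll):
    """Return Python source for a dot product unrolled `unroll` times."""
    terms = " + ".join(f"x[i + {k}] * y[i + {k}]" for k in range(unroll))
    return (
        f"def dot_u{unroll}(x, y):\n"
        f"    n = len(x)\n"
        f"    main = n - n % {unroll}\n"
        f"    total = 0.0\n"
        f"    for i in range(0, main, {unroll}):\n"
        f"        total += {terms}\n"
        f"    for i in range(main, n):\n"  # cleanup loop for leftover elements
        f"        total += x[i] * y[i]\n"
        f"    return total\n"
    )


def build(unroll):
    """Compile one generated variant and return it as a callable."""
    namespace = {}
    exec(generate_dot(unroll), namespace)
    return namespace[f"dot_u{unroll}"]


def autotune(x, y, unroll_factors=(1, 2, 4, 8), repeats=5):
    """Time every generated variant and return (best_factor, timings)."""
    timings = {}
    for u in unroll_factors:
        fn = build(u)
        timings[u] = min(timeit.repeat(lambda: fn(x, y), number=10, repeat=repeats))
    best = min(timings, key=timings.get)
    return best, timings


x = [float(i) for i in range(1000)]
y = [1.0] * 1000
best, timings = autotune(x, y)
print("fastest unroll factor on this machine:", best)
```

The winning factor depends on the machine and interpreter, which is the point: the search, not the programmer, picks the variant.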

In my CAREER proposal, part of what we plan to do is work with Kath (Knobe at Intel) to use her Concurrent Collections (“CnC”) programming model as a base platform, in part because it embodies the spirit of this approach. More specifically, CnC has a nice way of representing “all possible parallel execution schedules,” from which we could then imagine tuning or searching to find an especially good one for a particular system. Aparna’s IPDPS’10 paper (Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures, Aparna Chandramowlishwaran et al.) — a “best paper” winner, by the way! — shows off some of our early and successful experiences with CnC.

It also seems clear that, in yet another 80s comeback, vectorization is re-emerging in its importance. Think much larger SIMD/SSE units. My student, Cong Hou, is thinking about the problem of autotuning in that context as well.

Read the entire post …

Posted in Computing Research, Featured Stories, HPC Education and Training, HPC People, HPC Software, Tools | Leave a comment

The Green Grid juices PUE datacenter measure

The Green Grid has recently updated its PUE (Power Usage Effectiveness) metric in an attempt to wrangle some of the uncertainty in the prior definition of the measure. Ted Samson has a nice analysis:

One of the greatest strengths of the PUE metric, the industry standard for measuring data center energy efficiency, is its simplicity: Calculate how much energy your data center is consuming overall, then divide that number by how much energy your IT equipment alone consumes.

…At the same time, the simplicity has its shortcomings. For example, it gives operators much flexibility as to where to measure consumption — at the PDU or at the point of connection of IT devices — as well as how often to take measurements.

…In an effort to overcome this drawback, The Green Grid has unveiled four categories of PUE, ranging from Category 0 to Category 3, in a new white paper, “Recommendations for Measuring and Reporting Overall Data Center Efficiency” [PDF]. With each level, the measurements become more granular and the results more precise. Thus, a data center operator may choose to go with Category 0, which requires the least effort and fewest resources — but then those results won’t be viewed in the same light as a rival’s Category 3 PUE figure.

More in Samson’s article; you can also read what The Green Grid itself has to say about the topic in the related whitepaper.
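
To see why the measurement point matters, here is a minimal sketch with made-up numbers. The function name `pue`, the energy readings, and the assumption that a reading taken closer to the IT devices is smaller (because distribution losses between the two points are no longer counted as IT energy) are all illustrative and not taken from the white paper.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Total facility energy divided by IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical annual energy readings (kWh) for a single facility.
total_facility_kwh = 1_500_000
it_kwh_at_pdu = 1_000_000      # IT energy metered at the PDU
it_kwh_at_devices = 970_000    # IT energy metered where the IT devices connect

for label, it_kwh in [("PDU", it_kwh_at_pdu), ("IT device connection", it_kwh_at_devices)]:
    print(f"measured at {label}: PUE = {pue(total_facility_kwh, it_kwh):.3f}")
```

With these invented readings the same facility reports 1.500 or 1.546 depending only on where the IT load is metered, which is why two PUE figures are hard to compare unless the measurement method is stated.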

Read the entire post …

Posted in Datacenter operations | Leave a comment

EPA finally unwraps labeling program for data centers

This week the EPA formally initiated its long-discussed Energy Star labeling program for data centers and the buildings that house data centers.

To earn the label, data centers must be in the top 25 percent of their peers in energy efficiency according to EPA’s energy performance scale. By improving efficiency, centers can save energy and money and help fight climate change.

EPA uses a commonly accepted measure for energy efficiency, the Power Usage Effectiveness metric, to determine whether a data center qualifies for the Energy Star label. Before being awarded the Energy Star, a licensed professional must independently verify the energy performance of these buildings and sign and seal the application document that is sent to EPA for review and approval.

As far as I know, PUE is a product of The Green Grid, and this is a major policy victory for that group (they have an overview presentation of the Energy Star program for data centers here).

Read the entire post …

Posted in Datacenter operations, Green HPC | Leave a comment

LSU’s CCT introduces undergrads to computational science

LSU’s Center for Computation and Technology (the home to HPC Rock Star Thomas Sterling) is doing Good in the world this summer by introducing a load of undergrads to the noble art of computational science. The program at the LSU campus in Baton Rouge began on May 31 and will run for 9 weeks:

During this program, 15 college students from Puerto Rico, Illinois, Ohio, New York, Florida, Arkansas, Missouri, Michigan, Pennsylvania and Louisiana will collaborate with CCT faculty and staff on advanced computational research projects.

…During the summer, these students will work with the faculty and research staff at CCT, learning how to use the cutting-edge cyberinfrastructure on campus to examine various science phenomena such as gravitational waves that result from colliding black holes, explore new materials for energy storage or revolutionary electronic devices, or to design new kinds of physical interaction devices to extend computer visualizations.

The NSF’s REU program is a venerable institution. I remember helping out with REUs when I was a grad student at MSU’s ERC for NFS back in the early 90s.

Read the entire post …

Posted in HPC Education and Training | Leave a comment

Tokyo Tech talking about plans for 2.4 PFLOPS TSUBAME 2 this fall

This week Tokyo Tech announced the winners of its competitive process to build the 2.4 PFLOPS successor to its flagship TSUBAME HPC system:

The procurement process concluded on May 25, when the NEC-HP partnership’s winning bid was announced. The theoretical maximum performance of the system is 2.4 petaflops, currently the world’s fastest, improving by 30 times the performance of TSUBAME 1.0. The new supercomputer will be 12 times faster than Japan’s current fastest, which is operated by Japan’s National Atomic Energy Agency.

Some of the features of the new system include GPUs and SSDs, which Tokyo Tech hopes will make the machine both efficient and green (the press release indicates they are shooting for a PUE of 1.277):

The TSUBAME 2.0 supercomputer is equipped with cutting-edge technologies such as the latest Intel Westmere-EP and Nehalem-EX processors with “scalar operation,” and will employ approximately 4,200 NVIDIA Fermi GPUs. This “mixed scalar-vector architecture” will achieve world-class computing.

The system has more than 1,400 compute nodes and uses Voltaire’s QDR InfiniBand network. It uses the latest SSD technology and high-density mixed technology for the world’s fastest total data I/O performance at 0.66 terabytes using DataDirect Networks storage technology.

I assume that’s 0.66 TB per second.

Interestingly, the system will run a mix of Windows HPC Server and Linux, and will use virtualization “to take advantage of the flexibility of cloud hosting services” — thank goodness they managed to wedge “cloud” in there.

Read the entire post …

Posted in Collaborations, New Installations | Leave a comment

SGI goes “universal” with new compute trailer, adds bring your own gear option

This week SGI announced a new twist on its racks-in-a-trailer solution, ICE Cube ‘Universal’. In the bad old days before the Universal ICE Cube, you were fairly limited in what you could get SGI to slot into its modular computing solution. With today’s announcement, however, not only can you get all the server-oriented kit that SGI makes in a trailer, you can also bring your own:

Universal containers open the door for ICE Cube to easily support all SGI server and storage systems, including Altix ICE, Altix UV, Rackable, COPAN and InfiniteStorage lines, in addition to heterogeneous, third-party systems. The company’s long-established Dual Row class of ICE Cube has also been enhanced to better support heterogeneous equipment.

“Our new Universal ICE Cube data centers extend SGI’s leadership in modular data center innovation, allowing us to offer our customers greater flexibility in design and deployment,” said Rick Chapek, SGI senior vice president of hardware engineering. “SGI can now offer targeted ICE Cube configurations across vertical markets that span technical computing, federal government and defense, oil and gas, and Internet, meeting customer specific application and deployment needs.”

The speeds and feeds of your fully loaded container will obviously vary with what you put in it, but SGI says you can get up to 46,080 cores and 29.8 petabytes (PB) of storage in its Universal Class container, and get your PUE down to a limbo-contest-winning 1.12. SGI also says that it has added two new Dual Row ICE Cube models built for Rackable’s enterprisey half-depth servers.

I’ll say this for SGI CEO Barrenechea: the man knows how to make the inside of a shipping container look wicked cool.

Now, as to who is buying them? Different story. I can’t get anyone who makes these things on record with hard numbers. The best I’ve gotten officially is “less than 70” or “the low tens of units” sold by individual companies. When I bring the topic up, whatever company I’m talking with always says that the market is still maturing, but they see a lot of promise “on the horizon.” The argument is that people will move to containers with a frequency that is closer to how often they build out a datacenter, rather than a typical IT refresh. If that’s right, that’s a pretty long time to spend waiting for someone to buy your product.

If you’re hankering for a Universal Class container, however, SGI says you’ll have to wait until Q3.

Read the entire post …

Posted in Business of HPC, Datacenter operations | 4 Comments

Google tips for getting to a PUE of 1.5

Ted Samson writes about comments that Google Green Energy Czar Bill Weihl made at last week’s GreenNet 2010 conference in San Francisco.

The most interesting tip Weihl shared pertained to power infrastructure. Whereas most companies use large PDUs (power distribution units) to provide backup power for their data center hardware, Google instead equips each server with a 12-volt battery. Google claims this approach is far more efficient. A large UPS is around 92 to 95 percent efficient, whereas Google says batteries help it achieve better than 99.9 percent efficiency.
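
A rough sketch of what those efficiency figures mean in lost power. The 1,000 kW IT load and the function name `backup_loss_kw` are assumptions for illustration; only the 92, 95, and 99.9 percent figures come from the quote above.

```python
def backup_loss_kw(it_load_kw, efficiency):
    """Power lost in the backup stage while delivering it_load_kw to the servers."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    input_kw = it_load_kw / efficiency  # power that must enter the backup stage
    return input_kw - it_load_kw


it_load_kw = 1000.0  # hypothetical IT load

for label, eff in [
    ("central UPS at 92%", 0.92),
    ("central UPS at 95%", 0.95),
    ("per-server battery at 99.9%", 0.999),
]:
    print(f"{label}: {backup_loss_kw(it_load_kw, eff):.1f} kW lost")
```

For that assumed load the losses come to roughly 87 kW, 53 kW, and 1 kW respectively, before counting the extra cooling needed to remove the wasted heat.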

Samson also recaps other Google comments which, in general, weren’t all that revealing: strip superfluous components from server boards, isolate hot and cold aisles, raise inlet temperatures closer to rated levels, and think about free cooling. I suspect that the good stuff, the things that would get you down to 1.1, is being treated as a state secret in the ‘plex.

Read the entire post …

Posted in Datacenter operations | 1 Comment

New Yahoo! datacenter borrows chicken coop technology

This week DataCenterKnowledge.com is running a great article on Yahoo!’s new datacenter design, cleverly called the Yahoo! Computing Coop. I say “cleverly called” because the Yahoo! team really did borrow design ideas from the way Tyson Foods builds its chicken coops:

“Tyson Foods has done research involving facilities with the heat source in the center of the facility, looking at how to evacuate the hot air,” said Noteboom. “We applied a lot of similar thought to our data center.”

The resulting datacenter has a PUE of 1.1, which means that just about all of the energy going into the facility is going to the computers themselves. How do they do it?

The Yahoo Computing Coops are prefabricated metal structures measuring about 120 feet long by 60 feet wide. Each of the three coops has louvers built into the side to allow cool air to enter the computing area. The air then flows through two rows of cabinets and into a contained center hot aisle, which has a chimney on top. The chimney directs the waste heat into the top of the facility, where it can either be recirculated or vented through the cupola.

That design, along with putting the Coops in Buffalo where they get free cooling for all but roughly 9 days each year, is saving them real money:

“We are at less than 1 percent of our (energy) cost to cool,” he said. “For every dollar we’re spending, we’re spending one cent to cool.” That’s down from more than 50 percent in some earlier data center designs used by Yahoo.
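
The claim that a PUE of 1.1 sends “just about all” of the energy to the computers can be checked with a couple of lines. The helper names `it_share` and `overhead_share` and the 2.0 comparison value are assumptions for illustration; note that the overhead share covers everything that is not IT load, of which cooling is only one part.

```python
def it_share(pue):
    """Fraction of total facility energy that reaches the IT equipment."""
    if pue < 1:
        raise ValueError("PUE cannot be below 1")
    return 1.0 / pue


def overhead_share(pue):
    """Fraction of total facility energy spent on everything other than IT."""
    return 1.0 - it_share(pue)


for value in (1.1, 2.0):  # 1.1 from the post; 2.0 is a hypothetical comparison
    print(f"PUE {value}: {it_share(value):.1%} to IT, {overhead_share(value):.1%} overhead")
```

At 1.1 about 90.9 percent of the energy reaches the IT gear, compared with 50 percent at the hypothetical 2.0.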


Read the entire post …

Posted in Datacenter operations, Green HPC | Leave a comment

Container computing road show

Enterprise Control Systems announced this week that they are taking their (compute) trailer to the open road as part of the 2010 Containerized Data Center Road Show Tour:

Enterprise Control Systems announced today the schedule for their 2010 Containerized Data Center Road Show Tour that will showcase a truly vendor neutral containerized data center. The event, which will stop at major cities throughout the Western United States, will strive to show data center operators how they can improve energy efficiencies over traditional data center designs by deploying containerized data centers.

The event will include a tour of an actual containerized data center and will include informative sessions on the importance of Power Usage Effectiveness (PUE) and ways to improve overall PUE. PUE is a metric used to determine the energy efficiency of a data center. PUE is determined by dividing the amount of power entering a data center by the power used to run the computer infrastructure within it. PUE is therefore expressed as a ratio, with overall efficiency improving as the quotient decreases toward 1. PUE was created by members of the Green Grid, an industry group focused on data center energy efficiency.

The tour will stop in CA, CO, OR, and WA. For more info, hit the event web site.

Read the entire post …

Posted in Datacenter operations, Events | Leave a comment

What to read at insideHPC this week

Wondering what to read at insideHPC? Some of the most popular posts this week are:

If you aren’t subscribed to our email updates already, your friends are probably pointing and laughing at you behind your back. Show them you aren’t hopelessly behind on each day’s HPC news by signing up for our daily email digest.

Read the entire post …

Posted in Admin | Leave a comment

New international agreement on PUE may get momentum going

Enterprise IT Planet’s Green blog has a post up this morning about an international agreement reached over the weekend that may help build momentum for a common vocabulary when it comes to measuring datacenter efficiency:

Over the weekend, news emerged of an international agreement to establish data center energy efficiency. The agreement is between The Green Grid (U.S.-based industry group), the U.S. Environmental Protection Agency, the European Commission Joint Research Centre, the Japan Ministry of Economy, Trade and Industry, and the Green IT Promotion Council (Japan-based industry group). For now the agreement is limited to these three regions, but could expand to include others such as China and India in the future.

The agreement establishes the Power Usage Effectiveness (PUE) as the “preferred energy efficiency metric.”

Included in the agreement is a set of guiding principles, but notably missing are rules about how total energy should be measured. According to the post, a task force is working on these details.

Read the entire post …

Posted in Datacenter operations, Green HPC | 1 Comment

insideHPC.com is a production of insideHPC, LLC. © 2006-2013