CoolIT Systems Steps Up Liquid Cooling Technology at SC14


In this video from SC14, Geoff Lyon from CoolIT Systems describes the latest updates to their innovative cooling technology.

“The momentum at Supercomputing from vendors and end users for liquid cooling grows annually with the increase in rack density and efficiency requirements,” said Geoff Lyon, CEO/CTO of CoolIT Systems. “We are responding to this demand with our largest display ever of enterprise-level liquid cooling solutions.”

Full Transcript:

insideHPC: Hi, I’m Rich with insideHPC. We’re here in New Orleans at SC14 [chuckles], at the CoolIT Systems booth, and I’m here with the CEO, Geoff Lyon. What have you got going here with liquid cooling at SC14?

Geoff Lyon: Well this year, like others, we’re showcasing all that we’ve done the past year or so in liquid cooling. We’ve got a lot of progress in adoption, a lot of projects that we’re showcasing here on the booth, and a little bit of new technology.

insideHPC: Okay. Well, why don’t we do the walk through. Why don’t you show us how this stuff works and what you got going?

Geoff Lyon: Sure. Well, we can start right here with some of our– what we call server modules. Each manufacturer has various different form factors for servers, and what we are showcasing here are a few different installations that we’ve done. This one’s actually based on a Dell PowerEdge model that has four blade-type nodes in one 2U box, so we’ve got this customized solution here. The coolant is circulated into the system, gathering the heat from each of the CPUs, then circulated back out again, and then it plugs into a manifold.

insideHPC: Hang on a second. When you say a coolant – is it any antifreeze? Is it water? Is it Fluorinert? What is it?

Geoff Lyon: It is actually a water-based solution with propylene glycol and anticorrosive and antifungal additives to make sure that it’s stable, so we don’t get any floaties growing in there over time. It is a nice warm-water solution, so be careful.

insideHPC: It is important to clean the taps out occasionally, so I get it.

Geoff Lyon: This is one nice example we’ve got up here, which is a Supermicro SuperBlade where we’ve actually got four CPUs in one node, and two coolant connections are enough to supply all four CPUs. You can start to get a sense and a flavor that how we apply the solution just depends on the form factor of the server itself. It is nice that we’ve got the ability to support not just CPUs but also accelerator cards, or even potentially custom ASICs.

I can show you a couple over here as well. This is actually an interesting one. We have an Open Compute standard 1U server.

insideHPC: The OCP project stuff, yeah, the Facebook stuff?

Geoff Lyon: That’s right. So we work together with Penguin Computing here, and this one is actually quite interesting. It has something called a blind-mate connection, which is a little bit different than these – these types are actually plugged in by hand. This one gets plugged in automatically just as you slide it into the chassis.

insideHPC: So it’s like a backplane for cooling connections?

Geoff Lyon: Exactly. So it’s no different than what you would do with networking and power, and now cooling.

insideHPC: The sled goes in and everything’s good to go?

Geoff Lyon: Automatic. That’s right. We’ve also got some nice stuff that we’ve been doing recently, here at the show. Nvidia announced the new K80 accelerator card.

insideHPC: Yes. The Nvidia K80 has got twice the GPUs – that’s got to get hot.

Geoff Lyon: It does, and this is something that we developed specifically for that card – it actually turned out to be a great project for us. We have enough cooling for both GPUs as well as all of the power handling and memory that’s on the card, in one compact form factor.

insideHPC: That’s just made for the K80, you developed this?

Geoff Lyon: The K80, and we’ve also got the Xeon Phi, all based on the same technology that we’ve been using – kind of our modular kit of technology. Inside that K80 unit, just like the rest of our solutions, is a micro-channel heat exchanger made of pure copper.

insideHPC: So fluid is going over on top of that? Inside there?

Geoff Lyon: Actually, we use a patented technology that we developed called split flow. We drive the cool liquid down into the center of that construct and it splits into two different directions, so we get the cool liquid in the middle. It also has the added benefit that the coolant can go a little bit slower – it has low flow resistance – so that we can support many more processors with very little flow.

insideHPC: Did you happen to use CFD to get that tweaked the way you wanted?

Geoff Lyon: Absolutely.

insideHPC: Cool, all right. Very nice.
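For readers wondering why the split-flow design lowers flow resistance, here is a rough, idealized illustration using the Hagen-Poiseuille relation for laminar flow in a channel. The channel dimensions and flow rate are assumptions chosen only to show the scaling, not CoolIT specifications.

```python
# Illustrative only: idealized laminar-flow estimate of why a split-flow
# cold plate has lower flow resistance than a single end-to-end pass.
# Channel dimensions and flow rate are assumptions, not CoolIT specs.
import math

MU = 1.0e-3        # dynamic viscosity of water near room temperature, Pa*s
D = 0.5e-3         # assumed microchannel hydraulic diameter, m
L = 0.04           # assumed channel length for a full end-to-end pass, m
Q_TOTAL = 1.0e-6   # assumed coolant flow through one channel, m^3/s

def laminar_dp(length_m, flow_m3s):
    """Hagen-Poiseuille pressure drop for one circular channel."""
    return 128.0 * MU * length_m * flow_m3s / (math.pi * D**4)

# Single pass: all of the flow travels the full channel length.
dp_single = laminar_dp(L, Q_TOTAL)

# Split flow: coolant enters at the center and exits both ends, so each
# half-channel sees half the length and carries half the flow.
dp_split = laminar_dp(L / 2, Q_TOTAL / 2)

print(f"single-pass dP ~ {dp_single / 1e3:.1f} kPa")
print(f"split-flow  dP ~ {dp_split / 1e3:.1f} kPa")
```

In this idealized model, feeding the coolant at the center so each half-channel carries half the flow over half the length cuts the pressure drop by roughly a factor of four, which is the low-flow-resistance benefit Lyon describes.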

Geoff Lyon: So that’s the cold plate technology, and this is what it looks like in final form. It’s only about 15 and a half millimeters tall, which allows us to keep our integration burden very low.

insideHPC: Yes, it is not a big profile or anything?

Geoff Lyon: No, it fits inside just about everything with no trouble, and there’s flexibility here too. We can have the coolant go in the intake here and then come out here, or here, so we’ve got lots of flexibility when customers come to us asking for challenging integration work.

insideHPC: Yes, because this thing can’t be eight inches tall because that doesn’t fit inside the server?

Geoff Lyon: That’s right. You got it. So we can go and see what happens after that because all of these solutions plug into the manifold module and then we dissipate the heat. So let’s go have a look.

insideHPC: What have we got going on here, Geoff?

Geoff Lyon: Well, this is an example of our manifold module. You can see we talked about all of the server technology and how we had the tubes coming out into quick disconnects. These are specially designed dry-break quick disconnects that are put together by a partner of ours called Staubli, and you can see here that it is a flush-mount dry break. When I say dry break, it means it actually is dry. It’s not just drip-free – it’s actually dry.

insideHPC: Okay. When it pops it doesn’t spray water? Right?

Geoff Lyon: Exactly. We have to make sure that we are keeping the liquid inside the tubes at all times.

insideHPC: Now, did you get that from hydraulics technology, or did you have to develop this yourself?

Geoff Lyon: No, we didn’t develop this ourselves – we bought it and developed it in partnership with Staubli. They’re the world authority on dry-break quick disconnects. It’s an all-metal construction, so it’s reliable for hundreds and hundreds of connect/disconnect cycles.

insideHPC: Excellent. Okay.

Geoff Lyon: The liquid – once we have the heat, we have cool liquid going into the server and warm liquid coming back out onto the manifold. Where it goes after that is into what we call our CHX. This is a liquid-to-liquid heat exchanger, which means that the coolant from the servers comes in here and all of the heat is transferred to facility water. The coolant comes in, goes through the centralized pumps there – we have a reservoir and an accumulator here – and then, after it’s cooled, it goes back out to the manifold. On the facility side of the loop we have the cool facility water. That’s coming either from a cooling tower, a dry cooler, or maybe adiabatic cooling; it comes in here, gathers the heat from the other coolant circulation, and then delivers the warm water back out to the facility.

insideHPC: Okay, I want to point out here, Geoff, that this device on top – this is a 2U form factor, I believe?

Geoff Lyon: That’s right.

insideHPC: How much can that 2U device handle? It’s basically a heat exchanger – is this a rack’s worth of cooling, or what?

Geoff Lyon: That’s right. If we’re doing just one rack, or two or three racks, then this is the right solution for that. This 2U module just plugs into the rack, and with that it connects into the manifolds. At an approach temperature of 30 degrees Celsius, it can handle 40 kilowatts.

insideHPC: Wow, so a major– full rack of 40 kilowatts, that’s a seriously dense rack.

Geoff Lyon: Yes.
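As a rough sanity check on figures like 40 kilowatts per rack, the sensible-heat balance Q = ṁ·cp·ΔT ties heat load, coolant flow, and coolant temperature rise together. The temperature rises below are illustrative assumptions, not CoolIT operating points.

```python
# Back-of-envelope check relating rack heat load, coolant flow, and coolant
# temperature rise via Q = m_dot * c_p * dT. The delta-T values are
# illustrative assumptions, not CoolIT specifications.
RHO = 998.0   # density of water, kg/m^3 (a glycol mix differs slightly)
CP = 4186.0   # specific heat of water, J/(kg*K)

def flow_lpm_for_load(heat_w, delta_t_k):
    """Coolant flow (litres/min) needed to absorb heat_w with a delta_t_k rise."""
    m_dot = heat_w / (CP * delta_t_k)       # mass flow, kg/s
    return m_dot / RHO * 1000.0 * 60.0      # convert to litres per minute

for dt in (5.0, 10.0, 15.0):
    print(f"40 kW at dT = {dt:4.1f} K  ->  {flow_lpm_for_load(40_000, dt):5.1f} L/min")
```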

insideHPC: Can you walk me through the PSI through here, roughly?

Geoff Lyon: Sure. It’s not a very high-pressure system, because we have low flow resistance going in here. The pumps are actually what drive the circulation through these tubes, and we’re looking at well under ten PSI for the entire circulation on the cooling side. On the facility side here, it depends on the facility – that could be 40, 50, 60 PSI. We want to make sure that’s very well controlled and well regulated. This, in fact, is a proportional valve that can go to a full stop. If we end up having some sort of a problem with facility water flow, we can shut it off right here at its first point as it comes in. It also acts as a proportional valve, so if we’re using chilled water, we want to make sure that we don’t allow the coolant to be over-cooled.

insideHPC: Okay, right. So this was my next question: what if the incoming water is suddenly at 33 degrees – you’re going to have to deal with condensation?

Geoff Lyon: To avoid condensation specifically, we actually have a humidity sensor and a temperature sensor here to understand and calculate dew point, and we limit the flow of that chilled water coming through here so that we never get down below dew point. We don’t want condensation happening inside servers – that would be bad.
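A common way to estimate dew point from air temperature and relative humidity is the Magnus approximation. The sketch below shows the general idea of gating chilled-water flow on dew point; the formula is standard, but the safety margin and the gating function are assumptions for illustration, not CoolIT’s actual control logic.

```python
# Minimal sketch of dew-point-aware chilled-water limiting, using the Magnus
# approximation for dew point. Generic illustration only; the margin and
# setpoints are assumptions, not CoolIT firmware behavior.
import math

def dew_point_c(temp_c, rel_humidity_pct):
    """Magnus approximation for dew point in degrees C."""
    a, b = 17.62, 243.12
    gamma = (a * temp_c) / (b + temp_c) + math.log(rel_humidity_pct / 100.0)
    return (b * gamma) / (a - gamma)

def chilled_water_allowed(coolant_setpoint_c, room_temp_c, room_rh_pct, margin_k=2.0):
    """Only open the facility-water valve while the coolant setpoint stays
    safely above the room dew point (margin_k is an assumed safety margin)."""
    return coolant_setpoint_c >= dew_point_c(room_temp_c, room_rh_pct) + margin_k

print(round(dew_point_c(24.0, 50.0), 1))        # ~12.9 C for 24 C air at 50% RH
print(chilled_water_allowed(18.0, 24.0, 50.0))  # True: 18 C sits above dew point + margin
```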

insideHPC: But essentially this is a tank – if something happens, it’s going to pool inside.

Geoff Lyon: It’s designed to house the liquid, that’s right – or at least a limited amount of it before it gets shut off.

insideHPC: So, the sensors and your smart software are going to detect that something bad is going on – and then what happens?

Geoff Lyon: Well, an alert gets thrown, it will shut the system down and do its best to contain all of the coolant flow, and then we can see the– we’ve got a touchscreen on the front of each of the systems here.

It has various different capabilities, but it’s monitoring all of the temperatures against alert limits. Right now we’ve got a pump alert listed, because the pumps are actually off – it’s not a running system – and the dew point is okay. On top of that, we’re also monitoring the high pressure, the low pressure, the flow, the dew point, and the humidity, and reporting all of that. It has the ability to log the system, and it’s on a TCP/IP network so that it can actually send out emails and be compatible with the BMC, IPMI, or any other monitoring infrastructure that the data center might have.
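The sketch below illustrates the general pattern Lyon describes: compare sensor readings against alert limits, log everything, and push an email alert over the network when something drifts out of range. The sensor names, limits, and mail-server address are hypothetical, not CoolIT’s actual interface.

```python
# Hypothetical sketch of threshold monitoring with logging and e-mail
# alerting. Sensor names, limits, and the SMTP host are illustrative
# assumptions, not CoolIT's actual interface.
import logging
import smtplib
from email.message import EmailMessage

ALERT_LIMITS = {                       # assumed (low, high) alert thresholds
    "coolant_supply_c": (10.0, 45.0),
    "loop_pressure_psi": (0.0, 10.0),
    "flow_lpm": (15.0, 60.0),
    "humidity_pct": (0.0, 80.0),
}

logging.basicConfig(level=logging.INFO)

def check_and_alert(readings):
    """Log every reading and e-mail an alert for any value outside its limits."""
    for name, value in readings.items():
        low, high = ALERT_LIMITS[name]
        logging.info("%s = %.1f", name, value)
        if not low <= value <= high:
            msg = EmailMessage()
            msg["Subject"] = f"CDU alert: {name} out of range ({value:.1f})"
            msg["From"] = "cdu@example.com"      # hypothetical addresses
            msg["To"] = "ops@example.com"
            msg.set_content(f"{name} reading {value:.1f} is outside {low}-{high}")
            with smtplib.SMTP("mail.example.com") as smtp:
                smtp.send_message(msg)

check_and_alert({"coolant_supply_c": 32.0, "loop_pressure_psi": 7.5,
                 "flow_lpm": 38.0, "humidity_pct": 55.0})
```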

Now, we’ve got something that scales up considerably better than this if you’re going to be doing 6, 10, or 100 racks, and we can have a look at that over here.

insideHPC: Let’s take a look. Okay, Geoff, what have we got here? This is a big rack of pipes and stuff. What’s up?

Geoff Lyon: Well, this is the CHX 650 – we talked a little bit about the CHX 40 over there, which is actually based inside the rack. Now, if we’re going to be going to many, many racks, then we’ve got a much more robust unit here that will be putting about 20 litres a minute, or 5 gallons a minute, out to each rack. In total it’s capable of doing a little over 350 litres a minute, so it has these large pumps down here that you can see. Only one of them ever runs at a time, so you’ve always got one as a redundant backup, and they actually switch once a week.

insideHPC: Okay, so you want to avoid the old-diesel-generator problem, where it hasn’t been started in 30 years?

Geoff Lyon: We want to make sure that they’re accumulating the same wear life, that they’re always in commission, and that if one were to fail, we don’t find that the other one isn’t working either. This is an impressive setup, and it has all the same sensing technologies that we talked about on the CHX 40, but obviously on a slightly grander scale. You can see the two-inch pipe we’re pushing out here – it has exactly the same functionality. There’s an accumulator tank, and there’s a plate heat exchanger behind that metal box.
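The weekly lead/standby swap Lyon mentions can be sketched as a simple rotation with immediate failover. The naming and scheduling below are illustrative assumptions, not the CHX 650’s actual controller logic.

```python
# Illustrative sketch of weekly lead/standby pump rotation so both pumps
# accumulate similar wear, with failover if the lead pump faults. Purely an
# assumption-based example, not the CHX 650 controller.
from datetime import date

PUMPS = ("pump_A", "pump_B")

def lead_pump_for_week(today):
    """Alternate the lead pump every ISO week; the other stays on standby."""
    week = today.isocalendar()[1]
    return PUMPS[week % 2]

def select_pumps(today, lead_failed=False):
    """Return (running, standby); fail over immediately if the lead pump faults."""
    lead = lead_pump_for_week(today)
    standby = PUMPS[1 - PUMPS.index(lead)]
    return (standby, lead) if lead_failed else (lead, standby)

print(select_pumps(date(2014, 11, 17)))                    # normal weekly rotation
print(select_pumps(date(2014, 11, 17), lead_failed=True))  # failover to the standby
```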

And a proportional valve here that is also doing the same dew-point protection.

But now we are pushing out an awful lot more coolant. You can see that coolant is delivered up through these large tubes here. We can have a look from this side, perhaps?

You can see it’s actually insulated inside here in case it’s a chilled-water supply. We do the same in there, but it’s a little difficult to see anything. The water goes directly into a large filter to ensure that we’re not going to get fouling inside the plate heat exchanger, then it gathers the heat from the coolant and discharges it back to the facility.

Now, on the coolant side, we’re bringing in the warm coolant, cooling it off, obviously, and then sending the cold coolant back out. You can see up here that we’ve got the network of coolant distribution going to each rack, and the drops coming into the racks. They’re on quick disconnects as well, and they supply that coolant into the manifold, and then it’s exactly the same as what we looked at over here.

insideHPC: This device could cool a pretty good-sized data center – this one rack, with some water fed to it.

Geoff Lyon: Yes, and the advantage of this is that it takes all that high-pressure facility water – we talked about 50, 60, 70 PSI. It’s nice to be able to have this remote and away from the racks.

insideHPC: Okay, so this doesn’t need to be next door amongst all my expensive densely populated servers, does it?

Geoff Lyon: That’s right. This can be remote – it can even be in another room – so if you want to have that high-pressure stuff abstracted away from your expensive IT equipment, this is one strategy, a way to do that.

insideHPC: Put it on the roof or in the basement or whatever, right?

Geoff Lyon: Exactly. And when we’re delivering out to the various different racks, we have options too. If we want some sort of disaster recovery – if somebody backs in with a forklift or something like that and takes a few of the nodes off – you can manually shut these valves off, or optionally there are electronically controlled valves that can be put in there as well, so we can isolate on a rack-by-rack basis.

It just depends on what the customer’s requirements are. Now, we have one additional advancement that we’re showing here at the show for the first time, and we’re pretty excited about it.

insideHPC: All right, Geoff, what’s this new thing you were describing earlier?

Geoff Lyon: Well, we talked over there about how, if we have an incident, we can isolate at the rack level and cut the coolant supply off. What we’ve done now is bring that down to the level of the server. We have leak-sensing cable capability here, and we can route this wherever we think there’s a sensitivity or the possibility of a leak. Now, if we were to actually sense a leak – I can just get my fingers a little bit moist here; it doesn’t take very much, a little perspiration and no one’s going to know. That’s right – so if it sensed a leak… (the hose pops off when he touches the sensor wire).

insideHPC: Now, you didn’t hit a button there. You just had a moist finger.

Geoff Lyon: That’s right. If it senses any moisture – if the system believes there’s a loss of integrity inside the server – the quick disconnects here, which are again dry-break, instantly release. So now what we’ve done is isolate this zone, and you can tell there’s just not very much liquid available to go anywhere, so we’re limiting the zone of issue to a very, very small amount of fluid – maybe an ounce or two at the most.

insideHPC: Okay, so that’s what happens mechanically. The water is now stopped from coming in. What happens electronically at that moment?

Geoff Lyon: Well, electronically, the signal gets relayed to the control system and a notification gets flagged, and then in whatever control system the data center is using, the operator would be notified to come have a look and deal with whatever the server issue may or may not be.
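Putting the mechanical and electronic responses together, the sequence Lyon describes amounts to: sense moisture, eject the dry-break quick disconnects to isolate that server’s loop, then raise a notification to the facility’s monitoring system. The sketch below is a hypothetical illustration of that sequence; every name in it is invented.

```python
# Hypothetical sketch of the leak-handling sequence described above: sense
# moisture, eject the server's quick disconnects, and notify the data-center
# control system. All identifiers here are illustrative.
import logging

logging.basicConfig(level=logging.INFO)

def notify_operators(message):
    # Stand-in for pushing an event into the site's monitoring/DCIM system.
    logging.warning("ALERT: %s", message)

class ServerCoolingZone:
    def __init__(self, server_id):
        self.server_id = server_id
        self.isolated = False

    def eject_quick_disconnects(self):
        """Release the dry-break quick disconnects, isolating this server's loop."""
        self.isolated = True
        logging.info("%s: quick disconnects ejected, coolant zone isolated", self.server_id)

    def on_leak_detected(self):
        """Leak-sense cable tripped: isolate immediately, then notify operators."""
        self.eject_quick_disconnects()
        notify_operators(f"Leak detected at {self.server_id}; zone isolated, inspect the server")

zone = ServerCoolingZone("rack12-node07")   # hypothetical identifier
zone.on_leak_detected()
```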

insideHPC: The point is that the server won’t keep cooking away here with no cooling coming in?

Geoff Lyon: That’s right. Well, under normal circumstances, if you end up with an overheating condition then the server’s going to shut itself off.

insideHPC: I got you. That’s very exciting. What did you call this again? DCLC?

Geoff Lyon: So that’s our Direct Contact Liquid Cooling, which is all of our technology, but this feature we call the Quick Disconnect Auto Eject.

insideHPC: Like Top Gun, hit the eject button, quick disconnect auto eject.

Geoff Lyon: Yes, we’re probably going to come up with a better marketing name for it by the time we talk next.

insideHPC: I’ll work with you on that. Very cool, Geoff. I want to thank you for sharing this. This is fascinating stuff.

Geoff Lyon: Oh it’s been a fantastic year for us, and we’re really happy to be here. Our customers are giving us some really good feedback, and we’re excited about the installations that we’re doing all around the world.

insideHPC: It’s really exciting to see this all coming together because the future is liquid cooling. You cannot cool 100 kilowatt racks with air, and nobody knows that better than you guys.

Geoff Lyon: Yes, we’re just excited that the rest of the industry’s starting to pay attention to us [laughter].

See our Full Coverage of SC14, and sign up for our insideHPC Newsletter.