
Podcast: Using Allinea Programming Tools to Speed XSEDE Supercomputing Research


In this podcast, Vincent Betro from the University of Tennessee National Institute for Computational Sciences and Mark O’Connor from Allinea discuss how the XSEDE program uses Allinea parallel programming tools to speed code on their network supercomputing resources. Plans are now in the works to include training on Allinea tools at the 2015 XSEDE Conference.

In my humble opinion, I think that debuggers and profiling tools are far too infrequently used. And it’s not because they’re not there. It’s because people just either don’t know about them, don’t do training on them, or don’t know how to use them. We’re in a state where we have less cycles than we’ve ever had per request, right? So being able to take full advantage of those cycles by having optimized code and optimized run patterns is crucial. Otherwise, you’re just not going to be able to get your work done and the science won’t get done.


Full Transcript:

insideHPC: Welcome to the Rich Report, a podcast with news and information on high-performance computing. Today my guest from Oak Ridge is Vincent Betro. He is a trainer at NICS, and we also have Mark O’Connor from Allinea. So gentlemen, we want to talk today about Allinea tools and things that are going on at XSEDE, and I understand that you have a conference coming up. XSEDE, of course, is using Allinea tools for developing parallel code.

Vincent Betro, NICS

Vincent Betro: Yes. Well, effectively most of the major partner sites in XSEDE have Allinea deployed. I mean XSEDE is a virtual organization. It’s kind of like XSEDE doesn’t do anything. XSEDE just sits over top of those [chuckles], you know?

So yes, XSEDE is a– many of our major partner sites, including Stampede and Blue Waters, all have Allinea. And those are some of our premier computers, and so one of the things that I was speaking to Mark, and David, and Rick about at SC was that we have our XSEDE15 Conference coming up in St. Louis in July. And one of the things I would like is to have the Allinea folks come down and basically give a tutorial, sort of a train-the-trainers. Because a lot of the people that come to this conference are the people that go out to the community, and work with people and do extended support, and all that kind of stuff. And so this is a good group to get your message out to. On top of that, you have the heads of each of the different service provider sites there, and they’re the ones that buy the software.

So in my humble opinion, I think that debuggers and profiling tools are far too infrequently used. That is something– it’s not because they’re not there. It’s because people just either don’t know about them, don’t do training on them, or don’t know how to use them. We’re in a state where we have less cycles than we’ve ever had per request, right? So being able to take full advantage of those cycles by having optimized code and optimized run patterns is crucial. Otherwise, you’re just not going to be able to get your work done and the science won’t get done.

insideHPC: Is this going to happen, Mark? Are you guys going to go to St. Louis?

Mark O'Connor (left) and Rich Brueckner (right)

Mark O’Connor: Yeah, I’m sure. I’m sure we will. We love doing stuff like this. I’d say probably we spend as much time doing train-the-trainer events as we do direct training events. It’s such an effective way of reaching through to people. I think we really kicked off putting much more emphasis on training a couple of years back. I found that we, as an organization, got so much more out of it because you get so much feedback on the detailed usability of the product, how somebody new to debugging or profiling will see things. The barriers that people have that will stop them even going past stage one because there’s something on the window that entered the stance. You’re always fighting against the little X in the top right corner. That’s your main competition because that’s what everybody wants to click on and they’re going to go back and do it the other way. They promised to give it a go and they give it a go and they’re going to stop. The way we see things now is our main competition is just people clicking on the X and going back to however they did things before, be that printf or bugging the support guy until they fix it for them or whatever.

insideHPC: Okay. So Vince, what would you say that you want to achieve for XSEDE? I mean get people to use the tools that are already waiting for them.

Vincent Betro: Yeah, that is the A-number-one purpose of all training, to say, “Here are these tools. You obviously have the basics down, but this is sort of a just-in-time approach to get you up to speed with the best, and the brightest, and the newest options that are out there. Maybe ten years ago, you weren’t very keen on the parallel debuggers that were running around, and a lot of development has occurred since then and we need to kind of revisit that.” Printf is just not the only way anymore.

insideHPC: Right. How important would you say wide use is to the XSEDE community with all these parallel resources? If they don’t use them, what is the opportunity cost there?

Vincent Betro: I think the opportunity cost is significant. The opportunity cost really– in debugging, it’s harder to put a cost on that because obviously, if it doesn’t work, it doesn’t work [laughter]. It’s a huge opportunity cost. But it’s more that I look at things like MAP and the different profiling options that Allinea has. Not using those literally can cause– maybe a better way to put this would be to say, “I have seen tweaks in performance based on profiling tools, such as MAP, that have yielded up to 25 and 30% shorter run times.” And if you think about that, if you’re talking about something where the run time is 24 hours, you can get a lot more runs in [chuckles], if you only have to go 16, which means you can test more cases, and you can get better information, and the science is advanced through that. So really, it’s just about the speed. How fast do we want to advance science? And then secondarily, again, like I said earlier, the resources are at best level and the demand is growing every day. And so, in order to make sure that we have the most efficient use of those resources, we want people to be running their code at its top possible speed so that we can get more people on and off.
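
Editor's note: to put rough numbers on the run-time point above, here is a minimal Python sketch. It takes the 24-hour baseline and the 25 to 30% reductions quoted in the interview; the 720-hour allocation and the function names are hypothetical, chosen only for illustration. For reference, dropping from 24 hours to the 16 hours mentioned would be a reduction of about one third.

```python
def shortened_runtime(baseline_hours, reduction):
    """Run time after cutting the baseline by the given fraction (0.25 = 25%)."""
    return baseline_hours * (1.0 - reduction)


def runs_in_allocation(allocation_hours, runtime_hours):
    """Number of complete back-to-back runs that fit in an allocation."""
    return int(allocation_hours // runtime_hours)


BASELINE_HOURS = 24.0      # run time quoted in the interview
ALLOCATION_HOURS = 720.0   # hypothetical allocation, for illustration only

for reduction in (0.0, 0.25, 0.30):
    runtime = shortened_runtime(BASELINE_HOURS, reduction)
    runs = runs_in_allocation(ALLOCATION_HOURS, runtime)
    print(f"{reduction:.0%} shorter: {runtime:.1f} h per run, {runs} runs")
```

Under these assumptions, a 25% shorter run time turns 30 complete runs into 40 within the same allocation, which is the "more cases tested" effect described above.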

insideHPC: Are we talking about idle cores as well for something like this? Or does the job scheduler take care of that?

Vincent Betro: Actually, that’s an interesting point. So, the job scheduler does take care of a lot of that. I mean we do have some idle– I mean like a lot of our machines are up in the 90% range of utilization, so I mean that’s good for what you would normally want from a supercomputing center. One of the things that this will allow people to do – if you can refactor your code, there are ways to go about using fewer nodes. There are ways to go about being able to handle a different partition on a machine when you don’t need the one that’s very high throughput only has one processor with two links to memory and they’re having memory bandwidth issues. We want to be able to see what sort of things we could tease out of the codes, so it would run wherever, whenever, however. That’s another way that you’re working towards over the grid infrastructure even for that matter, which is a little out of the purview of XSEDE, but not totally. If you want to be able to get as many people onto these grids as possible, you want them to have flexibility in their topology. And you don’t know how to have flexibility in your topology if you don’t know what your topology is [laughter].

insideHPC: That makes sense. Mark, I wanted to ask you about that on that same thread. You have that thing going on with energy efficiency. Does that come to play with XSEDE or do these guys not care, do you think [chuckles]?

Mark O’Connor: I guess that’s really one for Vince [laughter]. As far as energy efficiency, it’s still very much at the stage where there are a few sites that are really starting to look seriously at energy efficiency. If you look at – I’ll just take an example out of SuperMUC down in Munich. They’re sort of warm water cooling, being very sort of energy conscious in the way they go about adopting supercomputing and also reporting back on it, something about this. Right? On the flip side, I think the actual scientific user who’s given an allocation, I doubt that he cares about the energy efficiency of that allocation. There’s no particular reason that he should. The movement on that is largely coming not just from the community of sites that deploy supercomputers, but really also from vendors, because most of the large procurements that we see going out now have got power budgets. And if you are shipping machines and tools that help people to achieve their science with a lower power consumption, then you’re getting much more compute out for the same power budget.

insideHPC: Yeah. That makes sense.

Mark O’Connor: Yeah. It’s quite a deep topic that’s touching lots of different levels in the organizational hierarchy.

insideHPC: Okay. Well, Vince, I wanted to circle back to scientific achievement here. It made sense what you’re saying in like if they get to do more runs, it would seem to make more sense to me. That’s better science, right? More accuracy. Is that the end goal?

Vincent Betro: That’s one of the end goals, for sure. To real quickly bounce back to what Mark just said, from the XSEDE perspective, the energy efficiency is all about the power budget, and we do have, as service providers, when we get a hardware award, we’re given 20% of that award per annum to put towards power and support for keeping the resource up. And so the less we have to put towards power, the better.

So the end product, that’s just true all around. But as far as the science, with high performance computing, it really always falls into one of two categories. One is you get more science done, as in you can do more runs of some simulation that gives you a better feel for the actual physical world response to certain stimuli, and you learn from that. The other and equally important option is there were some problems, and still are, that are so big as to be intractable unless you absolutely, positively squeeze every minute amount of power out of that machine that you can get. Things like, for instance, I’ve worked with guys that are doing incredibly high Reynolds number CFD calculations, which is basically extremely fast turbulent flow. And they have huge grids, very fine resolution, and to run something like that and get some kind of physical result that means anything, it was taking this one gentleman over 24 hours, which was more than the wall clock time he was allowed, to even get a solution at that level. And until we were able to go back and optimize, he was running at capability on Kraken, at the time one of the biggest machines that was there, and he couldn’t finish his problem. And so that becomes a serious sticking point. It’s really one or the other. It’s either a capacity issue or a time issue.

insideHPC: Yeah. That makes sense. Well, to come back, let’s just come back a little bit here. It talks about saving machine time so let’s say not in that big hero problem. But if somebody can get their work done quicker, doesn’t that make room for other scientists to get in there and do something useful as well?

Vincent Betro: Definitely, yeah. That’s really the point. It’s not even just one scientist that can make more runs, but if people are using the computer to its fullest efficiency, more people can use the computer. That’s just– hate to say common sense but it is. It actually works out the way you would expect it to [laughter].

insideHPC: Good to hear. Well, okay, I’m going to put you on the spot Vince because Mark’s on the phone. But are these tools ready for prime time? Are they ready for these non-expert scientist types?

Vincent Betro: Yes. That’s the really big thing that makes me go out and say, “Now is the time to strike.” And it’s the fact that a lot of the issues that have been present in the past, whether that be being able to run on different systems, whether it’s functional for GPUs, and MICs, and all these accelerator technologies that are coming out – it’s there. The functionality is there and the interfacing is significantly improved from many different past incarnations of many different debuggers. We’ll just say it that way [laughter]. But it is significantly improving. It’s very simple to use. It’s a very nice GUI, methodic command line – print this. You have to forgive me, Mark. You have to tell me the product name and the HTML.

Mark O’Connor: Oh yeah, you mean Allinea Performance Reports.

Vincent Betro: Performance reports, there you go. Perf reports or performance reports – having that where you can basically have this service monitoring your code while it’s running in a very friendly HTML-based window. I don’t have all the great parlance for GUIs [chuckles], but it’s really, really fantastic. It does make it achievable because you don’t have to know command-line syntax and be comfortable working at a terminal. You can be just your average Joe off the street that knows how to point and click, and you could get all the information you need.

insideHPC: Okay. So you don’t need to know LaTeX and know how to use the vi editor to output something, is that true?

Vincent Betro: No, no.

insideHPC: Okay [chuckles]. Well good, good. So along those lines then. This is the last check off. Should every grad student be using this stuff, be using these tools you think, Vince?

Vincent Betro: Yes. God, I think back to grad school, if I had a parallel debugger [chuckles]. My background, as I said earlier, was CFD and I cannot even tell you how nice it would have been to be able to drill down and be like, “Oh that’s the spot where the calculation’s failing,” without having to print out every stinking flux value [laughter]. I can’t even describe the pain that that was at times. But anyways, long and short is yes. If anybody should be using these, it’s graduate students, because sometimes it’s hard to teach an old horse new tricks. Sometimes whether it’s due to the actual oldness of the horse, the resistance level of the horse or whatever, it can take longer to teach them new tricks sometimes. But with graduate students it’s like starting good habits early. It’s like teaching your kids to wear their seat belt. Two generations ago, that was not a norm. And now, your kids will give you crap if you don’t put it on. It’s the same thing with graduate students. This is the future. These are the people that we have a chance to sort of cajole into a way of doing things that’s more efficient and less hackish, because the tools are there now that weren’t there 10, 20 years ago.

One of my favorite best practices coming back to the supercomputer code is actually random conversations with people after talks. We’ve had PhD students come and say, “Oh. Thank you for [chuckles] inviting us to debug.” “What?” “It saved me months on my PhD. I was really stuck until so and so said, ‘Hey, give me a bit of that.’” At the same time, we’ve also had people come back and say, “Oh, if I’d had this when I was in my PhD, it would’ve saved me months.” Because – and this is true of all sides – when you try to push the boundaries in one field, you need to support all the other fields to make it possible. You can’t be fighting in all the fields at one time.

insideHPC: Yeah. What about employability? When these kids get out– I guess they’re PhDs. They’re not really kids anymore. But knowing parallel tools, wouldn’t that make them eligible for many more kinds of career opportunity in theory, do you think?

Vincent Betro: I would say yes. In the sense that– there’s two ways I think. One is it shows the prospective employer that this person really knows how to drill down into a problem and solve it, using all the tools at their disposal, and not just guess and check. Right? For lack of a better term, guess and check. The other thing that makes them more employable is what’s occurring now – and it always has been and always will be in this field – is that code bases change. People have to take legacy code and either rewrite it, or adapt it, or add CUDA or OpenACC. Pick your poison, right? OpenMP 4. They’ve got to go on and do that, right? When you go and modify an existing code, it is without a doubt guaranteed – I promise this will happen no matter how good you are – you will break it, and you need to know how to fix it, and the way you’re going to fix it is with a debugger. And so being able to say, “Hey I can go into somebody’s code and using this tool figure out what the heck they’re doing and how to make it better.” I mean who wouldn’t want to employ that person, right?

insideHPC: Yeah, I mean if you look at a place like Oak Ridge where you know they’re changing architectures every five years, and they got to do years of prep work– folks like that to get things ready for the new machine, the new architecture, the latest greatest.

Vincent Betro: Exactly. And they are– I talk a great deal with the folks across the street. I mean I’m here to encourage but I work for UT, so I talk to the DOE folks all the time, and they have a very difficult time– we have all the leisure time because we can hire students, and if they don’t work out, it’s not a big deal. And they can come and go. But DOE is a little different. Right?

They’ve got to hire them and then once they’re there, they’re pretty much in. And they have a really difficult time finding people that know these tools, because that’s what they need. They need people that can port. And to port, you break. And to fix what you break, you debug. I mean that’s just kind of the way of computer science. So, having skills that you can say, “I can work on other people’s code,” is like candy to a recruiter or to a potential boss.

Mark O’Connor: One of the best coders I know, he basically is just really, really good at debugging. When he starts a new project, it seems like he takes the last program he wrote and debugs it until it does what he wants it to do.

insideHPC: Yeah. That’s an interesting path to your endgame. All right, guys. Well, I want to wrap this up here. And so, should the call to action be: come to the XSEDE thing and come to this training session because you could be missing out on some great stuff? Something like that?

Vincent Betro: Definitely. I would say come to the XSEDE conference in Saint Louis and there will be a training session as well as tons of other folks in your field, and this is the place. So come on down and learn about debugging, and learn how we can take this federation of systems and really make it sing.

insideHPC: Okay. And I hear there’s good food in Saint Louis as well, so just as an added bonus.

Vincent Betro: That’s true. There’s good food and there’s actually some really neat stuff to do downtown.

insideHPC: Yeah, yeah. Okay, so that’s fair enough. Okay folks, that’s it for the Rich Report. Stay tuned for more news and information on high-performance computing.
