User Agency Panel Discussion on the NSCI Initiative

In this video from the 2015 HPC User Forum in Broomfield, Bob Sorensen from IDC moderates a User Agency panel discussion on the NSCI initiative.

Panelists:

Bill Kramer, NCSA
Rob Leland, Sandia National Laboratories
Bert Still, LLNL
Nathan Baker, PNNL
Piyush Mehrotra, NASA Advanced Supercomputing Division
Randy Bryant, OSTP
Irene Qualters, National Science Foundation
Doug Kothe, Oak Ridge National Laboratory
Will Koella, Department of Defense

Transcript:

Bob Sorensen: We’ve managed to pull together a number of players in the NSCI who are going to give a– we’re going to go through some quick introductions and then a quick overview. Each of them is going to give their perspective of how the NSCI relates to their particular organization, and then we’ll open the floor up for hopefully a lively question and answer session. We have a very, I think, diverse and interesting collection of panelists here today, but I’m going to draw attention just quickly – forgive me, the rest of the panelists – to Rob Leland simply because he’s probably too modest to mention the fact that he is really the father of this entire effort. I think it was probably two years ago that Rob really started building the consensus throughout the entire government to make this happen, make sure it was inclusive and comprehensive, and really went through a lot of dark days where he said, “Well, we’re going to get this done by next week,” and we heard that a lot of times. So, I’m really particularly pleased that Rob is here because, for good or bad, this is primarily your fault [laughter], so I just wanted to make sure everybody knew that. So, we’ll just start now with our first panelist, which is Bert Still.

Bert Still (LLNL): Of course we have problems that we need to solve better than the ones we do today. We know there are things that are missing. We know that things are changing, evolving over time, which requires us to have better, higher fidelity models than we have now, and that means that we need more computing to be able to compute those models. So, for us, even as far back as the beginning of my career in the late 80s, the problem has never been a case of, well, exactly how much computing will it take to get you there, because the problem is the world doesn’t stop changing. So, understanding extrapolation, which is really where we’re at now, is a very, very difficult process. If you think about climate and weather, you can’t really say, “Oh, we know what the weather’s like today, so we can predict what it’ll be like three months from now.” On the other side of the coin, you can look at the Fukushima disaster and the cloud that came out over the reactor, and you could say something about how long it would take to get across the Pacific Ocean and what dissipation it would have by the time it reached the West Coast of the US. So, there are some problems you can address with certainty and there are other ones that we really don’t know how to address yet.

So, we have large problems that need to be solved and they require both models, as well as computing beyond today’s reach. That really is what’s driving our interest in Exascale. Many of you may remember that we had a program, actually I guess just about 20 years ago, called ASCI, which was the Accelerated Strategic Computing Initiative. In the time frame of President Bush when he announced the cessation of testing, we knew we had to do something different to figure out what we would do to maintain a safe, secure, and reliable stockpile. President Clinton launched the Stockpile Stewardship Program and the birth of ASCI came after that, with Gil Weigand, who was the first ASCI Director. The basic gist was to embark on this path of taking the commodity technology systems of the time and pushing the frontier of high performance computing, to build systems that would actually give us something substantially beyond the computing capability of the day. So, from that point, we began working with vendors, with academia, with even other countries. We have a very tight, cooperative relationship, for example, with the British on how the algorithms, the models, and the technology would change, and how we would actually take advantage of it. With us, we have these really large integrated simulation codes. So, you’ve heard the acronym I.C. used for ‘Intelligence Community’; we actually use it in a slightly different way. It means Integrated Code. So, if I slip, there’s a translation issue. But the gist is, our integrated codes are multi-million lines, and they’re across the three labs within the NNSA complex. It’s kind of a six billion dollar investment that’s been put into these codes. We can’t re-write them overnight.

We can’t take advantage of each individual architecture that shows up, because it’s a decadal-type effort to make the full modifications to re-engineer one of these codes and then re-validate it. So, performance portability is an absolute key. You all have seen the “usable” statement inside the NSCI, and we are all about trying to figure out how to make usable machines. That is a key critical component as far as we’re concerned. But the thing that I think we’re really seeing, we talked about the fact that single-thread performance is not increasing, and so what we’re doing is we’re simply increasing the parallelism, and then there are the physics limitations, if you will, of how you cool and distribute power among the parts that are there. That really is leading to a paradigm shift from something that’s based on how fast you can crunch the numbers to how fast you can feed the chips with data. It’s really that paradigm shift, I think, more than anything else that’s really going to change the way that we have to do our computing. So, I think there are a lot of lessons learned. There’s a lot of room for collaboration and cooperation among the agencies, and we’re really excited about engaging with pretty much everyone on exactly how we do that. Thank you.

Piyush Mehrotra (NASA Ames): Hi. My name is Piyush Mehrotra. I’m from NASA Ames, the Supercomputing Division there. We host the premier supercomputer for all of NASA. As you know, NASA is considered a deployment agency, so in some sense, we will continue doing what we’ve been doing, trying to provide a productive environment for our users. But we would like to participate in this whole process to be able to make sure that NASA mission requirements are part of that whole co-design process of looking at new technologies and the whole software stack, starting from the operating system up, so that we get to the user productivity that, like we just heard, is key. So, obviously, we are very interested in large scale applications in a variety of domains, starting from aeronautics design, aerospace vehicles to earth science and astrophysics and planetary science. But in conjunction with objective two, we are also very interested in large scale data analytics and the convergence of data analytics with HPC. As you know, we have a lot of satellites out there and we’re bringing out petabytes of data every year, just streaming down this observational data. How do we handle that data? How do we manage that data? Then how do we actually extract any insight and knowledge out of that? In conjunction with that, actually, not just observational data, we also have a lot of model data that’s coming out. Chris Hill was talking about MITgcm yesterday that he’s been running on three days and the simulation from MITgcm. I think he’s produced about 3 1/2 petabytes of model data, simulation data. Now, that has to be correlated with the observational data, and we need to try and figure out how to extract patterns from it and how to extract knowledge, and, as somebody said yesterday, how to extract insight from all of that.

We’re also very concerned about how to bring the two environments together, so that we can do large scale simulations along with large scale data analytics on those. Obviously, being a deployment agency, we will be looking at technologies that are being proposed through the NSCI process, and we’ll be able to hopefully test those technologies on our applications and then leverage them for our users out there. But Andy was talking about future computing, and that’s one of the things that we are actually very heavily involved in – quantum computing. For those of you who don’t know, we do have a quantum annealing device down in my basement at NASA Ames, one of the bigger machines that you have out there. Now, as somebody said, this is really a baby machine, a new initiative, but we’re looking at how we can solve, particularly in this case, combinatorial optimization problems on this. Then later on we’ll figure out whether this is useful in the HPC arena, not necessarily as a general computing HPC machine, but at least as a co-machine where part of the optimization is being done on this machine, whereas the generic HPC machines are used as the solvers. So, we’re also looking– as a division, we are– one of our charters is to be the smart buyer for the agency. So, we’re always looking at new technologies and we hope to sort of look at the technologies that are being proposed through the NSCI process, and to be part of that whole process all the way through. Thanks.

Bill Kramer (NCSA): Hi, I’m Bill Kramer. I’m from NCSA, and involved with the Blue Waters project there. So, we’re very excited about the NSCI, and we’re thinking it has a great potential for continuing all the progress that we’ve made to date and actually expanding that dramatically. One of the reasons is because we’ve seen that computing and analysis resources, at both high scale and broad scale, are really universal instruments. So, at the same time, where communities may have a telescope going in or a satellite to do their observations, our computing infrastructure and our data analysis infrastructure is really universal. So, at the same time, if somebody is using Blue Waters to be the world’s most powerful telescope to see beyond what can be seen even with the Hubble Telescope and understand it, we’re also having people use it as the world’s most powerful microscope to see maybe ten or twelve times higher resolution of atomic processes and molecular processes – so, that’s great.

My background is I spent significant time at NASA, and then significant time in the DOE complex, and now in NSF. One of the things that strikes me in the NSF environment is that it has by far the broadest and most diverse set of use cases for computing, and high-end computing and analysis, that I have been involved with. Many of those do not overlap with things that we hear going on in other places. So, from the point of view of the initiative, having the co-leadership and the breadth that NSF brings is very exciting, because I think it will actually enable us to have much broader impacts than maybe would have occurred without that. We have been working with our science teams and engineering teams, not only helping them use the system as it is today – which is a very unique resource – but also talking and trying to define what could be done with future generations of activities. There are certain characteristics that we found that we summarize these things in. One is an increased range of uses and needs, anywhere from – as was mentioned earlier – cancer all the way through whatever the next space telescopes might be able to see. But other things are dramatic increases in fidelity in models and also in the analysis. Fidelity is all the things we talk about: higher resolution for climate, more particles for certain types of simulations, more precise measurements. So, we see a great increase in fidelity, which has been driving our need for more computing over time and more data, but those insights address new problems. Another thing is longer simulation periods.

So, even though Blue Waters and the other leadership class systems are doing things that had not been able to be done before, we have teams that are making strong compromises. So, for example, space weather right now: a full space weather calculation, which is 15 orders of magnitude, to address what happens with a solar flare and how that impacts life on Earth. They can basically do one tenth of the time period of such an event. They gained great insight because they couldn’t do that at all before, but to really understand an event like that, you need to do ten times more just in time-space, and many problems are like this. It’s how long you have simulated time as well as the details. There is an increasing number of problems to address, as we mentioned. They come in two forms. Obviously, there are new areas that need analytical resources or computing resources, but also what happens is when you have a frontier implementation of a problem, say, you go to 100 million or 200 million atoms, all of a sudden, a number of other researchers and teams say, “Oh, I have a problem that needs that resource as well.” So, these best-of-breed or frontier type things then generate a much wider range of problems that come in. We sometimes use “ensembles” to describe that, or “multiple cases.” But we have to remember that these are ensembles not in the sense of very small scale ensembles, but very large ensembles that need to be run for validation. As we go through and see people producing results, there are many more requirements for doing that. Changing work methods is very important.

So, we focus on productivity. By that, we mean time to insight. We don’t focus on particular rates, as you know. What we are seeing is while it’s very important to be very efficient at the large scale – part of the problem – because that may take 90% of the computing resource to do the large scale calculation. The elapsed time for a team to understand what happens is only in a month, and most of that is dealing with the data that’s produced or the data that they have to ingest and assimilate and moving that data around and then many, many steps to analyze that. So, for example, you may have tens or hundreds of thousands of cores applied for a while to do a simulation, but then you may also need to have a hundred million jobs to analyze that. What we’ve seen is that making those teams productive really means their entire workflow, not just the part that’s highly parallel and runs on the bigger system. So, there’s a lot of work that is potentially very productive in looking at workflow methods. Integration or convergence of data sources, as well as simulation or modeling resources, is part of the initiative, and that’s extremely important. We see that in almost all domains of research, engineering, and science, where you cannot do one without the other. Well, it used to be you model, then somebody else experiments, and then there’s a validation. These are much more tightly coupled in all domains than they were even ten years ago. That means that the systems we will be producing have to be able to accommodate that essentially simultaneously, and the implications are much more on the software side than the hardware side. So, layers of – somebody mentioned the layers of hierarchy that we’re going to have to deal with – is a tremendous problem for the applications space, but also for the systems side of the space in terms of doing that. Our systems need much more flexibility, because not only do you have these different methods and workflows, but you also have different cultures that are going to converge on a set of systems, and they’re going to need to use the same type of systems to be productive. So, we see a tremendous opportunity for layers of software, not just on the application side of how do I make use of that for my application domain, but also for how you manage systems, how you run the work on the systems, bringing together the different methods and models that have come about for very large scale activities in both the data analysis and data science realm and in the modeling and computation realm.

As I said, it’s much more tightly coupled and we’re seeing teams that now are collaborating much more tightly with the observational teams. Examples of large instruments are easy to point out. The LIGO experiment has developed a very tight relationship with the people that model black holes for gravitational waves, and actually there are things in that experiment that they won’t be able to tell without going and simulating: is the signal that we see the one that would come from, say, a non-parallel spinning collapse of a black hole? These are occurring all over the place, in terms of not being able to distinguish one versus the other, and the teams are realizing that. So, it’s very challenging for what we have to do in the future, but I think there is a tremendous opportunity to have very broad impacts. I will say that the other thing is, Irene talked about workforce development. So, more than 65% of the use of Blue Waters is by people that are in early parts of their career, either graduate students or post docs. It’s not the most senior people. It’s very important that we continue that to bring in more and more types of that, so people are learning at not just small scale, because as they develop their methods they learn how to do things at small scale, and they also learn how to do things, and we enable them to do things, at very large scale, early in people’s careers, to have the impact.

The last thing I’ll say is we’re very engaged with many partners that are on the industrial side, and we’re very hopeful that the NSCI will– actually, when it talks about industrial benefits, it’s not to the vendors of technology – there will be certainly benefits there – but it’s actually to the other commercial industrial partners that we see. We discovered in trying to work with these teams that you can do the very first part of a demonstration example of what might be possible for them, either by scaling something up or improving their workflow and time to insight. Then there’s putting it on a production floor as they use it in day-to-day practice. But there’s a gap, that I don’t think I realized, in terms of what it takes to get a company to go from, “Yeah, I know I could do that if I had a resource” to “I’m actually using that and it’s a change of work methods.” That gap is actually an awful lot of computing, an awful lot of data analysis that has to be done. It’s not the first time it’s done, and it’s not there in their normal business practices, but it’s to change the culture, to change the business practices of many of the people that potentially – or currently – use high performance data and computing resources. We have to figure out how to enable them to meet that gap, because they’re not going to do it until they know that they can actually improve their productivity, their product line development, and their products. So, that’s a gap that hopefully we can also address in this initiative.

Nathan Baker (PNNL): My name’s Nathan Baker. I’m from Pacific Northwest National Lab. Prior to that, I was a professor at Washington University School of Medicine. So, my comments don’t reflect either institution’s opinions, of course, but they will reflect my experience of both, because I think there are some really interesting cross domain opportunities here, as was pointed out in some of the morning sessions. So, I come from the applied math, physical chemistry, and computational biology perspective, and these are areas that have been struggling with the data problems, as well as the computing problems, for quite some time, and have really been looking for long term enduring solutions. The problem of being at the end of the queue, on this table, is a lot of what I was going to say has been said, so I think I’ll just focus on a handful of the objectives, talk a little bit about what some of the application pull is in the spaces, and then what I see as some of the promising technologies there. So, probably the most obvious, especially from the DOE perspective, is the delivery of an exascale system. There are a number of applications that could benefit, whether it’s the idea of computational microscopes and observatories, all the way down to just developing better models for thinking about integrated systems: how do our power systems integrate into climate, how do they integrate into other critical infrastructures. These are necessarily computationally stiff models in that some parts of the model might need to run for decades while other parts of the model are at the level of user demand on a power grid.

These are hard problems. They’re both data intensive and compute intensive. PNNL has been working on these from a variety of standpoints, but some of the issues we’ve been focused on are a little bit different than what has been described so far. So, one of the challenges, power was mentioned, and power is always going to be an issue. But thinking about being able to model a system upfront, if we get one of these big exascale systems, how do we remove the burden from the user to think about up-time, down-time, deployment, integration across processes, fault tolerance, et cetera, and actually integrate that into our programming models? That’s an area where we’ve really spent a lot of time thinking about how we’re going to address these systems. A second area is simply thinking about the algorithms differently. A lot of the work at PNNL on computing has been focused on the data intensive space. Many of the algorithmic advances have been focused on what we do when we don’t have the resources that are necessary, and so need is a great driver for innovation. Thinking about approximate computing, thinking about anytime algorithms, thinking about ways to get resources to do a good enough job given the compute or the storage that is available, I think that’s an important element of this overall initiative that’ll need to be explored. We’re going to have to think about our algorithms differently because there will never necessarily be enough computing or enough storage to tackle the problems that we want to tackle. There’s going to be a push, I think, from the math side, a need to address these problems differently.

As I mentioned, objective #2 really resonated with us because we’ve been focused on this data intensive, data science, data analytic piece for quite some time. This comes from a variety of directions. NIH saw this need a long time ago. National Cancer Center and a variety of others recognized the data was growing at a rate where getting it into the hands of a practitioner for decision-making was becoming impractical. This pops up in other domains and in the security domain, whether you think of intel analysts or power analysts who are actually thinking about critical infrastructure. We have too much data, and there’s a big gap between the data and knowledge. Among the programs at PNNL that have motivated our concern and our work in this area, the first is high-energy physics – the amount of data that comes off of big instruments, as was mentioned earlier, whether it’s LIGO or one of the big detectors – is too high of a bandwidth to even write out to a bus, much less a disk. So, the problem is that you’ve got a baby-and-bathwater conundrum. You’re spending millions, billions of dollars looking for rare particles and yet the data is coming out at a rate where you may have to triage and you may lose what you’re looking for. How do you design robust algorithms that can handle that? How do you design algorithms that can detect what you need to detect? You’d love to keep the data, but you have to triage.

Another area where this pops out a lot, and was mentioned briefly before, is Next Generation Imaging. So, there’s a big driving force from NIH, from Basic Energy Sciences in DOE, to develop better instrumentation so we can get down to finer length scales, and to add dynamic, time dependent information to that instrumentation. At that point, we can’t process the data coming out. It’s very analogous to the DOD problem or the DHS problem of too many cameras, too many sensors. How do we start to pull these things together? How do we push computing to the edge with these instruments so that we’re really transmitting knowledge and information more than raw data in this scenario? There are many other examples, atmospheric monitoring, and many problems in NIH. I think in the interest of time and because many of these things have been discussed already, I’ll skip some of those.

I want to talk a little bit about objective three. This is an area that I was personally very excited about, in that I think there’s a really unique role for collaboration across the labs, across DOE and NSF. Basic Energy Sciences in DOE, NSF, and others have contributed a tremendous amount of investment to the material science world. A lot of it has been focused on CMOS technology, but there are tremendous opportunities out there to start taking the very advanced characterization capabilities, the ability to place atoms where they are needed through resources at PNNL, Los Alamos, Sandia, and start to ask – What are the next generation problems in fabrication, how do we couple those to characterization, and then how do we think about modeling and algorithms differently? Because for these problems, at these length scales, perfect is going to be the enemy of progress. We’re going to have to be thinking about computing in the presence of noise, in the presence of imperfect materials. What’s the framework that allows us to deal with that? I think I will skip objectives four and five. A lot has been said about that already– if we build computers and algorithms that nobody can use, then we haven’t really had much of an impact. So, we’re also very invested in solving some of those problems.

Rob Leland (Sandia National Labs): Thank you Nathan. I’m Rob Leland, from Sandia National Laboratories. I wanted to thank Bob for that very gracious introduction. To return the favor, Bob was part of our council, as I spent a year at OSTP working on this starting about two years ago. Bob was a member of the council, as was Irene. Randy was, of course, a critical part of the team. Bob, I always appreciated the depth of your insight and the sense of conscience you brought to bringing us back to relevance and broad relevance to society. If you don’t know, Bob had a previous life in government, so I wanted to thank all my colleagues up here. I know much of the content that we worked through was influenced by Bert and Bill and many people in the audience, and so it’s really been a community effort. I’m delighted that it’s in such good hands now with Randy still on board and, well, being part of the executive effort. Doug, you and I have worked together in the past, and much of that inspired– much of the context here was inspired by some of those interactions. So, it really has been a community effort, and that’s been quite important. The second thing I should say is I’m just going to speak from a Sandia perspective, if you like, not a DOE or OSTP perspective.

But what I think I can probably most contribute here is just a brief historical view on the initiative, and I wanted to do that just by responding to some questions you had sent in advance, Bob. The first was, what are your insights on how the initiative will likely unfold? Basically, I’d say, I think, it’s going to go pretty well, because it’s carefully designed and we had a lot of input in the development of the initiative. In particular, there’s a good sense, a broad sense of ownership and cooperation across the agencies that will be quite important. As was touched on earlier, devolution, if you like, is the risk. Of course, there’s some very substantial technical risk here. But I feel very confident that if the US government brings forth its best effort in partnership with industry and academia that we can do these things. I think the main risk is that it might unravel a bit, and we’ve built in a number of structures that will help quite a bit with that. There is for example a joint road map that the agencies worked on that is fairly detailed, and I think will likely be reflected in some form in the implementation plan, where the agencies agreed to the vision or strategy, goals and objectives, roles and responsibilities in a fair amount of detail.

So, there actually has been a lot of work done to get to the next level of thinking in advance. The second question that you sent, Bob, was, what do you think of its potential impact for government-sponsored HPC research and the broader HPC sector? I guess what I’d like to say there is the overarching goal of the initiative is to assure continued US leadership in high-performance computing. Of course, many things contribute to that in this ecosystem sense. But the government has historically played a very vital role in particular in sponsoring forward looking research and development. The initiative creates the conditions for that to continue going forward. I think there’s an excellent analog, as Randy mentioned, in the HPCC initiative from the early 1990s that is generally viewed as quite successful. I think we can hope to replicate that success here.

The third question was, how do you think this will help the wide range of US Industrial HPC users and boost the overall competitive position of US industry? What I would offer there is, if you look at the history, I think each major new era in computing has been preceded by, say, five to seven years by a forward-looking investment, by a government R&D push. You can trace that back at least five cycles, I would argue, and I think that can be true again here. There are many indicators that we’re sort of approaching that wall where we need to take a very substantial step up in our capability and a change in our approach, and so I think all the preconditions are here for us to replicate that history once again in a cycle. Then the last question was, do you have any additional points you want to cover? I guess what I would just say there is this is a huge opportunity for the agencies and the community to do something that’s really important for the country, and I would just ask you all to join in making that a reality.

Bob Sorensen: Well, thank you Rob, and thank you to the panel. It’s clear that however this initiative goes forward, it won’t suffer from a lack of ideas, insights, interests and passion. So, I’m very optimistic about this project moving forward. So, with that said, I’d like to throw it open to the audience to ask questions. If you have a question for a specific panelist, shout out their name, or if you have a general question for the panel at large, go that way as well. We’re not going to get all quiet now, are we?

Question from audience: One thing for the whole program that was brought up I think was the issue that we’re not just looking at FLOP-based machines, that the machines have to take into account big data, data flow, data movement, everything else. You’ve kind of made a landmark position about LINPACK and running LINPACK at NCSA and the TOP500. Do you think there’s room for a new metric or new set of metrics to be developed as part of the program that would take a more comprehensive view beyond FLOPS?

Bill Kramer: So, I definitely think that there is room and great need for something like that and not to be dominated by a single metric. It’s not that that metric doesn’t have value. It’s that when it is involved as the only measure, and it’s put on a list that determines investment choices to be at a certain point, that’s where we run into problems. So, I don’t believe that there’s any single metric. Any single metric would have the same problems, particularly if it’s– then people want to say, “I’m better than you are,” by that metric. It is something that we do want to judge our progress in. But it has to be a broad enough set of measures that is meaningful across the broad set of what people want to use the systems for and what the challenges are, the scientific goals. So, I don’t see it as a single measure. Even a measure like HPCG will end up having the same problem, at the other end of the performance. So, we need to come to a community understanding of a set of measures that are meaningful, with which we can then also see the progress that we’re making. A set of measures to me would not be– yeah, it’s hard to get your head around more than, say, two handfuls of measures, but I also don’t think it can only be one or two measures. So, that’s something that hopefully can get worked on as we go forward, while maintaining the correlation back with history. The other comment is I think the measures have to change. The measures that we put together today are going to have to evolve as the use cases or science cases evolve and as the algorithms change. So, we need consistency but we also need to have more dynamic use of what the measurement should be as it evolves.

Rob Leland: Well, I’ll just comment briefly on that as well. If you look at the language around objective one, you see that it’s carefully phrased to say a hundred times the performance of current ten-petaflop systems. The intent there was to focus on measurement in true application space. I think if we execute with intellectual integrity around that, we will have a very broad basis and it won’t be this sort of monotone LINPACK result.

Bill Kramer: One other thing: whatever the suite or metric is, it has to directly relate and correlate to the time to insight. What everybody wants is to be able to solve something faster, or to solve a bigger thing in a reasonable, feasible amount of time. We have to be able to have whatever our measure is relate to that directly across the spectrum. The last comment is we have to pay attention to memory, because of the memory hierarchies, but also because of the fact that a number of problems cannot be done without large-scale memory on systems. We see that in Blue Waters, where there are problems that can’t be done on any other machine because we have more memory. We have to be able to accommodate that because it’s all in the investment decision making, and if we squeeze something to get a higher number in another area, we’ll be doing a disservice to the science and engineering and research communities.

Bert Still: So, I just wanted to add that there are two other measures out there that don’t necessarily get talked about as much in the HPC world. There’s the Green500 and there’s the Graph500. Those are, of course, both extraordinarily important in the data analytics space. As far as actually thinking about raw performance on a system, High Performance LINPACK has really not been correlated with applications that we actually care about in a very long time. So, at Livermore, and in fact in the ASC program, we have not bought machines based on a LINPACK result in a very long time. Instead, it’s based on understanding our workload and how our workload will perform on the machines that we’re actually after. So, this is all in complete agreement with exactly what Bill and Rob were saying.

Question from audience: So, I have a small question/comment prompted by the previous one. I’m glad to see that DOD, DOE, and NSF are working on NSCI. I can see NASA and NSF and other agencies are represented there, but what’s missing is NIH. I’m wondering, is it the view that almost everything that NIH-funded research needs is a subset of what DOD, DOE, and NSF will work on? If not, would it not be productive to get an NIH voice added to this mix as NSCI is rolling out?

Randy Bryant, OSTP: I’ll take that, because NIH has been a very active participant in the whole planning process, and even though they are not here today, they have been involved in everything. I think NIH is still trying to understand what HPC can do for them. A lot of their applications I’d describe as lots of data that doesn’t require very sophisticated computation. In fact, for example, the National Cancer Institute has set up these three pilot projects providing Cloud services to cancer researchers. What’s interesting about them is they’ve layered on top of a Cloud infrastructure the common data set that they all make use of, which is about two-and-a-half petabytes, as well as the software and the other resources, so that the cancer researchers can come right in and get to work on cancer research and not have to solve all kinds of computing problems. That seems, for at least a lot of genomics research, to map pretty well onto more Cloud-type infrastructure. On the other hand, there is modeling and simulation: if you look at molecular-level modeling of biological and chemical processes, that’s an area which, as was mentioned, is being done at Blue Waters and other places like that. So, they’re still a little bit in a mode of trying to understand more completely what range of computational needs they could be taking advantage of.

Irene Qualters (NSF): One other area is that of our chair who’s not here, Jack Collins. He’s often attended, and his area is also cancer, but it’s imaging analysis, tumor growth over time. That’s, yeah, a different problem. I think NIH does have the problem space. It doesn’t have all the solutions and, as Randy said, it’s been quite active in the group.

Doug Kothe, Oak Ridge National Laboratory: Just to add to that, a number of DOE lab personnel are actually fairly involved in the NIH projects. So, with our initial call for applications, we received probably a dozen very interesting NIH-based application ideas. So, we not just [?], DOE lab folks have also been talking to NIH, NCI in particular. So, there are some very interesting ideas and application requirements that are really not in our traditional scientific computing space. We’re seeing things like, “Hey, we want scalable R, we want Apache Spark.” In terms of how to slice and dice the applications so far, we’re seeing a lot of neuroscience, both morphological reconstruction of the brain and simulating the brain with neural nets. We’re seeing a lot of bioinformatics and genomics that are large data problems as well. Then, we’re seeing precision medicine apps, and certainly that’s imaging, but also simulating cancers, tumor growth, et cetera. A lot of these application spaces are different and unique.

Barry Bolding, Cray: I’d like to push the panel in the direction of pitfalls a little bit, and hear what your opinions are on things that you think could be catches in the program. I’ll throw out a provocative idea just to get it started. So, you’ve mentioned the space program a few times and how there might be corollaries. One can look at the space program, and I think all of us agree it benefited the country a great deal. One can also look at it and say, “Oh, it’s been 45 years and we’re only beginning to have efficient unmanned space program missions, and we’re only beginning to bring private industry into the space program.” So one could say, “Well, it wasn’t very successful.” So, what are your worries about where we are in 2035 instead of 2025? Looking back at the program, what do you not want to see, and what do you worry about in terms of the program’s pitfalls? Thank you.

Randy Bryant, OSTP: I think the main pitfalls are non-technical, and one of them is the current budget environment. It’s just hard to get federal funding, a significant uplift in it. They sort of say flat is the new normal. That’s true across the entire budget for scientific research. So, I see that as a core problem. The Apollo program was a great program, but it consumed a significant fraction of the United States’ GDP while it was going on. It was a huge investment of resources, and I don’t anticipate that, in our current budget climate, that would even be possible. It helped that there was sort of an existential threat from the Soviet Union at the time that was driving us in those directions. I don’t see anything that’s going to make us sort of step up at that level. The question becomes how can we be the most effective we can with the budget constraints that we’re living under, and I think that will be a hard road to follow.

Rob Leland: So, I said earlier, I think the main risk is devolution between the agencies. I do agree with Randy, flat is the new up. There was also a set of strategic drivers that we considered in formulating the initiative, and they point to some of the risk factors, I believe. The first was increasing foreign competition. The US used to dominate investment in this space quite dramatically. In fact, up until about 2010 or so, US investment was equal to the rest of the world combined, and then there was an inflection point, so that now we’re about a third of the total investment. More worrying, I think, is the disparity in growth rates. The US growth rate in investment is 2.5% or so, the average in the rest of the world that’s engaged in this space is about 12%, and I think China’s up at about 23% or so compound annual growth rate in investment. So, if that disparity persists for five or ten years, we will not dominate this space technologically the way we have previously. So, there is an important investment component. Now, the initiative addresses that to some degree, but a lot of that investment is out in private industry, and we need to see the uptake, et cetera. Second was Moore’s Law, and I think if we don’t rally effectively as a society around that challenge, the technical path forward is very unclear. I think there are also good indicators that we’re coming to the end of the MPP era. So, if we don’t make a transition to some new architectural approach, I think we will be on a path of less and less relevance, and that would be a huge adjustment in many respects, in the software stack in particular. Then a last one that I’ll mention, which you may be aware of, is that the microelectronics industry is moving offshore quite substantially. So, whereas we used to dominate the kind of roots of this ecosystem from a national perspective, that’s much less true today. That does affect our ability to command attention and to have coherence in the overall ecosystem. So, those are all risk factors which the initiative attempts to address, and those are very big forces, so we’ll see how we do.

Bert Still: So, I like all those, but I have one more, which I refer to as the elephant in the room. The thing that has been discussed a lot is the STEM problem. We don’t have enough people going into Science, Technology, Engineering, and Math. In particular, as a project manager and a hiring manager, if I look at my workforce, we have a gap, actually, in the Gen X area already, where we don’t have a lot of people that are still there and ready to move towards the senior manager type positions. So I’m looking at, as we begin to retire the truly experienced ones that we have, who fills that void? Then I look down into the millennials and I recognize that most of them– there was a statement earlier about whether the public understands computing. I think they understand the what of computing. I have a six-month-old grandson at home. The gist is he understands how to bang on the iPad and do interesting things with it. The kids understand how to turn on their phone and see what the weather is going to be today and if the BART train’s on time. What they don’t know is how that’s computed, and the problem is that they don’t necessarily really care. One of the things that, I think, we have to do is we have to find a way– analogous to the space program, the thing that was so galvanizing about it is that it gripped everybody’s imagination. Everybody that saw it was immediately captivated: “Wow, that’s amazing. I’d like to have some role in that somehow,” even if it’s just as an interested reader of the articles published. The issue now, I think, that we have to address is how do we capture the mind-share and the imagination of the developing workforce so that we’ve got the people ready to do the jobs that we’re going to need, and will we have enough talent over the next decade to actually execute all these plans that we have in place, because it will be very easy– well, very easy is a relative word here. I could envision getting funding and then not being able to spend it because we don’t have the talent to be able to execute. That’s my nightmare, that they hand me money and then I can’t do the program because I just don’t have the people.

Doug Kothe, Oak Ridge National Laboratory: If I can interrupt. Bert’s working with me on the application development side of ECI and that’s our number one risk. We have about 25 of them though, so there are a lot of them [chuckles]. But at least, to me, the exascale problem isn’t really the system. It’s what you do with the system, and it’s a science and energy and engineering output. So, we worry a lot about the people aspect. So, I would say more than half of our risks are not technical, as Rob alluded to as well.
