In this special guest feature, Debra Goldfarb from Intel writes that her recent panel discussion at SC16 illustrated just how fast Artificial Intelligence is advancing all around us.
Computational science has come a long way with machine learning (ML) and deep learning (DL) in just the last year. Leading centers of high-performance computing are making great strides in developing and running ML/DL workloads on their systems. Users and algorithm scientists are continuing to optimize their codes and the techniques that run their algorithms, while system architects work out the challenges they still face on various system architectures. At SC16, I had the honor of hosting three of HPC’s thought leaders in a panel to get their ideas about the state of Artificial Intelligence (AI), today’s challenges with the technology, and where it’s going. My guests were Nick Nystrom from Pittsburgh Supercomputing Center (PSC), Ivan Rodero from Rutgers University, and Prabhat from NERSC at Lawrence Berkeley National Laboratory. They answered questions both from me and from the audience.
Debra: Today, the term AI encompasses a lot of technologies and techniques. What areas of AI do you think are most relevant?
Nick: Absolutely, deep learning is leading the pack. We have a lot of researchers doing over 20 advanced projects on deep learning. But we also have people doing other aspects of AI, including speech, image, natural language processing, robotics, and other things. What they really want is an ecosystem where you can support these different kinds of AI applications.
Ivan: At Rutgers, we’re seeing different applications of big data and more particularly in machine learning. We have been using machine learning for a long time on many different applications, as well as for how we manage our resources; how we manage our systems. Now we are seeing also a convergence between many different repositories of data, and how those repositories can be managed in order to obtain better insights.
Prabhat: For me and for the NERSC user community, AI is too broad a term. I think machine learning is probably what is most relevant for us. And in machine learning, the kinds of problems people bring to us typically have to do with pattern classification, clustering, regression, and anomaly detection.
Debra: There is an interesting and observable intersection of HPC and machine learning and deep learning. Where do you think that intersection is most relevant?
Nick: I think HPC is now leading deep learning innovation. We’re seeing that in hardware, ranging from many-core processors to FPGAs and other custom architectures; things where we’re really specializing hardware to optimize deep learning research, especially in training. Specialized hardware is going to be important, but having the general-purpose Intel Xeon processor cores for the rest of the workload will also continue to be important. Our users have some aspects of deep learning in their workloads, but other things are traditional machine learning: data mining, clustering, categorization, classification, and other parts. So being able to have those complementary resources that the HPC world has evolved to will have ongoing importance to most of the community. Eventually we’ll see ongoing islands of very specialized hardware and services around specialized workloads. For now, at PSC, we’re using a very heterogeneous collection to let people explore all of that under one roof.
Ivan: I completely agree that HPC is driving most of these innovations. But it also depends on how you define HPC. If we are talking about more traditional multi-processor, many-core systems, it is probably true that HPC is the innovator. We are driving a lot of innovations in AI in these areas. However, if we’re looking at other approaches, for example, looking to Exascale, there is probably a different approach. How we are achieving these goals, in my opinion, is being done in a very different way.
Prabhat: Just to liven things up, I will state that the HPC community is behind the curve as far as deep learning goes. I think much of the innovation has happened in the commercial space, and it’s largely been on GPU architectures. I think we’re still catching up. Speaking for NERSC, in the Exascale regime, I think we definitely see an option to run machine learning in situ. So, while the data is in memory on these large simulations, I think one can do machine learning, and then either reduce the data or subset the data and save some results out. More conventionally, I think data science use cases—when we talk big data and we talk about hundreds of terabytes of data, of course—HPC architectures are relevant. You’re not going to be loading hundreds of terabytes on commodity architectures. When you talk about doing deep learning on hundreds of terabytes of data, I think it’s evident that HPC architectures are going to be really important in that space.
Debra: I want to poke on this. You have parallel markets evolving. You have the Cloud Service Providers (CSPs) driving a lot of innovation and there’s lots of innovation at the architectural level, and even at a workload and algorithmic level. So, what will be the relationship between the ultra-scale CSP environment and what’s happening in the university labs, etc. Do you see convergence? Do you think the market will be consistently parallel, much like it has been historically, with extreme computing?
Prabhat: This issue of convergence of HPC and big data, that’s an interesting problem. I don’t see it happening in practice right now. But I think machine/deep learning is one of those key areas where it has to happen. At NERSC, where I have to recommend software tools to our user community (users often come to us asking for TensorFlow, Torch, Theano, and Caffe), it is really our job to make sure that these frameworks work well on the unified hardware platform that we have. I think there’s a desire to converge on the software, but there are restrictions on hardware. At this point, I’m not convinced that we need purely GPUs for doing deep learning; there’s much more to do on the CPU side of things. Given that CPU-based platforms are the norm in HPC centers, getting the right software to work well on CPUs, for me, that’s where the main challenge lies.
Ivan: We have been doing a lot of investment, for example, in memory. Having nodes with more memory is helping our users to be able to work with the current tools, current software technologies, and writing the applications. Also, the fact of using GPUs is clearly something that has been enabling many users to just get hands on with AI and deep learning technologies. We have been looking at other approaches, such as how we can integrate new technologies, for example, Intel Xeon Phi processors, towards this integration—this convergence of more traditional versus these new approaches.
Nick: I think these comments are both very valid. I think the main key here is heterogeneity. So, whether one has GPUs or CPUs, there will be specializations that will be better for some types of AI. And being able to do those parts of the workflow independently of what is evolving for Exascale. Those should remain separate fundamental architectures, because they’re solving different problems. Exascale has different criteria on the energy of moving data than we are going to worry about for data science. But at the same time, we’re going to have to be able to do the training very effectively. That can then be built into inferencing that may reside in the Exascale and large applications to help with optimizations, to help avoid otherwise intractable numerical difficulties at scale, and help guide simulations. Having the training in place that can then be applied to a different kind of compute-intensive segment, where you do the inferencing in production, I think will be very productive for many application spaces.
David (audience): If you look to the future, what direction would you like to push the hardware vendors to go?
Ivan: We have expended a lot of energy on understanding the cost of data movement. Making sure we have sufficiently different memory hierarchies in the system is important. In our latest procurement, we made sure all the nodes have NVMe, so we have the ability to stage data, not just in some specific area of the machine, but everywhere, so we can run our algorithms very close to the data. We also push pretty hard on the network interconnects and how we can provision resources. Right now, we are doing this on a best-effort basis, but I think we can optimize these provisioning algorithms and techniques much more if we have the right hardware support, which right now is not available.
Nick: I agree. I think that being able to have a deep memory hierarchy (and even, very soon, better bandwidth and very low-latency, line-addressable memory near the processing) will be a big win for a lot of these systems. Other people are developing other things for Exascale. Depending on how those let you make better use of memory, avoiding data movement altogether by doing more computing near where the data actually is in memory will give us both energy efficiencies and computation efficiencies. I also think being able to look at different arithmetic widths, whether 16-bit or something else, will continue to be profitable because a lot of these algorithms don’t require 64 bits. 32 may not be the right answer either. And I think having not only the hardware but also the relevant software frameworks that let developers explore that productively, whether it’s in an FPGA environment or something else, will let the algorithms people actually test things out and bring them to production.
Prabhat: Commenting on data science in general, I think we need more balanced systems to tackle the data workload compared to an HPC-oriented system. We’ve had an obsession with FLOPs for a long time, and we need to revise that. If you think about balanced systems, where does the data reside? Does it reside on a parallel file system? Does it reside on a burst buffer? How does the data get to memory to begin with? We need faster IO subsystems. Once the data is in memory, then clearly you’ll need more memory bandwidth. As was mentioned, we could have some flexibility in the level of [floating point] precision. So, in deep learning, maybe the case will be that 32 bits will be fine, maybe we don’t need 64 bits for pattern classification, clustering, and so forth. So, where that specialization should reside is an open question. I would say more balanced systems is what’s key for data centric workloads.
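The precision trade-off that Nick and Prabhat describe can be made concrete with a small sketch. It is only an illustration, not something the panelists presented: it uses Python's standard `struct` module to store one value at 16, 32, and 64 bits, and the value count `n` and the helper names are hypothetical, chosen for the example.

```python
import struct

# struct format codes for IEEE 754 half, single, and double precision.
FORMATS = {"16-bit": "e", "32-bit": "f", "64-bit": "d"}


def bytes_per_value(fmt):
    """Size in bytes of one value stored with the given format code."""
    return struct.calcsize("<" + fmt)


def round_trip(value, fmt):
    """Store a Python float at the given precision and read it back."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]


def footprint_gib(n_values, fmt):
    """Memory needed for n_values numbers at the given precision, in GiB."""
    return n_values * bytes_per_value(fmt) / 2**30


if __name__ == "__main__":
    n = 10**9  # hypothetical number of values held in memory
    for label, fmt in FORMATS.items():
        err = abs(round_trip(0.1, fmt) - 0.1)
        print(f"{label}: {footprint_gib(n, fmt):.2f} GiB, "
              f"rounding error on 0.1 = {err:.2e}")
```

Halving the width halves the memory footprint and the bandwidth needed to move the same number of values, at the cost of a larger rounding error per value. Whether that error is acceptable depends on the algorithm, which is the open question raised above.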
Debra: Within the HPC community we tend to bifurcate our thinking. We have Exascale, extreme scale computing initiatives and then we have machine/deep learning data science. Should they be at odds? Are we a) missing an opportunity or b) poorly framing what we really need to be focused on for the next decade? Think about things like precision medicine. Are we missing the moment to redress what HPC should be?
Prabhat: Exactly how data centric workloads affect architectures is at the top of our minds. I think it’s clear that much of data science will be throughput oriented. There are certainly key capability class applications that only happen at scale, and perhaps precision medicine is one of them. But there is definitely a throughput oriented nature to the workload. I don’t think we are missing out on a key opportunity at this point. With the prominence that deep learning has come to have, it is going to be interesting to see if CPU architectures can take that into account going forward.
Ivan: At Rutgers, we have found many use cases from medicine, and what the researchers do is not just research; they also treat patients. You see how the outcomes of these algorithms, the executions in the hardware, are useful for really immediate needs. Having quicker response is important for them. We have seen researchers trying to combine data from different sources, such as patient data with Medicare data, to be able to provide better treatment for patients with Alzheimer’s and other diseases. Speed of solution is important here, too. Researchers in the school of business are also very interested in algorithms that require fast responses. And there are buffering or streaming kinds of applications, for example, for detecting cyber-attacks, where you have to deal with huge amounts of data coming in in real time. We need more networks, more bandwidth in the network, and shorter latencies that we don’t have yet.
Nick: We’re definitely not missing it. We’re already doing it. I work with University of Pittsburgh Medical Center and their Institute for Precision Medicine. Much of Bridges was designed to specifically serve those kinds of communities. We’re finding that with the sorts of questions we have, where we need to do both the machine learning and deep learning and the very heavy compute-intensive things, the key is to unite the different kinds of systems on one fabric and have them see the same parallel shared file systems. And once you do that, then they can interoperate. And the different groups can collaborate. They can have different parts of the workflow, doing different kinds of machine learning. For example, on the deep learning side, we have people at Carnegie Mellon University who have been doing feature detection to identify genome pathway alterations in cancers. And that’s led to publications; it’s led to successes in finding operative genes in breast cancers. I was just reading a paper on the flight over here in the journal Brain, where people at the Western Psychiatric Institute and Clinic are applying different kinds of machine learning in causal discovery and understanding what the intrinsic differences are in the brains, and in the connectivity, of people with severe depressive disorder and bipolar disorder. Because when examining them clinically, you can’t always tell what phase the patient is in. But if you look at their brain image data from fMRI you can see very distinct differences in the way regions connect. That’s where machine learning leads us to the cause and effect relationships that clinicians can then look at and use in practice.
Peter (audience): Can you comment on how the heterogeneity of systems, specialized architectures, concerns of not being able to converge platforms and workloads, and cloud might play together or not play together?
Nick: I don’t view heterogeneity as a failure to converge; I see converging to heterogeneity as the right solution. For me it’s about understanding your workload and finding the right balance. And once you find that balance, you can do a pretty good job of accommodating what your use cases will be. I think cloud will have a lot of value to people who require elasticity. If you can keep your resource busy all the time, as we do, then running your own is cheaper. For those who want to dynamically expand for different workloads at different times of year, or what have you, cloud will offer a lot of opportunities. It’s about finding the right balance for your organization and being able to adapt resources, not provisioning it all at once; adding where you need more capacity and maybe complementing it with cloud.
Ivan: In my opinion, the way to convergence is to have more flexibility. I know there are promises in industry, for example on how to provision resources much more flexibly, elastically, and software defined infrastructure, so if we consider these approaches, clearly we can see how they converge. We are trying to approach this challenge, and we look at the architectures. We can listen to the promises from industry and dream our dreams, but then we go to the reality we have right now and we try to catch up with the current architectures. Of course, we’re going to be working on designing our algorithms, designing our techniques looking at the dreams, but sometimes the dreams don’t come true in the same way you’re expecting. So, you have to have flexibility.
Prabhat: I think we do see cloud as being part of the ecosystem. I think there will be workloads that are suited for the cloud. I think that problems that require a terabyte of memory and a thousand cores are fine on the cloud. Maybe you don’t need an HPC system to do that. It will be interesting to consider problems that involve processing tens of terabytes of data on 10,000 cores: is that going to be more efficient on HPC platforms or the cloud? Beyond that, I think if there are problems that are hundreds of terabytes or petabytes and need to be run on 100,000 cores, there are going to be challenges in moving that amount of data to the cloud in the first place. I do see that folks at HPC centers like NERSC and PSC and so on will have to make a value proposition for why such centers are relevant and what exactly we provide better than the cloud. I think that might force us to push harder on scaling data analytics.
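The difficulty of moving hundreds of terabytes into the cloud comes down to simple arithmetic. The sketch below is an illustration added for this write-up rather than a figure from the panel: the 10 Gb/s link speed and the `efficiency` parameter are assumptions, and `transfer_hours` is a hypothetical helper that ignores protocol overhead and storage bottlenecks.

```python
def transfer_hours(data_terabytes, link_gigabits_per_s, efficiency=1.0):
    """Idealized time to move a dataset over a network link, in hours.

    data_terabytes      -- dataset size in decimal terabytes (10**12 bytes)
    link_gigabits_per_s -- nominal link speed in gigabits per second
    efficiency          -- assumed fraction of the nominal speed actually achieved
    """
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_gigabits_per_s * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Dataset sizes follow the scales mentioned above; the link speed is assumed.
    for size_tb in (1, 10, 100, 1000):
        hours = transfer_hours(size_tb, link_gigabits_per_s=10)
        print(f"{size_tb:>5} TB over 10 Gb/s: {hours:8.1f} hours")
```

Under these assumptions, 100 TB takes roughly 22 hours on a fully utilized 10 Gb/s link, and a petabyte takes more than nine days, before any computation starts.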
Debra: What’s going to be the killer app that’s going to drive machine learning and deep learning broadly in the marketplace?
Nick: Really intelligent agents. Things that interface with us. For example, what will Siri be five years from now? Things that augment human abilities. That will be the killer app.
Ivan: I think the killer app is the human. The human is the main challenge here. For example, with self-driving cars, maybe the algorithm must decide to protect the driver by hitting the pedestrian. There are ethical challenges we will have to deal with.
Prabhat: Anomaly Detection. Once we get anomaly detection working, I think we can start winning Nobel prizes! Much of modern particle physics is about finding particles that don’t ‘belong to the norm’. So, finding anomalous patterns inside big data, that’s going to be the killer app.
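As a toy illustration of what finding points that do not belong to the norm can mean, the sketch below flags outliers with a median-based score. It is a generic textbook heuristic chosen for this example, not the method used in particle physics or at NERSC; the sample values, the threshold of 3.5, and the function name `find_anomalies` are all assumptions.

```python
import statistics


def find_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The score uses the median and the median absolute deviation (MAD),
    which are less distorted by the outliers themselves than mean and
    standard deviation would be.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical measurements with one value far from the rest.
    readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0, 10.1]
    print(find_anomalies(readings))
```

Real anomaly detection on large scientific datasets is far harder than this, which is why Prabhat frames it as an unsolved problem.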
Debra Goldfarb is currently chief analyst and senior director of market intelligence at Intel Corporation. She previously served as an IDC group vice president, president and CEO of Tabor Communications, vice president of strategy at IBM, and senior director of strategy at Microsoft.