The computing industry has grown at an exceptional pace over the past four decades. Moore’s Law, popularly summarized as a doubling of computer performance every eighteen months (Moore’s own observation concerned transistor counts, which have doubled roughly every two years), will probably hold for another decade or two. After that, transistor features will approach the size of atoms and cannot shrink any further. This is the first roadblock to continuing performance improvements indefinitely. The solution will have to come from an alternative computing paradigm, such as quantum computing, and that approach is still waiting for a breakthrough that may take decades to materialize.
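To make the doubling claim concrete, the short sketch below computes the growth factor implied by a fixed doubling period. The function name and the 15-year horizon are illustrative assumptions, and the eighteen-month period is simply the popular figure quoted above, not a measured value.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Return the multiplicative growth after `years`, assuming one
    doubling every `doubling_period_years` (1.5 = eighteen months)."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # 15 years at one doubling per 1.5 years is 10 doublings: 2**10 = 1024.
    print(growth_factor(15))
```

Under this assumption, fifteen years of sustained doubling yields roughly a thousandfold improvement, which is why even a modest slowdown in the doubling period has a large cumulative effect.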
A second and more immediate issue is that of power consumption (and the related issue of heat dissipation). The electric bill for large enterprise clusters may run in the tens of thousands of dollars annually. These machines also generate a great deal of heat, which requires sophisticated air or liquid cooling units and adds further to the cost. Reducing power requirements is an active area of research at the moment.
A third issue is that of management. These large computing devices may require enormous amounts of time from specially trained individuals (technicians, consultants, etc.) whose time is rarely cheap. A number of efforts, from both commercial vendors and open source groups, are aimed at producing software to handle installation, oversight, and other repetitive functions.
The fourth issue, which happens to be the research interest of HPC Answer’s maintainer, is that of fault tolerance. Large clusters have so many components that the mean time between failures (MTBF) of the whole system is sometimes shorter than the time required to run the application. Proposed solutions include process management and communication schemes designed to cope with such failures.
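A rough sketch shows why component count matters so much for the mean time between failures (MTBF). It assumes identical components that fail independently at a constant rate (an exponential failure model), which is a simplification and not something stated in the text. The function names and the example figures (a 100,000-hour component MTBF, 10,000 components, a 24-hour job) are hypothetical.

```python
import math


def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """MTBF of a system that fails as soon as any one component fails.

    Assumes n identical, independent components with constant failure
    rates, so the rates add and the system MTBF is the component MTBF
    divided by n.
    """
    if n_components <= 0:
        raise ValueError("n_components must be positive")
    return component_mtbf_hours / n_components


def prob_run_completes(run_hours: float, mtbf_hours: float) -> float:
    """Probability that a run of `run_hours` sees no failure,
    under the same exponential failure model."""
    return math.exp(-run_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    mtbf = system_mtbf_hours(100_000, 10_000)   # 10.0 hours
    print(mtbf, prob_run_completes(24, mtbf))   # second value is about 0.09
```

With these illustrative numbers the system as a whole fails every ten hours on average, so a 24-hour job finishes without interruption only about 9% of the time, even though each individual component is very reliable.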
HPC developers thus face limitations in both hardware and software. Students looking for research topics would do well to investigate one of these areas.