Businesses, especially the HPC business, are in a constant cycle of destruction and creation. A market stabilizes (sometimes only briefly) and then is abruptly tilted into turmoil by some new dynamic in the customer base, or by the introduction of some new technology. HPC is certainly in a time of change right now on both the technology and business fronts, as the ailing economy pushes marginal businesses over the edge into bankruptcy. At insideHPC we’ve been on the lookout for the companies that might ascend as a result of these most recent market changes.
This week we had a chance to talk with Shai Fultheim, the CEO of HPC virtualization software maker ScaleMP, about his company, its technologies, and why this particular moment of change is turning into such a boon for them.
insideHPC: Give us a little background on ScaleMP. What do you do?
Shai Fultheim: ScaleMP provides virtualization solutions for high-end computing. Its virtualization solution combines multiple x86 systems into a single virtual machine (VM), aggregating the CPUs, memory and I/O of all the physical machines to produce a large-memory, high-core-count virtual SMP system (think “reverse VMware”). Using software to replace custom hardware and components, ScaleMP offers a new, revolutionary computing paradigm. vSMP Foundation is a software-only solution that eliminates the need for extensive R&D or proprietary hardware components in developing high-end systems, thus reducing the overall solution cost. vSMP Foundation can also be used in conjunction with clusters to reduce their operational expenditures.
vSMP Foundation aggregates up to 16 x86 systems to create a single system with 4 to 32 processors (128 cores) and up to 4 TB of shared memory. It is available on servers from Appro, Cray, Dell, HP, IBM, Intel, Sun and Supermicro.
insideHPC: With market uncertainty around solutions from the new SGI, and especially the availability of the planned next generation Xeon-based shared memory system (UV), many HPC customers are in a bind with respect to addressing their HPC requirements, particularly for large core count x86 systems with tons of memory. Is ScaleMP able to take advantage of the opportunity presented by this?
Fultheim: Yes, and in fact ScaleMP already has customers that have been running 128-core Xeon systems for over a year. Our virtualization solution, vSMP Foundation, provides the largest and fastest Xeon system available today, with 128 cores and 4 TB of RAM. It will be expanded in the very near future to support 1024 cores and 64 TB of memory. This virtual SMP solution is an excellent choice for any HPC user, and has outperformed existing SMP systems while keeping price points well below those systems. I expect that the price gap will remain compared to the future SGI UV or similar systems.
When you think about it, when it comes to scalable x86 systems that really address HPC requirements, our solution is the only solution in the market today. Customers should be careful when selecting solutions that are not yet available and have uncertain delivery schedules.
We continue to get inquiries from customers who had committed to SGI, but are now worried that they will not be able to meet the planned timelines, and are looking for ways out of their predicament.
insideHPC: Is ScaleMP’s ability to deliver shared memory systems for customers only a good option if SGI (or someone else) isn’t building hardware-enabled shared memory, or are there advantages such that it makes sense even if the new SGI continues with its Altix/UV roadmap? Is there a sense that the support that chip makers are building in for virtualization technology will deliver performance benefits for the “reverse virtualization” approach too? In other words, will future chip features to support virtualization make your solution even more viable?
Fultheim: In simple terms, “The IT world is virtualizing itself, and this direction will continue!”
Virtualization allows customers lots of freedom: our customers can choose to purchase 128-core shared-memory systems from Dell, IBM, HP, Sun, Supermicro, Cray, Appro and others. Virtualization allows increased flexibility: customers can start small by connecting just two systems to get a four-socket Nehalem solution, and grow over time. Customers can decide to scale only the memory of the system without a significant investment in processors. Lastly, I would say that virtualization allows customers to always be on current technologies, in that supporting the latest generation of processors requires only a software change rather than an entire system change.
There are 3 components to our solution. The performance of one of them is driven by ScaleMP, and that of the other two is driven by global IT trends:
- Intel is reacting fast to the (enterprise) virtualization market. We are seeing a significant increase in the performance and feature set of Intel processors.
- The performance of high-bandwidth, low-latency interconnects, such as InfiniBand, will continue to improve. InfiniBand provides better performance characteristics than the majority of proprietary interconnect fabrics. This trend will continue in the future.
- Lastly, vSMP Foundation’s advanced caching technology is always progressing, and has enabled us to win performance benchmarks against machines such as SGI Altix for the past 3 years.
These 3 trends will remain, promising that our virtualization solution will continue to deliver superior performance compared to traditional SMP systems.
Keep in mind that a significant number of our end-users are actually using the technology as a way to upgrade small to medium-size clusters to virtual SMPs. These end-users benefit from a significant simplification of their cluster infrastructure, with fewer and larger compute nodes, as well as a reduction in cost resulting from the use of internal drives rather than clustered storage. Today’s SMP solutions (as well as future products, like UV) are not addressing this market segment, hence these benefits are available only through virtualization solutions.
insideHPC: What is your perspective regarding Nehalem and how do you see its impact for HPC customers and the industry?
Fultheim: Nehalem is part of a complete system architecture that has a couple of interesting promises for HPC customers.
First, it brings NUMA solutions to the mainstream x86 architecture! It means that ISVs need to plan for parallel scalability that is NUMA-aware. We have been saying for a long time that this is the most cost-effective way to scale systems, even in the x86 space. vSMP Foundation further expands Nehalem’s NUMA system architecture with aggressive caching that improves the overall performance of the solution.
Secondly, we are expecting to see larger x86 systems in the 6-to-12-month time frame, which can be excellent building blocks for even larger solutions leveraging our technology. When you aggregate 16 systems, each with 4 sockets and future capabilities of 8 cores and 16 threads, you will get lots of processing power.
Lastly, the Nehalem story is also about improved I/O and PCI Express performance. This is paramount to efficient interconnect performance, and is the main reason for the improved overall performance we are seeing with vSMP Foundation on Nehalem deployments. We have evidence that, for applications on a single-system level (2 sockets), Nehalem shows only a modest performance improvement of 5 to 10 percent, but when leveraging vSMP Foundation to scale the solution from 1 to 8 systems (a total of 16 sockets), we have seen about a 30 percent performance improvement compared to deployments with previous-generation systems. This is huge.
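To put a number on the "lots of processing power" in the 16-system scenario above, here is a minimal back-of-the-envelope sketch. It is our own illustration, not ScaleMP code: the function and parameter names are hypothetical, and it assumes that "8 cores and 16 threads" is a per-socket figure, that is, two hardware threads per core.

```python
# Back-of-the-envelope sizing for an aggregated virtual SMP.
# Node, socket, core and thread counts are the figures quoted in the interview;
# the function and its parameter names are hypothetical, for illustration only.

def aggregate(nodes: int, sockets_per_node: int, cores_per_socket: int,
              threads_per_core: int = 1) -> dict:
    """Return total sockets, cores and hardware threads across all nodes."""
    sockets = nodes * sockets_per_node
    cores = sockets * cores_per_socket
    return {"sockets": sockets, "cores": cores,
            "threads": cores * threads_per_core}

# Assumption: "8 cores and 16 threads" is per socket (2 threads per core).
future = aggregate(nodes=16, sockets_per_node=4, cores_per_socket=8,
                   threads_per_core=2)
print(future)  # {'sockets': 64, 'cores': 512, 'threads': 1024}
```

Under those assumptions the aggregate comes to 64 sockets, 512 cores and 1024 hardware threads. The same function reproduces the limit quoted earlier for today's product: `aggregate(16, 2, 4)` gives 32 sockets and 128 cores.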
insideHPC: What is your perspective regarding the economic situation we are in now, and how it is impacting the buying behavior in the HPC market segment? Is the new administration’s support for science and engineering, and the stimulus package, having an impact on your business?
Fultheim: The macroeconomic situation has impacted the HPC segment negatively since the 4th quarter of 2008, and 2009 continues to be weak. What we are noticing is that the commercial segments have been hit harder, and we are seeing many organizations trying to reduce the CAPEX and OPEX of HPC projects, which leads them to seek more cost-efficient solutions. On the other hand, we are seeing that the public sector (higher-ed, government) is more resilient, primarily due to the stimulus package, which will have a positive impact on the HPC business in the second half of 2009. Many projects are in the pipeline with grants and applications for this stimulus money, which is expected to start flowing in the second half of the year.
In addition to growth in demand from shared-memory customers, we are seeing significant momentum with our offering for cluster management. Many organizations, specifically in the public sector, are seeing increased benefits in running fat-node clusters rather than traditional clusters. With the increased requirements for faster deployment, large-memory jobs, and ease of management and use, more IT organizations are approaching us with an interest in deploying vSMP Foundation on their HPC clusters for management and flexibility.
insideHPC: Your product is available from Dell, HP, Sun, IBM and recently Cray. What’s next?
Fultheim: From a vendor perspective, we had a good partnership with SGI in the past, and I am hoping that with new management and a new business focus we will see SGI offering our solutions to its customers again. I believe that SGI’s strong expertise in shared-memory systems, coupled with Rackable’s x86 product-line excellence and our software solution, will give customers more choices in scalable x86 solutions.
We are partnering with several of the Tier-1 vendors to offer more customized and integrated solutions within their product portfolios. This will be announced in the future, so stay tuned.
On the product side, we continue to focus on enhancing our product to help our customers meet their ever-increasing HPC requirements. A few upcoming enhancements worth noting are:
- In the short term, our engineers are working on enhancing vSMP Foundation’s Direct Connect capabilities. Direct Connect 2 (DC2) will allow connecting up to four Nehalem systems without the need for an InfiniBand switch. This would allow support for an 8-socket Nehalem system with 192 GB of RAM for under $40K. This would be a very attractive entry-level HPC solution.
- We are also working with Intel towards supporting Intel Nehalem-EX systems, which, in conjunction with DC2, will allow larger shared memory systems with even more cores than are available today.
- In addition, we will also be expanding vSMP Foundation to support more than 16 nodes, allowing creation of shared memory systems with up to 64 TB of RAM and 1024 cores (or more). This will address customer requirements for even larger systems than today.
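The entry-level DC2 configuration in the first item above can be broken down per building block with simple arithmetic. The sketch below is our own illustration under two assumptions that are not stated in the interview: sockets and memory are spread evenly across the four systems, and "under $40K" is treated as an upper bound rather than a list price. The variable names are ours.

```python
# Figures quoted in the interview for the planned Direct Connect 2 entry system.
SYSTEMS = 4                  # Nehalem systems linked without an InfiniBand switch
TOTAL_SOCKETS = 8
TOTAL_RAM_GB = 192
PRICE_CEILING_USD = 40_000   # "under $40K": an upper bound, not a list price

# Assumption: resources are divided evenly across the four systems.
sockets_per_system = TOTAL_SOCKETS // SYSTEMS         # 2
ram_per_system_gb = TOTAL_RAM_GB // SYSTEMS           # 48
max_usd_per_socket = PRICE_CEILING_USD / TOTAL_SOCKETS
max_usd_per_gb = PRICE_CEILING_USD / TOTAL_RAM_GB

print(f"{sockets_per_system} sockets and {ram_per_system_gb} GB per system")
print(f"at most ${max_usd_per_socket:,.0f} per socket, "
      f"${max_usd_per_gb:,.2f} per GB of RAM")
```

In other words, the quoted configuration corresponds to four dual-socket servers with 48 GB each, at no more than $5,000 per socket if the whole price is attributed to sockets.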
insideHPC: There is a lot of buzz around cloud computing. How do you see the cloud picture emerging? Is ScaleMP involved in this space?
Fultheim: We are finally seeing cloud emerging as a serious computing alternative in the enterprise segment. Amazon’s success in this segment is proof of that. Today’s clouds are optimized for enterprise apps, and not so tuned for HPC. HPC clouds require flexibility in memory and compute capabilities, where enterprise clouds are falling short.
Here is where the virtualization “aggregation” paradigm pioneered by ScaleMP comes in. With vSMP Foundation, cloud vendors can build the compute infrastructure required for HPC on the fly using standard server building blocks, and this, in our humble opinion, will be one of the important components of making the HPC cloud a reality. We are partnering with cloud computing providers to enable dynamic provisioning of large-memory, high-core-count virtual systems.