AWS and Start-up TidalScale: Scaling Up in the Cloud

Amazon Web Services said in a blog post today that, in partnership with start-up TidalScale, it has taken on the problem of scaling up on the AWS cloud platform. As a result of the two companies’ joint work, AWS users “can now aggregate the CPUs, memory, network, interrupts, and storage of multiple AWS bare metal instances into a single system image capable of running unmodified operating systems, middleware and applications.

“Each bare metal instance was designed by AWS with an optimal ratio of CPU performance to memory bandwidth, both of which are virtualized to form a software-defined server from several bare metal instances,” AWS said in its blog.

Of course, scaling out compute and applications horizontally on cloud platforms has been done almost since the dawn of cloud computing. But AWS and TidalScale, which announced their partnership last April, posed the question: What if, via a software-defined instance, there was a way to scale up by aggregating multiple scale-out instances? “Then we could avoid refactoring legacy single system applications to scale out, yet enjoy the pay-as-you-go pricing, elastic growth and managed infrastructure of the cloud.”

To explain, AWS bloggers Randy Seamans (storage specialist and AWS advocate) and Justin Stanley (an AWS principal solutions architect) said that in AWS Elastic Compute Cloud (EC2), metal instances supporting the Elastic Fabric Adapter (EFA) and leveraging a cluster placement group “can form a low-latency cluster interconnect between instances making it possible to ‘scale up’ to a single system image from a number of metal instances today.”

TidalScale vertically scales up horizontally scaled AWS metal instances by virtualizing processors, I/O and memory to create “Scalable Coherent Shared Memory.” The bloggers, citing a 2018 IEEE paper by Gordon Bell and Ike Nassi, said “efficient, scalable, coherent distributed memory has long been sought after,” explaining it can now be accomplished “as a software layer on top of existing memory and NUMA virtualization,” enabled and managed by TidalScale WaveRunner.

Key to the TidalScale solution is the hyperkernel, a software-based bare-metal hypervisor that leverages Intel’s Virtualization Technology (VT) on modern processors to virtualize the physical resources of an instance. The hyperkernel runs on each metal instance (worker node) “to aggregate the memory, CPUs, and I/O of multiple AWS metal instances to create a single software-defined instance,” the authors said.

“Bare-metal instances may be added or removed as the workload demands, increasing or decreasing the overall available resources of the software-defined instance and thus ensuring appropriate sizing,” Seamans and Stanley said. “TidalScale can eliminate the need for costly and complex sizing, migration, and procurement exercises giving users the flexibility to resize server capacity at any time based on current workload demand.”

Communication between worker nodes running the hyperkernel is handled via a high-speed EFA network, and the hyperkernel presents a unified view of the physical resources of each metal instance to the operating system and application stack. “From the perspective of the operating system, it is running on a single large instance that sees the memory, CPU, and I/O capabilities as the sum of the individual server resources composing the software-defined instance,” according to Seamans and Stanley.
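
To make the "sum of the individual server resources" idea concrete, here is a minimal Python sketch of the arithmetic only. It does not use any TidalScale or AWS API; the `MetalInstance` class, the `aggregate` function, and the per-node figures (96 vCPUs, 768 GiB) are hypothetical values chosen for illustration, not specifications from the blog post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetalInstance:
    """Hypothetical description of one worker node's resources."""
    name: str
    vcpus: int
    memory_gib: int


def aggregate(nodes):
    """Return the resources a guest OS would see if the nodes appeared
    as one software-defined instance: the sum of each node's resources."""
    return MetalInstance(
        name="software-defined-instance",
        vcpus=sum(n.vcpus for n in nodes),
        memory_gib=sum(n.memory_gib for n in nodes),
    )


# Four identical worker nodes with made-up (illustrative) sizes.
nodes = [MetalInstance(f"worker-{i}", vcpus=96, memory_gib=768) for i in range(4)]
combined = aggregate(nodes)
print(combined.vcpus, combined.memory_gib)

# Removing a node shrinks the combined instance accordingly.
smaller = aggregate(nodes[:-1])
print(smaller.vcpus, smaller.memory_gib)
```

In this toy model, adding or removing an entry in `nodes` changes the combined totals, which mirrors the resizing behavior Seamans and Stanley describe; how the hyperkernel actually keeps memory coherent across nodes is outside what this sketch attempts to show.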

TidalScale was started in 2013 by original CEO and now CTO Ike Nassi (formerly with SAP), lead software engineer Kleoni Ioannidou and chief engineer Michael Berman, according to an article in Blocks & Files. The company attracted $3 million in seed funding in 2013 and an $8 million A-round the next year, led by Bain Capital, according to the story, followed by additional investments totaling more than $30 million.