Analytics for Massive Datasets Using High Density GPU Accelerators


In this week’s Sponsored Post, Katie Rivera, of One Stop Systems, explains the technology behind analytics for massive datasets using high density GPU accelerators. 

Analyzing big data with GPUs is nothing new. GPUs have been the “sweet spot for Big Data” for several years. What is relatively new is GPU cloud computing. Less than a year ago, AWS announced Elastic GPUs for EC2, a new offering that lets users attach as much GPU power as they need. Over the last year other cloud providers have followed suit, and users can now rent GPU power in the cloud for big data analytics instead of building or buying their own hardware.

Katie Rivera, Marketing Communications Manager, One Stop Systems

GPU cloud computing couldn’t have come at a better time. In 2013, an estimated 90% of all the data in the world had been generated in the previous two years. The volume of data grows rapidly every year, and it requires analysis to gain insight and continuously propel innovation. But when datasets number in the billions of rows, they require tremendous GPU power and specialized applications. Database and visualization applications such as MapD’s Core Database allow massive amounts of data to be visualized in real time without lag. MapD’s database “intelligently partitions, compresses and caches data across all GPUs, providing users with up to 100x faster database queries.” This lets users analyze massive datasets much faster than they could with CPU clusters.
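To make that concrete, here is a minimal sketch of issuing a SQL query to a MapD Core database from Python using the pymapd client. The host, port, credentials and the trips table with its columns are placeholders for illustration, not details from the demo described below.

```python
# Minimal sketch: run a SQL query against a MapD Core database via pymapd.
# Host, port, credentials and the "trips" table are placeholders.
from pymapd import connect

con = connect(user="mapd", password="changeme",
              host="localhost", port=9091, dbname="mapd")

# A simple aggregate over a large table; MapD executes the scan and
# group-by across the GPUs it manages.
cursor = con.execute(
    "SELECT passenger_count, COUNT(*) AS n "
    "FROM trips GROUP BY passenger_count ORDER BY n DESC"
)
for passenger_count, n in cursor:
    print(passenger_count, n)
```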


In late April 2017, MapD announced version 3.0 of their software, which scales across multiple servers. At GTC 2017, MapD teamed up with SkyScale, a GPU cloud computing provider, to run the latest version of MapD’s distributed software across each tier of SkyScale’s current platforms in a real-time analysis of geospatial data. SkyScale clustered three One Stop Systems nodes for the demo: sixteen P100 PCIe GPUs in the 3U HDCA, eight P100 SXM2 GPUs in the OSS-PASCAL8 and four P100 SXM2 GPUs in the OSS-PASCAL4, with the nodes connected by 56 Gbps InfiniBand. The live demo ran on a total of 28 P100 GPUs and let users explore, in real time, 11 billion rows of shipping data collected over seven years. During the demo, the user zoomed in over New Orleans after the 2010 Deepwater Horizon oil spill and filtered the data to show the movement of anti-pollution vehicles. Each time the data was filtered, it was re-rendered almost instantly, giving an incredibly responsive experience.
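As a hedged sketch of the kind of filter applied in the demo, the query below restricts shipping records to a bounding box around New Orleans, one vessel category and the post-spill time window. The ship_movements table, its column names and the category value are hypothetical, chosen only to illustrate the shape of such a query.

```python
# Sketch of a geospatial filter like the one in the demo: narrow billions of
# shipping records to a bounding box near New Orleans, one vessel category,
# and the months after the April 2010 spill. Table and columns are hypothetical.
from pymapd import connect

con = connect(user="mapd", password="changeme",
              host="localhost", port=9091, dbname="mapd")

query = """
    SELECT vessel_id, lon, lat, recorded_at
    FROM ship_movements
    WHERE lon BETWEEN -90.6 AND -88.8
      AND lat BETWEEN 28.5 AND 30.2
      AND vessel_category = 'ANTI_POLLUTION'
      AND recorded_at BETWEEN '2010-04-20' AND '2010-12-31'
"""
for vessel_id, lon, lat, recorded_at in con.execute(query):
    print(vessel_id, lon, lat, recorded_at)
```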

The live demo showcased SkyScale’s ability to cluster nodes, but each platform is also available to lease individually. The MapD demo is one example of the kind of big data analytics users can run on SkyScale’s cloud computing power. SkyScale constantly upgrades GPU density, storage capacity and network capabilities to give customers the fastest access for their applications. SkyScale’s systems are ideally suited to workloads such as machine learning and predictive analytics across industries including retail, life science, oil and gas, finance, and many more. All of SkyScale’s compute nodes use the latest NVIDIA Tesla P100 high-density GPU accelerators, supporting up to 16 per node working in parallel on a single root complex, and these nodes are dedicated entirely to each user’s application. Other vendors offer P100s, some with up to eight per node, but only SkyScale offers 16. In addition, SkyScale guarantees maximum peer-to-peer bandwidth between all 16 P100s by avoiding the need for inter-GPU traffic to flow over the QPI bus between CPUs. The nodes can scale out through optional InfiniBand clustering to meet any performance requirement.
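For readers who want to check this kind of topology on their own node, a minimal sketch (assuming the standard nvidia-smi tool is installed) that prints the GPU-to-GPU connection matrix: links reported as PIX or PXB stay within a single PCIe root complex, while SYS indicates peer traffic that must cross the CPU-to-CPU (QPI) interconnect.

```python
# Minimal sketch: print the GPU interconnect matrix reported by nvidia-smi.
# On a single-root-complex system, GPU pairs should show PIX/PXB (PCIe switch)
# links rather than SYS, which would mean peer-to-peer traffic crosses the
# CPU-to-CPU (QPI) link.
import subprocess

result = subprocess.run(["nvidia-smi", "topo", "-m"],
                        capture_output=True, text=True, check=True)
print(result.stdout)
```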

This guest article was submitted by Katie Rivera, marketing communications manager at One Stop Systems.