Transferring Data into Cloud Storage Fast and at Scale

Rob Futrick, CTO, Cycle Computing

Over at the Cycle Computing Blog, Rob Futrick writes that the company is tackling one of the toughest challenges for cloud HPC: transferring data into cloud storage fast and at scale. The solution? Parallel uploading of files and file systems using the company's DataMan software.

Back in 2011, we benchmarked parallel uploading to speed up transfers to Amazon S3. We found that parallelizing the transfer of parts of individual files, as well as transferring entire files concurrently, maximized bandwidth usage into Amazon S3. Cycle's DataMan data workflow software has done this out of the box since 2013. The intelligent parallelism built into DataMan enables it to handle data workflows at truly massive scale into and out of the cloud: a billion data blobs, petabytes (PB) of data, and distributed file systems, among other production use cases.
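DataMan itself is proprietary, but the per-file half of that approach, splitting one large object into parts and uploading the parts in parallel, can be sketched with standard tooling. The snippet below is a minimal illustration using boto3's managed transfer; the bucket name, object key, file path, and tuning values are placeholder assumptions for the example, not DataMan's actual configuration.

# Illustrative sketch only, not DataMan: parallel multipart upload of a
# single large file to S3 using boto3's managed transfer. Names and sizes
# below are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split any file larger than 64 MB into 64 MB parts and upload up to
# 10 parts at once, so even a single large file can fill the pipe.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,  # size of each uploaded part
    max_concurrency=10,                    # parallel part uploads per file
)

s3.upload_file(
    "local/big_dataset.tar",   # local file (placeholder path)
    "example-bucket",          # destination bucket (placeholder)
    "data/big_dataset.tar",    # destination key (placeholder)
    Config=config,
)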

Figure: Fast parallel transfer pipeline into the cloud within DataMan.

According to Futrick, DataMan divides up large files and file systems for you and uses parallel uploads, via multiple threads or multiple instances of DataMan running at the same time, to maximize bandwidth utilization.
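The whole-file side of that parallelism, pushing many files at once rather than one after another, can likewise be sketched with a simple thread pool. Again, this is only an illustration of the general technique; the directory layout, bucket name, and worker count are assumed for the example and are not part of DataMan.

# Illustrative sketch only, not DataMan: upload every file under a local
# directory tree to S3 concurrently with a thread pool. Paths, bucket
# name, and worker count are placeholders.
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")   # boto3 clients can be shared across threads
BUCKET = "example-bucket"
ROOT = "local/dataset"

def upload_one(path):
    # Derive the object key from the file's path relative to the dataset root.
    key = os.path.relpath(path, ROOT).replace(os.sep, "/")
    s3.upload_file(path, BUCKET, key)
    return key

# Walk the file system, then upload files in parallel; whole-file concurrency
# stacks with per-file multipart concurrency to maximize bandwidth.
files = [os.path.join(dirpath, name)
         for dirpath, _, names in os.walk(ROOT)
         for name in names]

with ThreadPoolExecutor(max_workers=16) as pool:
    for key in pool.map(upload_one, files):
        print("uploaded", key)

Running several such uploader processes side by side, rather than threads within one process, is the "multiple instances" variant of the same idea.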

An online demo of DataMan is now available.

Sign up for our insideHPC Newsletter.