Amazon tackles cloud data locality problem with the US postal service


Werner Vogels, Amazon’s CTO, obviously knows there is a data locality problem with his cloud, too. At least for customers who start with large data. So this week he’s writing about Amazon’s solution: AWS Import/Export.

AWS Import/Export allows you to ship your data on one or more portable storage devices to be loaded into Amazon S3. For each portable storage device to be loaded, a manifest explains how and where to load the data, and how to map files to Amazon S3 object keys. After loading the data into Amazon S3, AWS Import/Export stores the resulting keys and MD5 checksums in log files so that you can check whether the transfer was successful.
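For what it's worth, that last step (checking the logged MD5 checksums against what you shipped) is easy enough to script. Here is a minimal sketch in Python. The log layout is my assumption, not Amazon's documented format: I'm pretending each log line is a plain "key,md5" pair, and the names md5_of_file, find_mismatches, and key_prefix are made up for illustration.

```python
import csv
import hashlib
import io
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(log_text, local_root, key_prefix=""):
    """Compare local files against (key, md5) rows taken from a log.

    ASSUMPTION: the log is CSV text with one "key,md5" row per object.
    The real AWS Import/Export log format may differ; adapt the parsing.
    key_prefix is a hypothetical prefix stripped from each key to get
    the path relative to local_root.
    """
    problems = []
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key, logged_md5 = row[0].strip(), row[1].strip().lower()
        relative = key[len(key_prefix):] if key.startswith(key_prefix) else key
        local_path = Path(local_root) / relative
        if not local_path.is_file():
            problems.append((key, "missing locally"))
        elif md5_of_file(local_path) != logged_md5:
            problems.append((key, "checksum mismatch"))
    return problems
```

An empty list back from find_mismatches means every logged key matched a local file with the same checksum. Anything else is a list of (key, reason) pairs to go chase down.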

Um, ok. You could probably use UPS, too.

This is clearly better than nothing. This solution generally sucks, though, because I have to touch the data. With my hands. Let’s start draining people again with leeches when they are sick, too.

What we need is a radical improvement in bandwidth into and out of these clouds. Big data isn’t a problem for your average citizen; it’s a problem for big businesses and research institutions. So, build big fat networks between the clouds, extend that network to where (most of) the customers with big data are (big cities and Bentonville, AR), and probably peer with Internet2 and National LambdaRail as well to bring in the universities. This is the equivalent of building McDonald’s restaurants next to exit ramps on the interstates.
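To put rough numbers on why bandwidth is the sticking point, here is a back-of-the-envelope sketch in Python. Everything in it is my assumption for illustration: the function name transfer_days, the decimal units (1 TB = 10^12 bytes), and the 0.8 efficiency fudge factor for protocol overhead. None of it comes from Amazon.

```python
def transfer_days(data_terabytes, link_megabits_per_second, efficiency=0.8):
    """Estimate days needed to push a dataset over a network link.

    ASSUMPTIONS: decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits/s)
    and a made-up efficiency factor for the usable share of the link.
    """
    if data_terabytes < 0 or link_megabits_per_second <= 0 or efficiency <= 0:
        raise ValueError("size must be >= 0; bandwidth and efficiency must be > 0")
    total_bits = data_terabytes * 1e12 * 8
    usable_bits_per_second = link_megabits_per_second * 1e6 * efficiency
    seconds = total_bits / usable_bits_per_second
    return seconds / 86400  # seconds per day


# Illustrative inputs only: 10 TB over a 10 Mbps link.
print(f"{transfer_days(10, 10):.1f} days")
```

Under those assumptions, 10 TB over a 10 Mbps link works out to roughly 116 days, which is why a box of disks in the mail starts to look attractive. Plug in a fatter pipe and the picture changes quickly.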

I guess this could be another national initiative, but frankly I’m tired of having to suckle at the federal teat every time we need something. It’s weak, lazy, and unnecessary (has anyone read Atlas Shrugged recently?). We can probably afford to form a consortium and build the thing out ourselves (cloud customers, major research institutions, whoever), and possibly even recover some costs by charging for access. After all, I use AWS and JungleDisk to back my laptop up to the cloud, and I happily pay the $1 a month or whatever for the bandwidth I use in and out and the blocks I’m tying up on their disks. And I’m just storing pictures of little Johnny. If I were moving around business-critical data, and processing it to make more money, I would look at the costs of moving it around as a cost of doing business. Like going to Staples.


  1. Penelope says

    who is John Galt?