Podcast: PortHadoop Speeds Data Movement for Science


The Chameleon cloud testbed is a large-scale, reconfigurable environment for cloud computing research, funded by the National Science Foundation and co-located at TACC and the University of Chicago. Chameleon gives researchers ‘bare-metal access,’ the ability to change and adapt the supercomputer’s hardware and customize it to improve reliability, security, and performance.

In this TACC Podcast, host Jorge Salazar interviews Xian-He Sun, Distinguished Professor of Computer Science at the Illinois Institute of Technology. Computer scientists in his group are bridging the file system gap with a cross-platform Hadoop reader called PortHadoop, short for portable Hadoop.

What if scientists could realize their dreams with big data? On the one hand, you have parallel file systems for number crunching. On the other, you have Hadoop file systems, made for cloud computing and data analytics. The problem is that one doesn’t know what the other is doing. You have to copy files from the parallel file system to Hadoop, and doing that is so slow it can turn a supercomputer into a super slow computer. In 2015, computer scientists developed a way for parallel and Hadoop file systems to talk to each other: a cross-platform Hadoop reader called PortHadoop. The scientists have since improved it, and the new version is called PortHadoop-R. It is now mature enough to work with real data in the NASA Cloud library project. Those data are used for real-time forecasts of hurricanes and other natural disasters, and also for long-term climate prediction.

A supercomputer at TACC helped the researchers develop PortHadoop-R. The system is Chameleon, the NSF-funded cloud testbed co-located at the Texas Advanced Computing Center and the University of Chicago.


“We tested our PortHadoop-R strategy on Chameleon. In fact, the speedup is 15 times faster,” said Xian-He Sun. “It’s quite amazing.”

Sun’s PortHadoop research was funded by the National Science Foundation and the NASA Advanced Information Systems Technology Program (AIST).
