Interview: Dr. Boku Looks Forward to HPC in Asia Day at ISC’12


The second annual HPC in Asia conference will take place in Hamburg on June 17 in association with ISC’12. To learn more, I caught up with Dr. Taisuke Boku, Professor in the Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, and Chair of the HPC in Asia Steering Committee.

insideHPC: The first HPC in Asia Workshop was held as part of ISC’11 and drew 90 attendees from Asia, Europe and the U.S. As the Steering Committee Chair for the HPC in Asia Day, what would you say are the main objectives for the conference?

Dr. Taisuke BOKU: The main purpose of the workshop is to share information on HPC activities in Asian countries with European and U.S. researchers and vendors. Last year was really great for Asian HPC because Japan’s K Computer was ranked #1 on the TOP500 list twice, in June and November, following China’s Tianhe-1A, which took the #1 position in November 2010. These machines were very attractive topics for people around the world, and it was a good opportunity to share application activities. I would like to continue this movement to announce Asian HPC activity to the world.

The second purpose is to provide a good opportunity to promote activities between Asian countries. Of course, this has been going on at many conferences within Asia, but a higher stage greatly stimulates researchers, especially young ones, to collaborate within and outside Asia. This year, we will start a poster session in this workshop for any HPC research and activity in Asia, and that is exactly for this purpose.

insideHPC: The new HA-PACS supercomputer at the University of Tsukuba is described as a “demonstration system for parallel computing with Tightly Coupled Accelerators.” What’s new and different with this generation of the PACS machine vs. its predecessor?

BOKU: Basically, the largest difference between HA-PACS and its predecessors is that this is the first PACS system to introduce accelerated computing technology. So far, PACS systems have employed ordinary scalar processors, except QCDPAX, whose computation nodes were also equipped with an ASIC for a small vector processing feature. This time, we introduced GPGPU technology as much as we could, in the sense of a very compact and dense implementation of multiple GPUs per node. As mentioned in the question, it is not just a large-scale GPU cluster: on this machine we will develop our original technology, called TCA (Tightly Coupled Accelerators), to enable real GPU-to-GPU communication across nodes, which is impossible today. For this purpose, we have been developing a new FPGA-based chip that uses PCI-Express as the communication link between any combination of CPU-GPU, CPU-CPU or GPU-GPU, both within a node and between nodes.

insideHPC: How much of its computational capability is derived from GPUs?

BOKU: HA-PACS consists of two parts. The primary part is called the “HA-PACS Base Cluster,” which has been running since February 2012 and is built only from commodity parts as a large-scale, compact and dense GPU cluster. It has 268 nodes, and each node is equipped with two sockets of Intel E5 (SandyBridge-EP) and four NVIDIA M2090 GPUs. Each E5 CPU has 8 cores supporting AVX SIMD instructions at a frequency of 2.6 GHz and achieves 166.4 GFLOPS of peak performance. Each M2090 has 512 CUDA cores and provides 665 GFLOPS (double precision) of peak performance. The performance of each node is 332.8 GFLOPS (CPU) + 2660 GFLOPS (GPU) = 2993 GFLOPS in total, very close to 3 TFLOPS/node. As a system, the HA-PACS Base Cluster achieves 802 TFLOPS of peak performance. The Base Cluster is mainly used for the development of large-scale applications and algorithms in various domain sciences at our center.

We will also develop the “HA-PACS TCA System,” equipped with our TCA technology, as an experimental platform for this new technology. It will be deployed one year later (March 2013) with approximately 64 nodes with enhanced GPUs, which will provide more than 300 TFLOPS of additional performance on top of the Base Cluster.

In conclusion, we will have more than 1 PFLOPS of peak performance when the TCA System is added to the Base Cluster.
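The peak figures in this answer can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted above, plus one assumption that is not stated in the interview: that each E5 core retires 8 double-precision floating-point operations per cycle with AVX (one 4-wide add and one 4-wide multiply). All variable names are illustrative.

```python
# Peak-performance arithmetic for HA-PACS, using the figures quoted in the interview.
CORES_PER_CPU = 8
CLOCK_GHZ = 2.6
FLOPS_PER_CYCLE = 8        # assumption: 4-wide AVX add + 4-wide AVX multiply (double precision)

CPUS_PER_NODE = 2
GPUS_PER_NODE = 4
GPU_PEAK_GFLOPS = 665.0    # NVIDIA M2090, double precision
BASE_NODES = 268
TCA_MIN_TFLOPS = 300.0     # "more than 300 TFLOPS", so treated here as a lower bound

cpu_peak_gflops = CORES_PER_CPU * CLOCK_GHZ * FLOPS_PER_CYCLE
node_cpu_gflops = CPUS_PER_NODE * cpu_peak_gflops
node_gpu_gflops = GPUS_PER_NODE * GPU_PEAK_GFLOPS
node_peak_gflops = node_cpu_gflops + node_gpu_gflops

base_peak_tflops = BASE_NODES * node_peak_gflops / 1000.0
gpu_share = node_gpu_gflops / node_peak_gflops
combined_min_tflops = base_peak_tflops + TCA_MIN_TFLOPS

print(f"CPU peak:          {cpu_peak_gflops:.1f} GFLOPS")
print(f"Node peak:         {node_peak_gflops:.1f} GFLOPS")
print(f"Base Cluster peak: {base_peak_tflops:.1f} TFLOPS")
print(f"GPU share of peak: {gpu_share:.1%}")
print(f"Base + TCA (min):  {combined_min_tflops:.1f} TFLOPS")
```

Under that assumption the per-CPU figure comes out at 166.4 GFLOPS, the node at about 2993 GFLOPS and the Base Cluster at about 802 TFLOPS, matching the quoted values. By the same arithmetic, roughly 89% of the Base Cluster's peak comes from the GPUs.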

insideHPC: Does the Hybrid architecture give you advantages in terms of power efficiency?

BOKU: Yes, of course. If we just look at the Linpack benchmark, we can estimate that the sustained performance of the CPU is approximately 85% of peak, while the GPU achieves approximately 50% of peak. Each E5 CPU of HA-PACS consumes 115W at peak, while the M2090 consumes 225W. In summary, the power efficiency of CPU and GPU for Linpack is 1.45 GFLOPS/W and 1.48 GFLOPS/W, respectively, so they are very close. But on actual applications, CPU performance will be largely degraded, while on the GPU we can keep relatively high performance if we carefully select the target and apply appropriate coding. The GPU, of course, is not a magical device that accelerates any application, but for some well-suited applications we can exploit great performance, supported by an array of simple computation cores and the high bandwidth of GDDR memory, so I believe the sustained power efficiency will differ much more.
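The efficiency comparison can be checked from the quoted peak, power and Linpack-fraction figures. This is a minimal sketch; the helper name `gflops_per_watt` is illustrative, and the 166.4 and 665 GFLOPS peaks are taken from the earlier answer.

```python
# Linpack power-efficiency arithmetic from the figures quoted in the interview.
CPU_PEAK_GFLOPS = 166.4    # per Intel E5 CPU
GPU_PEAK_GFLOPS = 665.0    # per NVIDIA M2090
CPU_WATTS = 115.0
GPU_WATTS = 225.0
CPU_LINPACK_FRACTION = 0.85
GPU_LINPACK_FRACTION = 0.50


def gflops_per_watt(peak_gflops, fraction_of_peak, watts):
    """Sustained GFLOPS per watt for a device running at a given fraction of peak."""
    return peak_gflops * fraction_of_peak / watts


cpu_sustained = gflops_per_watt(CPU_PEAK_GFLOPS, CPU_LINPACK_FRACTION, CPU_WATTS)
gpu_sustained = gflops_per_watt(GPU_PEAK_GFLOPS, GPU_LINPACK_FRACTION, GPU_WATTS)
cpu_at_peak = gflops_per_watt(CPU_PEAK_GFLOPS, 1.0, CPU_WATTS)

print(f"CPU at 85% of peak:  {cpu_sustained:.2f} GFLOPS/W")
print(f"GPU at 50% of peak:  {gpu_sustained:.2f} GFLOPS/W")
print(f"CPU at 100% of peak: {cpu_at_peak:.2f} GFLOPS/W")
```

By this arithmetic the GPU figure of 1.48 GFLOPS/W follows directly from the 50% estimate. The quoted CPU figure of 1.45 GFLOPS/W corresponds to the CPU running at full peak (166.4 / 115), whereas applying the 85% estimate gives about 1.23 GFLOPS/W. Either way, the two devices land in the same range on Linpack.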

insideHPC: Is the core mission of HA-PACS tied to Exascale research?

BOKU: Yes, the basic purpose of our research on HA-PACS is focused on Exascale, on both the application and system sides. While HA-PACS with the Base Cluster and TCA System achieves just around 1 PFLOPS, we will develop new algorithms and basic code toward Exascale computing based on accelerated computing technology. TCA is system-side research to enable direct communication among accelerators, so that accelerators can be applied to large-scale systems with much lower latency than today’s technology, which will be important for achieving high performance in strong scaling. We think there will be some limit to weak scaling in the Exascale era due to memory capacity shortage, computation time to solution, high probability of partial system failure, etc. So the capability for strong scaling, where the latency issue is more serious than today, is quite important.

Please keep in mind that we don’t think the GPU is the final solution for Exascale. What we are aiming for in HA-PACS research is not “GPU Computing” but “Accelerated Computing.”
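The point about latency in strong scaling can be illustrated with a toy model. This is only a sketch under stated assumptions, not a model of HA-PACS or TCA: each step costs perfectly divided compute time plus one fixed communication latency, and both constants below are hypothetical values chosen for illustration.

```python
# Toy model: step time = (compute work / nodes) + fixed communication latency.
# TOTAL_WORK_S and LATENCY_S are hypothetical values, not measurements.
TOTAL_WORK_S = 1.0     # compute seconds per step on a single node (hypothetical)
LATENCY_S = 10e-6      # communication latency per step (hypothetical)


def latency_share(work_seconds, nodes, latency_seconds):
    """Fraction of one step's time spent waiting on latency rather than computing."""
    compute = work_seconds / nodes
    return latency_seconds / (compute + latency_seconds)


node_counts = (1, 100, 10_000, 1_000_000)

# Strong scaling: the problem size is fixed, so per-node work shrinks as nodes grow.
strong = [latency_share(TOTAL_WORK_S, n, LATENCY_S) for n in node_counts]

# Weak scaling: the problem grows with the machine, so per-node work stays constant.
weak = [latency_share(TOTAL_WORK_S * n, n, LATENCY_S) for n in node_counts]

for n, s, w in zip(node_counts, strong, weak):
    print(f"{n:>9} nodes: latency share strong={s:.4%} weak={w:.4%}")
```

In this model the latency share stays flat under weak scaling but grows steadily under strong scaling as the per-node work shrinks, which is why lower accelerator-to-accelerator latency matters more when the problem size cannot grow with the machine.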

insideHPC: What do you think makes ISC’12 an attractive place to meet for HPC researchers?

BOKU: I think this workshop can be the hottest place for information sharing and research promotion for Asian HPC activities. So far, we have had such an opportunity in the U.S. at the SC series of conferences, but I felt we needed something more, especially for Europe.

ISC is for the whole world, of course, but it is a special occasion to get in contact with a large number of potential researchers in Europe. Through this workshop, we would like to create opportunities for sharing everything between Asia and Europe, as well as the U.S., on all issues in HPC technology and applications.

Registration for HPC in Asia Day is now open.