Over at ZDNet, Liu Jiayi writes that Chinese Academician Chen Guoliang and his team have launched the country’s first domestically developed supercomputer, the KD-90. Powered by ten 8-core Godson 3B processors, the 1-teraflop system is about the size of a microwave oven and consumes only 900 watts.
The KD-90 can be used for mathematics, science and engineering, military and national security, and economics. According to its developers, the KD-90 ranks among the world’s most advanced systems in terms of programming models and networking applications for the computer and server markets.
Are you looking for the perfect gift for the HPC folks in your life? Georgia Tech has stepped up with a Gift Guide that has something for anyone who absolutely loves technology.
Keeneland Supercomputing System. What is it? Only the most powerful GPU supercomputer dedicated to NSF scientific research. Keeneland can run circles around your local cloud cluster; it delivers sustained performance of over a quarter of a petaflop (a full petaflop is one quadrillion calculations per second). It runs on a sweet 615-teraflop-peak HP ProLiant SL250-based machine, pimped out with 264 nodes, each packing two Intel Sandy Bridge processors, three NVIDIA M2090 GPU accelerators for blazing speed, 32 GB of host memory that provides tableside service, and a Mellanox FDR InfiniBand interconnect. All that’s missing are cup holders.
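The 615-teraflop peak figure checks out with a little back-of-the-envelope arithmetic. A hedged sketch: the article doesn’t name the exact CPU model, so the 8-core, 2.6 GHz Xeon E5-2670 figures below are assumptions; the 665 DP GFLOPS per M2090 is NVIDIA’s published peak.

```python
# Back-of-the-envelope check of Keeneland's 615 TFLOPS peak figure.
# Assumptions (not stated in the article): CPUs are 8-core Xeon E5-2670
# parts at 2.6 GHz doing 8 DP flops/cycle/core; each NVIDIA M2090 peaks
# at 665 double-precision GFLOPS.

NODES = 264
GPUS_PER_NODE = 3
CPUS_PER_NODE = 2

M2090_GFLOPS = 665            # DP peak per GPU
CPU_GFLOPS = 8 * 2.6 * 8      # cores * GHz * flops/cycle = 166.4

gpu_tflops = NODES * GPUS_PER_NODE * M2090_GFLOPS / 1000
cpu_tflops = NODES * CPUS_PER_NODE * CPU_GFLOPS / 1000

print(f"GPU peak: {gpu_tflops:.1f} TFLOPS")               # ~526.7
print(f"CPU peak: {cpu_tflops:.1f} TFLOPS")               # ~87.9
print(f"Total:    {gpu_tflops + cpu_tflops:.1f} TFLOPS")  # ~614.5
```

Under those assumptions the total lands within a teraflop of the quoted 615 TF peak, with the GPUs contributing roughly 85% of it.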
One way to replace the TOP500 list would be to put forth an alternative, such as the Sustained Petascale Performance benchmark (which could give rise to an “SPP500”) mentioned by Kramer. Supplanting the TOP500 won’t happen overnight, however, since any new metric will take time to gain acceptance.
Transactional memory is a software technique that simplifies writing concurrent programs. TM draws on concepts first developed and established in the database community, which has been dealing with concurrency for roughly 30 years. The idea is to declare a region of code as a transaction. A transaction executes and either atomically commits all its results to memory (when the transaction succeeds) or aborts and discards all its results (if the transaction fails). The key for TM is to provide the Atomicity, Consistency, and Isolation qualities that make databases and SQL accessible to ordinary developers. Transactions can safely execute in parallel, replacing painful and bug-prone techniques such as locks and semaphores. There is also a potential performance benefit. Locks are pessimistic: they assume the locking thread will write to the data, so the progress of other threads is blocked. TM, by contrast, is optimistic: two transactions that access the same value can proceed in parallel, and a rollback occurs only if one of them writes to the data.
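The optimistic read-validate-commit cycle described above can be sketched in a few lines. This is a minimal illustration, not a real TM system: each cell carries a version number, a transaction records the versions it read, and at commit time it either applies all its writes atomically or rolls back and retries if a version changed underneath it.

```python
# Minimal optimistic-concurrency sketch of the transaction model described
# above. Illustrative only: a real TM system also isolates truly concurrent
# threads; here the versioned commit/rollback mechanics are the point.

class Cell:
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    def __init__(self):
        self.reads = {}    # cell -> version observed at read time
        self.writes = {}   # cell -> pending (uncommitted) value

    def read(self, cell):
        if cell in self.writes:            # read-your-own-writes
            return self.writes[cell]
        self.reads[cell] = cell.version
        return cell.value

    def write(self, cell, value):
        self.writes[cell] = value          # buffered until commit

    def commit(self):
        # Validate: abort if any cell we read was modified since.
        for cell, seen in self.reads.items():
            if cell.version != seen:
                return False               # conflict -> rollback
        for cell, value in self.writes.items():
            cell.value = value
            cell.version += 1
        return True

def atomically(fn, *cells):
    """Retry fn inside a fresh transaction until it commits."""
    while True:
        tx = Transaction()
        fn(tx, *cells)
        if tx.commit():
            return

# Example: a transfer between two accounts commits as one atomic unit.
a, b = Cell(100), Cell(50)

def transfer(tx, src, dst):
    tx.write(src, tx.read(src) - 30)
    tx.write(dst, tx.read(dst) + 30)

atomically(transfer, a, b)
print(a.value, b.value)  # 70 80
```

Note the optimism: nothing is locked while `transfer` runs; conflicts are detected only at commit time, which is exactly why read-mostly transactions can proceed in parallel.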
Fans of Sun Microsystems may recall that Transactional Memory was to be included in the company’s ROCK processor, a project that was eventually shelved in 2009. Read the Full Story or check out Kanter’s recent deep dive feature on Haswell.
Over at the Thinking Out Loud blog, Adam DeConinck from R Systems writes that AWS HPC clusters are definitely useful for “bursting” loads and certain classes of problems, though they still have a few problems to solve before they can replace a “traditional” cluster.
EC2 also doesn’t get the same I/O performance you can get on bare metal. This one’s a problem for lots of people, including big web sites, and it matters in HPC too. A lot of HPC installations have big parallel filesystems that stripe over many disks, like Lustre. It’d be interesting to see what you could do running Lustre on EC2, but I think using EBS as the backing storage would make it somewhat painful. Much nicer to use big I/O nodes attached to InfiniBand. But you notice how much specialized hardware we’re talking about here? Lots of big I/O nodes, a specialized network where even IP is a second-class citizen… it all makes sense if you do HPC all the time, but if you only need to run for a few months out of the year it can seem like overkill. Especially if you are in fact running embarrassingly parallel models (and really, a whole lot of them are).
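The striping DeConinck mentions is worth a quick illustration. A sketch of the general idea only: in round-robin striping of the Lustre flavor, a file is cut into fixed-size stripes dealt out across object storage targets (OSTs), so a byte offset in the file maps to a particular OST. The stripe size and count below are hypothetical example values, not Lustre defaults.

```python
# Round-robin striping sketch in the spirit of Lustre: a file is cut into
# fixed-size stripes dealt out across object storage targets (OSTs).
# stripe_size and stripe_count here are hypothetical example values.

def locate(offset, stripe_size=1 << 20, stripe_count=4):
    """Map a byte offset in the file to (ost_index, byte offset on that OST)."""
    stripe = offset // stripe_size          # which stripe of the file
    ost = stripe % stripe_count             # which OST holds it (round robin)
    # Each OST stores every stripe_count-th stripe contiguously:
    ost_offset = (stripe // stripe_count) * stripe_size + offset % stripe_size
    return ost, ost_offset

# The first four 1 MB stripes land on OSTs 0..3; the fifth wraps to OST 0.
print(locate(0))              # (0, 0)
print(locate(3 * (1 << 20)))  # (3, 0)
print(locate(4 * (1 << 20)))  # (0, 1048576)
```

The payoff is that a large sequential read fans out across all the OSTs at once, which is exactly the aggregate bandwidth that EBS-backed storage struggles to match.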
Over at Datacenter Knowledge, Intel’s Winston Saunders looks at SPECpower data and how extended efforts to increase the efficiency of servers under realistic workload scenarios have resulted in a 40% per year reduction in the energy per operation.
So to summarize, the efficiency of two-socket servers has increased dramatically. The gains come from two sources: increased performance and improved energy proportionality. Transitions in transistor architecture deliver large performance gains, shortening the time needed to complete a given workload. Transitions in system architecture also boost performance, but in addition lower the power drawn while the operation is being completed, which in some sense “doubles” the expected efficiency gain. The effect is easy to visualize: energy per operation is simply the time to complete an operation multiplied by the power used during that time.
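A 40% annual reduction compounds quickly. A quick sketch with illustrative numbers only (the 100 J/op starting point is hypothetical, not from the SPECpower data):

```python
# Energy per operation = average power x time to complete the operation.
# A sustained 40% per-year reduction compounds: each year's energy/op is
# 60% of the previous year's. The 100 J/op starting value is hypothetical.

energy = 100.0          # joules per operation, year 0 (assumed)
for year in range(1, 6):
    energy *= 0.60      # 40% annual reduction
    print(f"year {year}: {energy:.1f} J/op")

# After 5 years: 100 * 0.6**5 = ~7.8 J/op, roughly a 13x improvement.
```

At that pace, efficiency improves by more than an order of magnitude in five years, which is why the trend line in the SPECpower data looks so dramatic.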
The Mont-Blanc European project has selected the Samsung Exynos platform as the building block for powering its first integrated low power-high performance computing (HPC) prototype.
The aim of the Mont-Blanc project is to design a new type of computer architecture capable of setting future global HPC standards, built from the energy-efficient solutions used in today’s embedded and mobile devices.
The Samsung Exynos 5 Dual is built on a 32nm low-power HKMG (High-K Metal Gate) process, and features a dual-core 1.7GHz mobile CPU built on the ARM Cortex-A15 architecture plus an integrated ARM Mali-T604 GPU for increased performance density and energy efficiency. It is market-proven in consumer and mobile devices such as the Samsung Chromebook and Google’s Nexus 10.
This will be the first use of an embedded mobile SoC in HPC, which enables the Mont-Blanc project to explore the challenges and benefits of deeply integrated energy-efficient processors and GPU accelerators, compared to traditional homogeneous multicore systems, and heterogeneous CPU + external GPU architectures.
“The Exynos 5 Dual packs the most powerful ARM processors with a programmable GPU in a low-power mobile part that would normally sit in someone’s pocket running on a battery. Its performance density, energy efficiency, and low market price make it an extraordinary building block for prototyping a new generation of HPC systems,” said Alex Ramirez, coordinator of the Mont-Blanc project.
During its first year of activities, Mont-Blanc focused on successfully deploying an HPC system software stack and full-scale scientific applications on ARM platforms, proving that ARM-based architectures are feasible alternatives for HPC. Now the effort shifts toward integrating the Exynos platform into an HPC solution and exploiting the embedded GPU in software.
Over at Datacenter Knowledge, John Rath writes that Supermicro launched new 2U and 4U/Tower platforms that maximize processing power and precisely tune hardware and firmware to provide lower latency than previous models, while still maintaining high reliability. The company debuted the systems at the High Frequency Trading World event this week in New York.
“Advanced trading firms looking to reduce latency and maximize transaction flow can gain an advantage with the extreme processing power and enterprise-class server optimizations designed into Supermicro’s Hyper-Speed systems,” said Wally Liaw, Vice President of Sales, International at Supermicro. “Our latest HFT-optimized platforms boost performance of the fastest rated x86 dual processors with board-level control and circuitry enhancements and custom tailored cooling systems for the highest sustained performance. With mission critical transactions on the line, Supermicro Hyper-Speed systems ensure peak performance with maximum reliability for the most demanding computational finance applications.”
The new servers are optimized for high frequency trading and feature premium pre-installed CPUs and memory, with storage and I/O components that are validated with a rigorous burn-in process to ensure maximum performance and reliability on deployment. Read the Full Story.
Over at Admin HPC, Douglas Eadline writes that the proliferation of manycore architectures continues to be a challenge for HPC programmers.
Recently, Intel introduced their Many Integrated Core (MIC) or Xeon Phi co-processor. Like a GP-GPU, the Phi lives on the PCI bus and brings more cores to the table, but the design is quite different. The current Phi has 60 general-purpose x86 cores, each coupled with a vector processor. The Phi is thus not an accelerator in the GP-GPU mold but rather a fully functional processing unit. In terms of software, the Phi can be programmed using standard OpenMP, OpenCL, and updated versions of Intel’s Fortran, C++, and math libraries – that is, the same tools used to program x86 multicore processors. Data must still travel across the PCI bus, but the volume depends on how the Phi is used.
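That last point about the PCI bus can be made concrete with a rough cost model: an offloaded kernel only pays off when its compute time dominates the transfer time. The bandwidth and peak figures below are round illustrative numbers, not measured Phi specifications.

```python
# Rough offload cost model: total time = PCI transfer time + compute time.
# Illustrative numbers only: ~6 GB/s effective PCIe bandwidth and a
# 1 TFLOPS device peak are round assumed figures, not measured Phi specs.

PCIE_GBPS = 6.0       # GB/s over the PCI bus (assumed)
PEAK_GFLOPS = 1000.0  # device peak (assumed)

def offload_time(bytes_moved, flops):
    """Return (transfer_seconds, compute_seconds) for one offloaded kernel."""
    transfer = bytes_moved / (PCIE_GBPS * 1e9)
    compute = flops / (PEAK_GFLOPS * 1e9)
    return transfer, compute

# Low arithmetic intensity: 1 GB moved for 1 GFLOP of work -> transfer-bound.
t1, c1 = offload_time(1e9, 1e9)
print(f"transfer {t1*1e3:.1f} ms vs compute {c1*1e3:.3f} ms")

# High arithmetic intensity: same data, 1 TFLOP of work -> compute-bound.
t2, c2 = offload_time(1e9, 1e12)
print(f"transfer {t2*1e3:.1f} ms vs compute {c2*1e3:.1f} ms")
```

The transfer cost is identical in both cases; only the amount of work done per byte moved decides whether the offload is worthwhile, which is why “the volume depends on how the Phi is used.”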
Over at The Register, Timothy Prickett Morgan writes that AMD’s new Opteron 4300 and 3300 processors have “Piledriver” cores with four new instructions and a bunch of tweaks to goose the performance of the dual-core module compared to the first-generation “Bulldozers.”
For many years, AMD has been shipping four different styles of Opterons. The plain vanilla ones run at the standard voltage and have the standard thermal profiles. The Special Editions, or SEs, run hotter and clock higher and deliver the highest performance, but they are also wickedly expensive and impossible to put into dense servers. The Highly Efficient, or HEs, are a bin sort to find chips that run at significantly lower voltages with slightly lower clock speeds compared to the standard parts, and the Extremely Efficient, or EE, parts are a deeper bin sort to find chips that run at even lower voltages and clock speeds, but which have very low thermals.
In this video from SC12, Mike Fay from Colfax International describes the company’s new CXP9000 server with up to eight Intel Xeon Phi coprocessors. After that, Vadim Karpusenko provides an overview of the company’s new training courses on optimizing code for Xeon Phi.
“Thanks to our close relationship with Intel and engagement in the early testing program, Colfax is uniquely qualified and positioned to provide a complete portfolio of products to support the Intel Xeon Phi coprocessor.”