Entries filed under “Compute”

News related to the processors used in HPC gear.

Intel’s Future Haswell Processor to Feature Transactional Synchronization

Intel’s James Reinders writes that the company will be introducing new Transactional Synchronization Extensions (TSX) for the future 22 nm multicore processor code-named “Haswell”. In a nutshell, Intel TSX provides a set of instruction set extensions that allow programmers to specify regions of code for transactional synchronization.

With transactional synchronization, the hardware can determine dynamically whether threads need to serialize through lock-protected critical sections, and perform serialization only when required. This lets the processor expose and exploit concurrency that would otherwise be hidden due to dynamically unnecessary synchronization.
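The idea can be illustrated with a small software model. The Python sketch below is not TSX code and uses none of its instructions; the `Region` type and the `must_serialize` function are hypothetical names introduced here. It assumes each critical section can be described by the set of locations it reads and writes, and it reports a conflict only when one region writes something the other touches, which is the case where serializing through the lock is actually required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Locations touched by one run of a critical section (hypothetical model)."""
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def must_serialize(a: Region, b: Region) -> bool:
    """True when one region writes a location the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


# Two threads update different buckets of a table guarded by a single lock.
t1 = Region(reads=frozenset({"bucket[3]"}), writes=frozenset({"bucket[3]"}))
t2 = Region(reads=frozenset({"bucket[7]"}), writes=frozenset({"bucket[7]"}))
# A third thread updates the same bucket as the first.
t3 = Region(reads=frozenset({"bucket[3]"}), writes=frozenset({"bucket[3]"}))

print(must_serialize(t1, t2))  # False: the lock was dynamically unnecessary
print(must_serialize(t1, t3))  # True: these two really do have to serialize
```

In this model the first pair could run concurrently even though both sit behind the same lock, while the second pair could not.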

Read the Full Story or download the updated specifications.

Also posted in HPC, HPC Hardware, HPC Software

Video: ORNL – Advancing Research and Science through Supercomputing

In this video, Richard Graham from Oak Ridge National Laboratory presents: Advancing Research and Science through Supercomputing. Recorded at the HPC Advisory Council Israel Supercomputing Conference on Feb 7, 2012 in Tel Aviv.

Presentations will soon be available from the conference site.

Also posted in Events, HPC, HPC Advisory Council Workshop, HPC Hardware, Network, Video

AMD Doubles Down on Existing Opteron Server Sockets

By Timothy Prickett Morgan

As El Reg anticipated earlier this week, the new upper management at AMD has come to its senses and figured out that moving to a new core and two new sockets for its Opteron line in 2012 was not a particularly good idea for its own finances, or those of the server makers who it wants to peddle Opteron-based iron. And so, that plan has been scrapped.

Instead, AMD is going to field new 32 nanometer processors based on the forthcoming “Piledriver” core design and jam them into the same G34 and C32 sockets, meaning that HP, Dell, Super Micro, IBM, Acer, and a handful of other box makers will not have to engineer new motherboards and systems.

AMD CEO Rory Read, formerly of IBM and Lenovo, spoke at the company’s analyst day in Silicon Gulch on Thursday and said that the company sees that “proprietary control points” were breaking down and that AMD was chasing “inflection points” in the PC, tablet, and server spaces. He explained AMD would bring its expertise in CPU and GPU design together to craft system-on-chip (SoC) products that will, presumably, also integrate network and other types of I/O directly on the chip.

“Shift happens, shift is good,” Read stated emphatically, and with a straight face, adding that AMD was being tweaked to become a “market driven company” and not second fiddle in an “unhealthy duopoly.” The task Read sees ahead for AMD is “about stepping out of the shadows and leading.”

But, according to Read and Lisa Su (a semiconductor researcher at IBM and former CTO at Freescale Semiconductor who was hired back in December to be senior vice president and general manager of the new Global Business Units), what AMD needs to do right now in servers is to step back, ramp up production of Opteron 4200 and 6200 processors, and rebuild and extend relationships with server makers as it plots out its future Opteron chips.

Sticking with the existing C32 sockets for the Opteron 4200s and the G34 sockets for the Opteron 6200s is just part of listening to the customer. It also gives AMD some engineering breathing time to come up with interesting, low-power Opteron platforms that are tailored specifically for hyperscale Web, big data, server virtualization, database, and similar workloads where AMD’s Opterons do well.

“Server is a great opportunity for us, and it is clear that our market share is not very high today,” conceded Su. But she also said that the “Bulldozer” core, with its different architecture, takes time to get its footing. Considering this, introducing new sockets right now was a bad idea technically and economically for both AMD and server makers. “At the end of the day, that wasn’t the right answer for our customers,” Su said.

Back in November 2010, two months before CEO Dirk Meyer was ousted, the plan was to crank up the Opteron 6200s to 20 cores using the new Piledriver core, an improved version of the current “Bulldozer” core used in the Opteron 4200 and 6200 server processors as well as a number of desktop chips.

The plan called for the “Sepang” processor to have up to ten Piledriver cores and plug into the C32 sockets, which are used to make servers with one or two sockets across a single memory space. The “Terramar” Opteron chip was the kicker to the current Opteron 6200 and would put two of these Sepang chips in a single package and scale it up to 20 cores per socket. Both of these chips were implemented in the 32 nanometer silicon-on-insulator (SOI) processes from fab partner GlobalFoundries.

A year later, with microservers taking off (at least in terms of marketing hype), AMD announced that it would chase microserver builders with a new single-socket Opteron 3000 chip, code-named “Zurich,” that plugged into the AM3+ socket. The Zurich chip is a variant of the Opteron 4200 with four or eight cores activated, one HyperTransport link, and – most importantly – availability in less expensive motherboards.

The Zurich chip, presumably to be called the Opteron 3200, was expected sometime in the first half of 2012 when AMD was talking about it last fall, but it is now going to be launched in the first quarter, as you can see in the roadmap below:

[Figure: AMD’s revised Opteron server roadmap]

For larger Opteron systems, AMD is taking a conservative approach. Rather than adding two more cores to the basic Opteron processor unit, the new “Seoul” processor keeps the core count at six or eight as the new Piledriver core is brought in. The DDR3 main memory stays the same – two channels per socket – as with the current Opteron 4200s, and the chips will not include any additional on-chip I/O, such as the PCI-Express 3.0 links that Intel is putting on its forthcoming “Sandy Bridge” family of Xeon E5 processors for machines with one, two, or four sockets.

The high-end “Abu Dhabi” Opterons will have 4, 8, 12, and 16 Piledriver cores, the same core count as the Opteron 6200s that started shipping last summer, and will sport the same four channels of DDR3 memory per socket.

You’ll notice that AMD is not talking about how many HyperTransport links will be on these future Piledriver-based Opterons or what speed they will run at, so it makes perfect sense to conjecture that they will run at a faster rate – 8GT/sec sounds reasonable to match the expected 25 per cent increase in raw performance that AMD was promising for Piledriver cores in desktop processors.
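That conjecture is simple arithmetic. The sketch below assumes the current Opterons run their HyperTransport links at 6.4GT/sec, a figure that does not appear in this article, and scales it by the 25 per cent quoted above.

```python
# Assumption: link rate of the current Opterons, not stated in the article.
current_ht_gt_per_sec = 6.4
# Raw performance increase AMD was promising for Piledriver desktop parts.
expected_speedup = 0.25

scaled_ht_gt_per_sec = current_ht_gt_per_sec * (1 + expected_speedup)
print(round(scaled_ht_gt_per_sec, 1))  # 8.0
```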

AMD is also expecting to put out a kicker for the Opteron 3200, dubbed “Delhi” and offering four or eight Piledriver cores.

All of the new Opterons will be etched in GlobalFoundries’ 32 nanometer processes, just like the current ones are. On the desktop processor roadmaps that Su went over, the chips for 2012 and those for 2013 were clearly marked. Not so on the server chip roadmaps, but we placed a call to AMD and were told by a spokesman that all of the chips above will be coming out this year. The Abu Dhabi and Seoul Opterons are due towards the end of the year.

The big change, according to new AMD CTO Mark Papermaster, formerly of IBM, Apple, and Cisco Systems, was that AMD was shifting away from a design philosophy that focused on the performance of processor cores and adopted the bleeding-edge tech from GlobalFoundries or Taiwan Semiconductor Manufacturing Corp to try to compensate for the process lag AMD (and everyone else) has with Intel.

This led to execution problems, and more importantly, Papermaster said that the company’s current managers do not believe that the process technology node trumps integration of functions on an SoC and the “experience” that the user has using a device based on AMD silicon.

Su didn’t give out a lot of details on the future Piledriver cores, except to say that they would be able to do more instructions per cycle and would have higher clock frequencies. Many had expected Bulldozer to do better on the clock speed front.

[Figure: AMD’s Opteron core roadmap]

Looking out further into the future, AMD is cooking up a third generation modular core called “Steamroller,” which would have a greater level of parallelism. This could mean a lot of different things, such as adding more threads or cores to the chip or adding more instruction units per core module. Su did not say, and it is likely that AMD is itself not quite sure what it means. And further out beyond that, AMD will crank out more performance in some unspecified way with a modular core design called “Excavator.”

It will be interesting to see what AMD integrates onto its server chips and how fast it can do it. In the meantime, Intel is going to make plenty of hay in the supercomputing market where there are workloads with heavy I/O demands because it can support PCI-Express 3.0 peripherals with the future Xeon E5 processors. It remains to be seen how much of an advantage this will be across the server market at large. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.

Also posted in HPC, HPC Hardware

Cray XE6m Midrange System Weighs in at $31K per Teraflop

While affordable petascale computing may be a ways off, this week Cray rolled out the Cray XE6m system, a midrange supercomputer that brings the hyperscale technologies being deployed at Blue Waters and Titan down to the rack level. With six blades and 48 sockets using the new Opteron 6200s, the Cray XE6m starts at $200K, or approximately $30,769 per teraflop.
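The two prices quoted above imply a system size. The short calculation below backs it out, assuming the $200K entry price refers to the six-blade, 48-socket configuration; the per-socket figure is derived here and is not a number Cray quotes.

```python
system_price_usd = 200_000       # entry price quoted above
price_per_teraflop_usd = 30_769  # price/performance quoted above
sockets = 48                     # six blades, 48 sockets

implied_teraflops = system_price_usd / price_per_teraflop_usd
gigaflops_per_socket = implied_teraflops * 1000 / sockets

print(f"{implied_teraflops:.1f} teraflops")              # 6.5 teraflops
print(f"{gigaflops_per_socket:.0f} gigaflops per socket")  # 135 gigaflops per socket
```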

Building on the reliability and scalability of the Cray XE6 supercomputer and using the same proven petascale technologies, the Cray XE6m system is optimized to support scalable application workloads in the midrange high performance computing (HPC) market, where applications require between 700 and 13,000 cores of processing power.

Read the Full Story.

Also posted in HPC, HPC Hardware

Video: EPFL Scientists Develop 3D Chips

EPFL scientists have developed a new generation of 3D computer chips that are stacked vertically rather than placed side by side. The technology may someday enable faster, higher bandwidth processing.

EPFL scientists are among the leaders in the race to develop an industry-ready prototype of a 3D chip as well as a high-performance and reliable manufacturing method. The chip is composed of three or more processors that are stacked vertically and connected together—resulting in increased speed and multitasking, more memory and calculating power, better functionality and wireless connectivity.

Read the Full Story.

Also posted in Computing Research, HPC, HPC Hardware

Video: AMD’s CTO Talks Heterogeneous Systems Architecture

In this video, AMD’s Joe Macri describes the company’s HSA architecture (formerly known as Fusion). Recorded at the 2012 DesignCon conference in Santa Clara.

“The architectural path for the future is clear,” Macri declared. That path will be paved with the programming patterns established on Symmetric Multi-Processor (SMP) systems migrating to the heterogeneous world. The architecture will be open, with published specifications and an open source execution software stack, and heterogeneous cores would be able to work together seamlessly in coherent memory, with low latency dispatch and no software fault lines.

A Tip of the Hat goes to Sylvie Barak at EE Times for pointing us to this video.

Also posted in GPUs, HPC Hardware, Video

SeaMicro Packs 64 Quad-Core Xeons into 10U

Today SeaMicro got a lot of media attention with the launch of the “first fabric-based Intel Xeon micro server,” the SeaMicro SM10000-XE. While the company has been shipping Intel Atom-based servers for a while now, this unexpected move puts Sandy Bridge Xeons into the same highly dense form factor.

“Today we have announced the lowest-power, highest-density, highest-bandwidth Intel® Xeon®–based server ever built,” says Andrew Feldman, CEO of SeaMicro. “SeaMicro now brings the benefits of micro servers—efficiency and massive density—to small and larger-core workloads and to all parts of the scale out data center. Combining the SM10000 architecture with the Samsung Green DDR3 memory and Intel® Xeon® processors, SeaMicro now sets a new bar for energy efficient compute in the datacenter.”

So how was SeaMicro able to pull this off? Rachel King writes that it was a clever combination of partner technologies:

  • Intel’s Sandy Bridge architecture and Xeon processors
  • SeaMicro’s Freedom Fabric ASIC (optimized to work with large-core and small-core CPUs, shrinks the size of the motherboard to the size of a standard business card)
  • Samsung’s energy efficient Green DDR3 RAM (half the size of a standard memory module)

Before we get you too excited about HPC for this box, it is worth noting that the device has a shared-nothing architecture. But with the ability to support 1024 Xeon cores in a rack, the datacenter future is looking bright for SeaMicro.
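The rack-level figure follows from the chassis numbers in the headline. The sketch below works it through using only values from this post; the variable names are ours.

```python
xeons_per_chassis = 64   # from the headline
cores_per_xeon = 4       # quad-core parts
chassis_height_u = 10    # each chassis is 10U
cores_per_rack = 1024    # figure quoted above

cores_per_chassis = xeons_per_chassis * cores_per_xeon
chassis_per_rack = cores_per_rack // cores_per_chassis
rack_units_used = chassis_per_rack * chassis_height_u

print(cores_per_chassis, chassis_per_rack, rack_units_used)  # 256 4 40
```

In other words, the 1024-core claim corresponds to four chassis occupying 40U.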

Read the Full Story.

Also posted in HPC Hardware

Intel Brings Bigger Guns to AMD Server Chip War

By Timothy Prickett Morgan

Analysis If you want to get into the server processor racket, here’s some advice: Don’t bring a knife to a gun fight. And when you whip out your guns, you better have a piece stashed in each of your boots, maybe another high-caliber rifle on your back, and a few knives while you are at it for price-cutting when the bullets run out.

With Intel getting ready to launch its “Sandy Bridge” Xeon E5 processors in March and revving up its 22 nanometer processes to eventually field “Ivy Bridge” kickers, Advanced Micro Devices is going to have to engineer some pretty impressive new Opteron server chips. It’ll have to cook up those chips pretty sharpish, in conjunction with its wafer-baking partners, if it hopes to gain ground in the ongoing x86 server chip war – much less hold the hard-fought ground it has attained in high performance computing and server virtualization.

Everybody loves an underdog and most people like to see a bully take one on the chin and go down to his knees. So a lot of companies were rooting for AMD as it was designing the Opteron processors and trying to build an ecosystem of server vendors who would peddle machines based on them in the early and middle 2000s.

Back in the early 2000s, Intel was trying to protect its high-end 64-bit Itanium server business and push its Xeon processors down into the 32-bit volume server space, and AMD brilliantly shot the gap between the Xeon and Itanium to create the 64-bit Opterons, eventually pushing its server market share as high as 25 per cent.

But it has been a long time since x86 server chip juggernaut Intel was hammered – SledgeHammered, to be specific – by longtime rival AMD with its 64-bit, low-power, multicore Opteron processors. Intel shifted to the Core microarchitecture, added 64-bit memory addressing and processing, and a slew of key features such as the QuickPath Interconnect to its Xeon processors and hit back hard against the Opteron upstart. The “Nehalem” Xeon architecture announced in 2009 had everything that Opterons had, and when the Great Recession hit just in the wake of yet another Opteron delay, server makers put most of their effort into building Xeon war machines, not Opteron battlewagons, and AMD has been losing ground ever since.

Because server chip profits help pay the bills at Intel, AMD, IBM, Oracle, and Fujitsu, the loss of market share by AMD is one of the key reasons why CEO Dirk Meyer resigned in January 2011. In hindsight, we can also see that Meyer and the bulk of the management team that handles chip development and manufacturing have been replaced since new CEO Rory Read came aboard last July. AMD has a new CTO – Mark Papermaster, formerly of IBM, Apple, and Cisco Systems – and has replaced its former marketing, products, and operations bosses, and has tapped ex-Intel engineer Rajan Naik as senior vice president and chief strategy officer.

So, AMD is no doubt drawing up new war plans for the x86 server battlefield, but the company has not said much to date about its plans. Perhaps it will enlighten us during its Analyst Day this week. But we can conjecture about what AMD might do by looking at what Intel is about to do in the x86 racket.

A Sandy Bridge not too far

While Intel never publicly promised that the “Sandy Bridge-EP” Xeon E5 processors would launch last fall for shipments in the fourth quarter, the circumstantial evidence – and comments from motherboard and server makers like Super Micro – indicate that this was indeed the plan. But with AMD having its own issues shipping its “Interlagos” Opteron 6200 processors for two-socket and four-socket servers and its “Valencia” Opteron 4200s for single-socket and dual-socket machines, Intel did not have to rush to market. (The speculation is that a SAS controller bug similar to the one in the C200 chipset that delayed the launch of “Sandy Bridge-DT” E3 processors and various PC chips of similar design has been found in the “Patsburg” C600 chipset for the Xeon E5s. Intel has not confirmed this.) Frankly, with Intel turning in the best fourth quarter and fiscal year in its history, in terms of profits and revenues, as 2011 came to a close, despite a PC slowdown and whatever issues stalled the Xeon E5s, it is hard to argue that Intel made the wrong call.

Chip happens

Intel is just starting to talk to press and analysts under embargo this week about the forthcoming Xeon E5s, and it is no coincidence that it is doing so just ahead of AMD’s Analyst Day. (El Reg is reporting this to you from coach on a Delta flight to Portland, Oregon, ahead of a briefing by Intel from its Beaverton chip and server development labs.)

As El Reg exclusively disclosed last May, the plan with the Xeon E5s is to take what would have normally been a chip for general-purpose two-socket workhorses and bifurcate the line into multiple processor and chipset variants to address very precise market segments. This is, of course, what AMD did two years ago when it created two different two-socket server families: the Opteron 4100s – which could also scale down to single socket machines aimed at small, power-sensitive workloads – and the Opteron 6100s, which could scale up to four processor sockets.

Anything AMD can do, Intel can do. (The market decides if Intel can do it better, or at least well enough to allow IT managers to fall back on the “nobody ever got fired for buying Intel” insurance policy.)

Intel is actually cutting its server market into eight pieces with the Xeon E5 launch. That’s Itanium 9300s and Xeon 7500s and E7s at the high-end (and eventually the “Sandy Bridge-EX” E8s). That’s two segments of the market that share chipsets and memory cards, but that have different motherboards and sockets. At least until Intel finally delivers, as it is rumored to be in the works, the long-promised common Xeon-Itanium socket. That could happen with the E8s, but it is far more likely to happen with the “Ivy Bridge-EX” Xeon E9s years hence. At the low-end, there’s the single-socket Xeon E3 and Atom processors, depending on how wimpy or brawny your workload is. That’s four addressable server segments in total.

The Xeon E5s will also span four different server types and will cover the middle and overlap with the high and low ends. The Xeon E5-2600, as the first of the “Romley” server platforms is expected to be called, will use the “EP” variant of the Xeon E5 chip that plugs into the new “Socket R” CPU socket. This socket is not compatible with the current Xeon 5500 and 5600 processors, but has all sorts of goodies, including two QPI links between the processors, support for unregistered, registered, and load-reduced (LR) DDR3 main memory, and integrated PCI-Express 3.0 controllers on the processor. This is the chip that Intel has presumably been shipping under NDA to selected supercomputer and hyperscale data center customers since last fall. This chip is clearly aimed at two-socket Opteron 6200 machines.

For two-socket machines that don’t need all of these capabilities, Intel is expected to roll out its “Sandy Bridge-EN” chips, rumored to be called the Xeon E5-2400s. These chips will plug into the new “Socket B2” socket and will sport only one QPI link between processors as well as fewer memory channels, fewer DIMMs per core, and fewer PCI-Express 3.0 slots. This chip is fired directly at two-socket Opteron 4200 iron.

If the rumors are right, then Intel will also ship a variant of the Sandy Bridge-EP chip that will be able to span four processor sockets in a single system image. This chip is expected to be called the Xeon E5-4600 and is obviously targeting the four-socket Opteron 6200.

And finally, Intel will field a Xeon E5-1600 chip, aimed at single-socket servers and workstations and based on the Sandy Bridge-EN chip that will zero in on single-socket Opteron 4200 servers and whatever plans AMD has to revive its single-socket server biz with the Opteron 3000 series, which it said it was working on back in November. The first Opteron 3000 chip, code-named “Zurich” and presumably to be named the Opteron 3200 to be consistent with the 2012 series of Opteron processors, is basically a cut-down Opteron 4200 with six or eight cores that will plug into an AM3+ socket instead of a C32 socket.

In any event, Intel appears to be looking to chase the microserver segment with the Xeon E5-1600 as AMD is looking to pursue with the Opteron 4200 and 3200 chips. The word on the street is that the Xeon E5-1600 will plug into the Socket R socket, but it would make more sense for it to use the lower-cost Socket B2 socket.

Should all of this come to pass in 2012, it is safe to say that Intel has a weapon to match everything that AMD can throw at it – and then some. AMD only has one flavor of four socket machine, and Intel has three if you count Itanium. AMD has only two kinds of single-socket boxes it can bring into the field, Intel has three if you count Atom. AMD has two two-socket boxes, but Intel has four if you count Itanium.

It’s Hammer time, again

It must have been such fun to run AMD when Intel’s server and PC chips were misaligned with the market needs. It must be daunting to come into work every day at AMD and see the lead in process technology, cash, clout, and chip and market coverage that Intel currently has not just over AMD, but over anyone who is making processors for anything larger than a smartphone or tablet.

AMD has been clever in a lot of ways to survive the Intel onslaught despite being behind in process technology. With the Opteron 4100s and 6100s, the company had to do its own full platforms – chipsets and processors – for the first time, which is a lot of change to manage all at once. Moreover, with the Opteron 6200s, AMD took its eight-way server architecture, beefed it up with more and faster HyperTransport links across the CPU sockets, and then double-stuffed six-core processors into a single socket and convinced the software vendors of the world that this was indeed a four-socket, rather than an eight-socket, machine. For systems and application software that is socket-based, this little maneuver cuts software fees in half.

AMD has also been winning the core count skirmish against Intel and positioning its two-core “Bulldozer” module used in the Opteron 4200s and 6200s as two strong physical threads against Intel’s weaker HyperThreaded cores. However, with a shared scheduler, on workloads that make heavy use of 256-bit floating point instructions, half of the 16 cores in an Opteron 6200 will often sit idle and the net effect is that the performance should be about the same as the forthcoming Xeon E5 with eight cores running 256-bit floating point. AMD has two stronger cores, but only if you want to do 128-bit math or integer work.

So what is AMD to do?

Go back to the drawing board and exploit whatever weaknesses it can find in Intel’s armor, just as always. Or, start a fight on a new battlefield where Intel is not going to be so strong.

Back in November 2010, two months before the management shakeup at AMD, the company said that its plan for this year was to bring out replacements for the C32 socket used for Opteron 4100 and 4200 processors and the G34 socket used with Opteron 6100 and 6200 processors.

The plan calls for the high-end Opterons, code-named “Terramar” and presumably called the Opteron 6300, to have 20 Bulldozer cores based on a next-generation core, code-named “Piledriver”. The low-end will get the “Sepang” Opteron 4300, a ten-core chip that is essentially what gets double-stuffed into a socket to make the Terramar chip package. Rumor has it that AMD will boost memory capacity with these forthcoming Opterons as well as support PCI-Express 3.0 peripherals. The Terramar and Sepang chips will be etched in the 32 nanometer processes used by GlobalFoundries, AMD’s spun out former chip manufacturing operations.

Presumably there is a process shrink to 28 nanometers to boost clock speed and therefore single-threaded application performance of these Opteron 4300 and 6300 chips in the works, but AMD has not said yet and will no doubt lay out its plans at Analyst Day this week.

As was the case during the Great Recession, now would be a particularly bad time for AMD to force a socket transition onto its smaller band of server customers, and the new management at AMD must be looking pretty hard at that roadmap, wondering if they can change as little as possible now to buy time to do a lot more radical engineering for the future.

If I were running AMD, I would be looking very hard at that “Bobcat” core that is the alternative to Intel’s Atom and start thinking about servers, and also go back and look at the “Trinity” low-power Fusion chip, which is based on the Bulldozer cores.

When AMD was kicking Intel in the chips in the mid-2000s, Chipzilla relatively quickly (okay, it took years) shifted over to the Core laptop chip architecture for its PCs and servers and not only saved its chip business, but blunted the AMD attack. Intel has copied most of the ideas that made the Opteron better or different and is now using its wafer-baking process technology and its ability to set market prices to force AMD to compete mostly on lower price for roughly equivalent performance and features.

This is not an enviable position to be in for AMD, obviously. But there’s always the ARM option, and AMD could do something radical like buy Applied Micro or Calxeda and turn the x86 chip war into a two-front war for Intel to have to fight. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.

Also posted in HPC, HPC Hardware

New Whitepaper: Boost RAM Bandwidth by 20% with a Single Command

Colfax International has published a new whitepaper by Stanford’s Andrey Vladimirov entitled: Terabyte RAM Servers: Memory Bandwidth Benchmark and How to Boost RAM Bandwidth by 20% with a Single Command.

Colfax International produces servers capable of supporting up to 1 TB of RAM and up to 4 Intel Xeon CPUs. This paper reports the memory bandwidth benchmark of these servers obtained using the STREAM code. Our benchmark includes comprehensive statistical data: the mean, standard deviation, extrema and the distribution of bandwidth measurements. The distribution of measurements reveals several modes of RAM performance, including an above-average bandwidth mode. By default, the mode realized by any given benchmark depends on an unpredictable runtime pattern of thread and memory binding to the physical cores. The paper shows how to optimize memory traffic for bandwidth and consistently achieve the fastest mode. This is done by controlling the code’s thread affinity, and results in a bandwidth increase around 20% over the average unoptimized performance.
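The summary above does not reproduce the paper’s command, so the following is only a sketch of the general technique it names: controlling where work is allowed to run. It uses Python’s `os.sched_setaffinity`, which is available on Linux only; the helper name `pin_to_first_allowed_cpu` is hypothetical, and pinning a Python process says nothing about how the STREAM runs in the paper were actually bound.

```python
import os


def pin_to_first_allowed_cpu():
    """Pin the calling process to one logical CPU, then restore the old mask.

    Returns (target_cpu, mask_while_pinned), or None where the
    affinity calls are unavailable (they are Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    original = os.sched_getaffinity(0)     # 0 means the calling process
    target = min(original)
    os.sched_setaffinity(0, {target})      # allow only this one CPU
    try:
        pinned = os.sched_getaffinity(0)   # read back the mask in effect
    finally:
        os.sched_setaffinity(0, original)  # put the original mask back
    return target, pinned


result = pin_to_first_allowed_cpu()
print(result)
```

With a fixed mask the scheduler can no longer migrate the work between cores, which removes the run-to-run variation in thread placement that the paper identifies as the source of the different bandwidth modes.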

Download the whitepaper (PDF).

Also posted in Computing Research, HPC, HPC Hardware

Video: Gordon Supercomputer Wows TV Audience

In this video from Fox News, researchers describe the power and capabilities of Gordon, the flash-based supercomputer at the San Diego Supercomputer Center.

Also posted in HPC Hardware, New Installations, Storage, Video

Podcast: Eurotech Leverages Mfg Excellence for HPC Market

In this podcast, Giovanbattista (Giovanni) Mattiussi from Eurotech discusses the company’s push into the HPC market and its growing presence at the annual SC and ISC conferences. Widely known for its embedded manufacturing capabilities, Eurotech is receiving accolades for its high-density Aurora Intel-based clusters.

insideHPC: Eurotech does not seem to be widely known in the U.S. supercomputing market. When did the company start doing HPC?

Giovanni: I would start by saying that Eurotech is a publicly listed global company that does not do only HPC. I think this is important to point out for financial and competence reasons: Eurotech relies on a wide offering beyond HPC that guarantees cash flows and technical synergies and exchanges between divisions.

Eurotech started doing HPC in 1998. For 10 years, between 1998 and 2008, the company took part in large (10M€+) HPC projects as engineering and design partner, producing special and general purpose supercomputers and collaborating with some of the most prestigious European research centres. The competencies inherited from the core embedded electronic business allowed Eurotech to include innovative design solutions in its HPC systems. Supercomputers were evolving into HPC clusters, using commercial components, absorbing increasingly more power, generating increasingly more heat and using an increasingly smaller space: all of these aspects are areas where an embedded electronic company like Eurotech thrived.

These 10 years saw the birth of supercomputers like the APE series, a family of systems that almost set a paradigm in the history of the 3D Torus architecture for LQCD. Also worth mentioning are Janus, one of the first FPGA-based supercomputers, and Avogadro, the first Eurotech Top500 entry. A common characteristic of these systems is that they never became commercial products, leaving Eurotech to play in the field of custom supercomputers. Things changed in 2008 with Aurora, which Eurotech designed in collaboration with the research consortium Aurora Science, backed by the prestigious INFN, the national institute of nuclear physics, where scientists like Fermi worked. Aurora was designed to be highly “scientific” and special in its design, but also suitable to be marketed because it relied on mainstream components. Aurora soon became a product line, which leveraged more than 10 years of research. Around the Aurora product family, Eurotech built its HPC division, shaping it as an independent business unit. Nowadays, whilst maintaining its hardware manufacturer DNA, Eurotech can offer HPC solutions, integrating its own and 3rd party hardware and software plus services that cover design, installation and support.

insideHPC: What prompted Eurotech to develop the Aurora series of supercomputer clusters based on commodity components?

Giovanni: The idea that the company had built through research projects enough technical competencies to design, build and launch its own product line. Also, the willingness to follow the HPC industry trends which were pushing toward standardization if not commoditization. Eurotech keeps doing research projects (DEEP, Dynamic Exascale Entry Platform, and others I can’t mention yet are some examples), but strategically aims to become a player in the HPC market like the Crays or SGIs of this world.

insideHPC: Eurotech has a core competency in hardware manufacturing. What additional strengths does the company bring to the table in the HPC marketplace?

Giovanni: You said it right. Eurotech comes from hardware design and manufacturing. However, a few years ago, the company started its journey as a software developer: the “device cloud” offering Eurotech proposes is an example. At the group level, we have been building software competencies to match the HW ones. In the HPC division, at the moment, we retain a 50% split between HW and SW engineers. So, despite not producing HPC software, I believe Eurotech has the competencies to integrate and maintain HPC software packages. The other aspect is good services that, despite not being sold standalone, are built on a real intimacy between us and our customers and follow the customers throughout each HPC project. So, I would say, technical leadership, solution design and customer intimacy are the strong points in our HPC proposition.

insideHPC: In terms of your HPC products, would you say that you are mostly a player at the very high end, or do you also have departmental offerings?

Giovanni: With the introduction of Aurora, we have been in a position to sell at the same price level as companies like Cray and IBM, for instance. Also, we normally configure small clusters and mid-range systems that, while maintaining a high engineering content, are stripped of “fancy” features to become more standardized. Due to the size of our company, we prioritize producing rock solid, high quality, energy efficient HPC systems rather than selling on volume. We focus totally on customers, trying to design the best solution for them. This is why we think that, rather than competing with the many large HPC hardware vendors, we complement their offerings.
Also, note that we are the only HPC player offering a rugged high performance computer, able to withstand vibrations, heat, cold, rain, etc., a product that the oil and gas, security and meteorological sectors are viewing with increasing interest.

insideHPC: Is the ISC conference an important part of your HPC marketing strategy?

Giovanni: Yes, it is. To be honest, if I had to do a crude analysis of the cost per lead, I would need to disqualify both ISC and SC! However, no other marketing activity has given me the same quality of leads so far. Also, both ISC and SC are unique opportunities to showcase Eurotech technologies in front of the whole industry gathered in one place. This is not trivial for a company like Eurotech, whose marketing reach is limited by budget. ISC is particularly relevant because it is a European show and Europe is at the moment our main field of play.

 

insideHPC: On the road to Exascale computing, it seems like Europe has chosen to focus on developing software. Does Eurotech participate in these planning discussions?

Giovanni: Eurotech is involved in PRACE and other European initiatives. Recently, we announced our participation in the European Technology Platform for HPC. This collaboration happens at a very wide European level and aims to tackle exascale challenges from both the software and hardware points of view. Eurotech already participates in large EU-funded research projects like DEEP, whose focus is equally hardware and software. All in all, what I can see is that Europe has finally realized that only joint, Europe-wide initiatives will bring enough weight to play at the same level as the US, Japan and now China in the supercomputing arena. Maybe this will serve as an example to inspire the European political unity whose absence is now the cause of the severe economic crisis. At the moment, I would be happy with an HPC-united Europe!

insideHPC: Besides Europe, you have subsidiaries in Japan, the U.K., and the U.S. Do you believe Eurotech will be a worldwide force in HPC in the long term?

Giovanni: Yes we do, but it will take some time. We believe Europe is where, at the moment, we stand the best chance of increasing our installed base. At the same time, we are equipping our worldwide sales force to be able to sell HPC and are discussing business in Japan and the Middle East. While in Japan we can sell through the locally recognized Advanet brand, markets like the U.S. will require Eurotech to collaborate with a U.S. system integrator or a larger U.S. vendor.

Also posted in Business of HPC, HPC, HPC Hardware | 1 Comment

4 Day CUDA Course in Seattle, Jan 24-27

Acceleware, partnering with NVIDIA and Microsoft, are offering a four-day course designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.

Delivered by Acceleware’s Developers, who provide real world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with NVIDIA GPUs for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience.

Register before January 13 and receive $250 off your course fee! Enter promotional code: AXTEB2012

Also posted in HPC, HPC Education and Training, HPC Hardware, HPC Software | Leave a comment

Hengeveld: 2012 – Application and Gnomes

My wife Jennifer is a late riser.  She goes to bed late after whatever fun or work she had the night before. She snoozes the morning away, and awakes noon-ish to me either making her breakfast (on the weekends) or calling her to wake her up (the rest of the time). She assumes that gnomes of morning have made ready many good things while she was in dreamland.  She wakes ready to take advantage of that bequest in her new day. There is an analogy in there… someplace.

Happy New Year! For all of its hits and misses, 2011 was an amazing year for the HPC industry. In my last post, on SC11 and disruptive innovation, I covered the highlights of the last big event of 2011. Looking at what’s ahead, I am expecting 2012 to be the year of Application and Gnomes.

Roadrunner, the first and only IBM system to reach petascale on the TOP500 list, was hard to use and hard to program.  That’s fine for a one-of-a-kind box.  But I expect that by the end of 2012 there will be 20+ petascale systems and they will be doing real work, real science.

IBM Roadrunner – Wikimedia Commons

The “Practical Petascale” era dawned at SC11, and 2012 will see a great proliferation of petaflop machines. Two years ago, a petaflop machine was over 10,000 nodes and was an expensive beast. Now, an Intel Xeon E5-based cluster will achieve a petaflop with roughly 3,000 two-socket nodes. These systems are programmable with standard tools and techniques and can be rapidly applied to a broader range of applications.
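As a rough check on that node count, the sketch below estimates theoretical peak per node and the number of nodes needed for one petaflop. The per-socket figures (8 cores, 2.6 GHz, 8 double-precision flops per cycle per core with AVX) are assumptions for a top-bin Xeon E5 part, not numbers from this post, and the function names are purely illustrative.

```python
import math

def peak_flops_per_node(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    """Theoretical peak (not sustained) double-precision flops for one node."""
    return sockets * cores_per_socket * clock_hz * flops_per_cycle

def nodes_for_target(target_flops, node_flops):
    """Smallest whole number of nodes whose combined peak reaches the target."""
    return math.ceil(target_flops / node_flops)

# Assumed hardware figures (not from the post): 8 cores per socket,
# 2.6 GHz clock, 8 double-precision flops per cycle per core.
node_peak = peak_flops_per_node(sockets=2, cores_per_socket=8,
                                clock_hz=2.6e9, flops_per_cycle=8)
nodes = nodes_for_target(1e15, node_peak)

print(f"peak per node: {node_peak / 1e9:.1f} GFLOPS")
print(f"nodes for 1 PFLOPS peak: {nodes}")
```

Under these assumptions the count comes out close to the "roughly 3,000" quoted above. Sustained performance on a real benchmark would be lower than peak, so an actual system would need somewhat more nodes.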

Everybody will want one.  Who knows, soon it will be a measure of the Rich and Famous… I could see it now – “… and darling, in this room, we keep the Van Goghs, and over there… is our petaflop cluster, it’s being used to support famine relief and protect endangered species in New Guinea.”

Many nations and institutions will put together something like that to solve their toughest problems.  The tools are in place to make scaling applications easier.  With this in mind, I am focusing the next few months on understanding practical petascale applications.  What are these new systems doing?  How are they contributing to science?  How are they contributing to national competency?

Over the past 4-5 years a tremendous amount of technology has been developed and put in place to create this era of HPC innovation and application.  Many technologies take 4-6 years to go from the first inklings of technology to commercial deployment.  If 2018-2020 is the rise of exascale (Intel has committed to a 2018 effort targeting 20MW per exaflop, per Kirk Skaugen’s ISC11 talk) then 2012 is 5AM for the Exascale Gnomes. It’s dawn. Time to get to work.

Practical Exascale will need solutions to the canonical “exascale problems” such as “PRESS” – Programmability, Reliability, Efficiency, System Scalability.* Each of those has to have Gnomes at the ready in 2012.

The more I look at the research into exascale applications like CFD, weather modeling, and molecular simulation, the more exascale problems don’t look like bigger versions of their petascale brothers.  Data will be less organized and less monolithic.  Macro- and micro-level simulations will both be modeled, and interactions between the two will drive complexity, with millions of threads running, coordinating and communicating with each other. Programming all of these and keeping all of it working across a wide variety of data will be a significant problem. Gnomes will need to continue work on the optimization of the Seven Dwarves as well (ouch, I didn’t foresee that one). Perhaps programming and system scaling people have another year or two to get their acts together, but not more than that.

Reliability Gnomes also have to begin serious work. Historically, power efficiency and reliability have been competing interests.  It’s physics as much as anything: smaller swings of signal mean less energy to store information. Eventually errors will show up. The bigger the system, the more combinations of errors will affect system performance.   Detection and recovery are expensive from a silicon perspective and weigh against the power budget as well.  Of the exascale issues, this one scares me the most. If the Programming Gnomes have two years to crack their problem, Reliability Gnomes really have about one… their results need to feed into process research and design.

Work on Efficiency is really work on efficient performance. In exascale I don’t think we can discuss one without holding the other relatively constant.  There are lots of wasteful parts of the exascale system: power delivery, cooling, storage, and interconnect. These all have to be streamlined for power. Gnomes can already be heard singing a happy work song on all of these. Dedicated highly parallel architectures like MIC and radically different interconnect approaches are at least asking the right questions, even if they aren’t getting answers yet.

So looking forward to 2012, I expect real movement on these four key questions. So when the rest of the world awakens in, oh 2016, or so… they will find their metaphorical Exascale Breakfast plates full of what I made Jen last weekend.

*- I love memory aids and acronyms… witness “Intel® Many Integrated Core (MIC) architecture.”

 

Also posted in Exascale, HPC, HPC Hardware | 1 Comment

Slidecast: Clustered Systems Introduces Remarkable 100KW Compute Rack

In this slidecast, Phil Hughes from Clustered Systems presents: The 100KW Rack Introduction. Using a grant from the US Dept. of Energy, the company has developed a highly dense compute rack with a PCI-E interconnect and an extremely efficient warm-water cooling system.

The company estimates that a completely operational petaflop compute facility using the new technology could be built for approximately $20 million. The system will be available through OEM partners; first shipments will likely start in 2Q2012, and SLAC will deploy the first system.

Download the Slides (PDF) * Download the MP3 * Subscribe on iTunes * Subscribe on other podcast players

Also posted in HPC, HPC Hardware, Podcast, Video | 1 Comment

New Chippery on Parade at ISSCC – CPU and memory makers strut their stuff

By Timothy Prickett Morgan

The new year in IT always begins around now, when the IEEE puts out the advance program for the International Solid State Circuits Conference, which takes place in San Francisco in February. This time around, it runs from February 19 through 23, and while there are not a large number of server-class processors coming out, there are some very interesting system-on-chip and memory technologies that chip makers will be showing off at the upcoming 2012 event.

First out of the gate will be Intel with a preview of the “Ivy Bridge” processors for PCs, which are made in its new 22 nanometer Tri-Gate process and which will cram a multicore CPU and a GPU onto the same sliver of silicon. Intel will also be showing off a new dual-core Atom processor implemented in its current and well-established 32 nanometer processes sporting on-chip Wi-Fi networking. This is presumably the “Cedar Trail” family of Atom processors that were originally expected around September, then November, and now sometime next year, according to the rumor mill. Intel will also be showing off a 32-bit x86 chip that has an operating range from 280 millivolts to 1.2 volts and that is implemented in its 32 nanometer processes.

Oracle will be on hand to talk about the eight-core Sparc T4 processor that was announced back at the end of September and that just started shipping in systems back in November. Oracle might slip a bit and talk about the future Sparc T5 processor, which will be socket-compatible with the Sparc T4 processor and which will ship by late 2012. Then again, Oracle doesn’t want to screw up Sparc T4 system sales, so maybe it won’t say anything. Especially considering that the Sparc T5 will have 16 cores running at around 3GHz or so and scale up to eight sockets in a single system – yielding about 2.5 times the aggregate oomph on thread-happy workloads like databases and middleware.

IBM is not saying anything about its future Power7+ or Power8 processors for its Unix and proprietary systems. But Big Blue will be showing off a prototype 3D system-on-chip design that will use the through-silicon via (TSV) technology that it perfected with Micron Technology for Hybrid Memory Cube (HMC) memory. IBM will be demonstrating that the techniques that can be used to stack up DRAM chips and lash them together into a parallel memory cluster (well, that is what HMC memory is, more or less) can be used to link embedded DRAM to processor cores. Such technology will be needed to make more powerful and energy-efficient parallel systems.

Researchers at the Georgia Institute of Technology, Korea Advanced Institute of Science and Technology, and Amkor Technology will be showing off a similar stacked chip called 3D-MAPS, which is a massively parallel processor with stacked memory. In this case, the chip in question has 64 cores running at a mere 277MHz and 256KB of SRAM memory mated to it. This is a tiny chip in terms of raw chip performance, but it delivers 64GB/sec of memory bandwidth and only consumes 5 watts of juice, and on memory-intensive workloads with a certain degree of parallelism, 3D-MAPS could scream. The next generation 3D-MAPS chip will have two logic tiers with a total of 128 cores and three DRAM tiers instead of one SRAM tier for memory.

The University of Michigan will be stacking up chips, too, with its Centip3De project, which will put 64 ARM Cortex-M3 embedded processors into a cube. The Wolverines have been talking about (PDF) a seven-layer 3D chip that has 128 Cortex-M3 cores and 256MB of stacked DRAM all glued together, so this appears to be a chip off the old block.

Advanced Micro Devices will be showing off a “resonant clock design” for a 64-bit x86-processor towards the end of the day, and clearly there will be a need for some coffee during that one. Fudan University of China will be showing off a 16-core, 320 milliwatt, 800MHz processor it has cooked up with message passing and shared-memory inter-core communications – all cooked up in an ancient and cheap 65 nanometer process. Cavium will be showing off its latest 32-core MIPS-based processors, which sport network accelerators and which are sold under the Octeon II brand. Fujitsu will be there to show off its current K massively parallel supercomputer, powered by the eight-core Sparc64-VIIfx processor and currently the most powerful super in the world.

Hynix Semiconductor and Samsung Electronics will be showing off their respective 2Gbit and 4Gbit DDR4 SDRAM memory chips, which will eventually make their way into PCs and servers. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.

Also posted in HPC, HPC Hardware | Leave a comment


insideHPC.com is a production of insideHPC, LLC. © 2006-2011