During SC09 week I grabbed some time at the end of a day to sit down with Brian Sparks and Jim Ryan in their capacities as members of the InfiniBand Trade Association (they both have day jobs; Brian is with Mellanox and Jim works for Intel). During our conversation we touched on the IBTA’s ten-year anniversary and the evolution of InfiniBand from fringe technology to stalwart of the Top500 list. Along the way we covered some of the drivers of that adoption, the close connections of IB to PCI-e, the relationship between the OpenFabrics Alliance and the IBTA, and how a book may hold the secret to breaking IB out of its HPC stronghold and into broader IT adoption.
Live from the show floor with the InfiniBand Trade Association
Live from the show floor with Dell: a discussion about Cray, competition, and partnerships in HPC
During SC week I sat down with Donnie Bell, senior manager in the enterprise marketing group, to talk about the announcement that Dell would be marketing a version of Cray’s CX1 to low-end HPC users. This audio is much more two-way than some of the other conversations I’ve posted, because I was curious about some of the details that went into this deal. After he gave me an overview of the system, our conversation ranged widely over the business aspects of the deal, touching on everything from why Dell (which builds hardware) is partnering with Cray, to how you get a product like this to customers who don’t identify with HPC, to why the brand exposure should help rather than hurt Dell’s future HPC business. If you are interested in how partnerships like this happen in HPC, this is the conversation for you.
Listen to the interview [audio:http://insidehpc.com/media/SC09/dell.mp3]
Download the audio file.
Live from the show floor with Adaptive Computing
As part of my audio catch-up from SC09 here is my conversation with Michael Jackson, the Co-founder, President, and COO of Adaptive Computing (formerly known as Cluster Resources). Michael is talking about Adaptive’s news during the week of the show: Moab’s connection with Voltaire’s Unified Fabric Manager and HP’s plans to resell Moab Adaptive Computing Suite, and Moab in the University of Toronto’s SciNet Consortium (interestingly, they paid for Moab in the first month based on energy savings alone).
This one has some BlackBerry noise for a few seconds about two-thirds of the way through. Sorry; I didn’t hear it at the time.
Listen to the interview [audio:http://insidehpc.com/media/SC09/adaptive.mp3]
Download the audio file.
Live from the show floor with Avere Systems
This year, as with last year, I recorded a bunch of audio during my meetings at SC. Unlike last year, however, I got hardly anything up during the show. So over the next several days I’ll be mending that sin as I work through my audio backlog.
In this segment Ron Bianchini, the President and CEO of Avere, starts off by introducing us to his well-seasoned team, and then he walks me through the story of his company and where his product is positioned in the storage acceleration market space. Avere’s appliance sits in between your storage and your server and, the company hopes, enables you to separate decisions about performance from decisions about capacity. In terms of results, here’s one: Ron walks us through Avere’s NAS storage appliance and shows results at 130,000 IOPS on the SPECsfs2008 NFS benchmark, with one quarter of the disks needed by competitors. Cool stuff.
In this audio Ron is walking me through a presentation, but the conversation is very followable without it. We recorded this in the conference registration area early Wednesday morning before the conference opened, but you’ll still hear the noise of early conference goers in the background. Hey — it’s like being there without paying extra to check your bags.
Listen to the interview [audio:http://insidehpc.com/media/SC09/avere.mp3]
Download the audio file.
StarGate Demo at SC09 Shows How to Keep Astrophysics Data Out of Archival “Black Holes” [UPDATED with pics]
Reserved Bandwidth on ESnet Makes Possible Multi-Gigabit Streaming Between Argonne, SC Conference in Portland
[UPDATE: I published this on the plane, and didn't have enough bandwidth to add the pics. They're inline now.]
As both an astrophysicist and director of the San Diego Supercomputer Center (SDSC), Mike Norman understands two common perspectives on archiving massive scientific datasets. During a live demonstration at the SC09 conference of streaming data simulating cosmic structures of the early universe, Norman said that some center directors view their data archives as “black holes,” where a wealth of data accumulates and needs to be protected.
But as a leading expert in the field of astrophysics, he sees data as intellectual property that belongs to the researcher and his or her home institution, not the center where the data was computed. Some people, Norman says, claim that it’s impossible to move those terabytes of data between computing centers and where the researcher sits. But in a live demo in which data was streamed over a reserved 10-gigabits-per-second circuit provided by the Department of Energy’s ESnet (Energy Sciences Network), Norman and his graduate assistant Rick Wagner showed it can be done.
While the scientific results of the project are important, the success in building reliable high-bandwidth connections linking key research facilities and institutions addresses a problem facing many science communities.
“A lot of researchers stand to benefit from this successful demonstration,” said Eli Dart, an ESnet engineer who helped the team achieve the necessary network performance. “While the science itself is very important in its own right, the ability to link multiple institutions in this way really paves the way for other scientists to use these tools more easily in the future.”
“This couldn’t have been done without ESnet,” Wagner said. Two aspects of the network came into play. First, ESnet operates the circuit-oriented Science Data Network, which provides dedicated bandwidth for moving large datasets. However, with numerous projects filling the network much of the time for other demos and competitions at SC09, Norman and Wagner took advantage of OSCARS, ESnet’s On-Demand Secure Circuit and Advance Reservation System.
“We gave them the bandwidth they needed, when they needed it,” said ESnet engineer Evangelos Chaniotakis. The San Diego team was given two two-hour bandwidth reservations on both Tuesday, Nov. 17, and Thursday, Nov. 19. Chaniotakis set up the reservations, then the network automatically reconfigured itself once the window closed.
At the SDSC booth, the live streaming of the data drew a standing-room-only crowd as the data was first shown as a 4,096³ cube containing 64 billion particles and cells. But Norman pointed out that the milky white cube was far too complex to absorb, then added that it was only one of numerous time-steps. In all, the data required for rendering came to about 150 terabytes of data.
In real time, the data was rendered on the Eureka Linux cluster at the Argonne Leadership Computing Facility and reduced to one-sixty-fourth of the original size for a 1,024³ representation, making it more manageable and able to be explored interactively. The milky mesh was shown to contain galaxies and clusters linked by sheets and filaments of cosmic gases. It’s all clearer in the movie, which you can see here (other resolutions at the bottom of this page).
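The grid sizes quoted here can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch in Python: the variable names are mine, and it assumes the "64 billion" figure counts in binary units (64 × 2^30), which is what a cube of 4,096 cells per edge works out to.

```python
# Back-of-the-envelope check of the grid sizes quoted in the article.
full_side = 4096      # cells per edge of the full-resolution cube
reduced_side = 1024   # cells per edge of the interactive representation

full_cells = full_side ** 3
reduced_cells = reduced_side ** 3

# 4,096^3 is exactly 64 * 2^30, i.e. "64 billion" in binary units
# (about 68.7 billion in decimal).
print(full_cells)                    # 68719476736
print(full_cells // 2**30)           # 64

# Shrinking each edge by 4x shrinks the cell count by 4^3 = 64x.
print(full_cells // reduced_cells)   # 64
```

So the "one-sixty-fourth" reduction is simply a factor of four along each of the three axes.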
The project, Norman explained, is aimed at determining whether the signal of faint ripples in the universe known as baryon acoustic oscillations, or BAO, can actually be observed in the absorption of light by the intergalactic gas. It can, according to research led by Norman, who said they were the first to determine this. Such a finding is critical to the success of a dark energy survey known as BOSS, the Baryon Oscillation Spectroscopic Survey. The results of his proof-of-concept project, Norman said, “ensure that BOSS is not a waste of time.”
Creating a simulation of this size, even using the petaflops Cray XT5 Kraken system at the University of Tennessee, can take three months to complete because it is run in batches as time is allocated, Norman said. The data could then be moved in three nights to Argonne for rendering. The images were then streamed to the SDSC OptiPortal for display. Norman said the next step is to close the loop between the client side and the server side to allow interactive use. But the hard work, connecting the resources with adequate bandwidth, has been done, as evidenced by the demo, he noted.
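To see why "three nights" is plausible, here is a rough estimate of moving the dataset over the reserved circuit. It is a sketch under assumptions that are mine, not the demo team's: that the full 150 terabytes (decimal) crossed the link, and that the 10 Gbps circuit was kept saturated with no protocol overhead. The function name is hypothetical.

```python
def transfer_hours(terabytes, gigabits_per_second):
    """Ideal transfer time in hours, ignoring protocol overhead."""
    bits = terabytes * 1e12 * 8                     # decimal terabytes -> bits
    seconds = bits / (gigabits_per_second * 1e9)    # link rate in bits/second
    return seconds / 3600

hours = transfer_hours(150, 10)
print(round(hours, 1))   # 33.3
```

Roughly 33 hours at line rate is about three 11-hour overnight windows. Real transfers rarely sustain the full line rate, so this should be read as a lower bound.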
But it wasn’t just an issue of bandwidth, according to ESnet’s Dart. “We did a lot of testing and tuning,” said Dart. ESnet is managed by Lawrence Berkeley National Laboratory (LBNL).
Other contributors to the demo were Joe Insley of Argonne National Laboratory (ANL), who generated the images from the data, and Eric Olson, also of Argonne, who was responsible for the composition and imaging software. Network engineers Linda Winkler and Loren Wilson of ANL and Thomas Hutton of SDSC worked to set up and tune the network and servers before moving the demonstration to SC09. The project was a collaboration between ANL, CalIT2, ESnet/LBNL, the National Institute for Computational Science, Oak Ridge National Laboratory and SDSC.
Mellanox intros 120 Gbps switch, application offloading of MPI into the adapter
Today Mellanox Technologies announced two new additions to their InfiniBand technology offering from the show floor at SC09.
120 Gbps InfiniBand switch
First up is a 120 Gbps InfiniBand switch. From the release:
Based on InfiniScale IV, Mellanox’s 4th generation of InfiniBand switch silicon, the IS5000 switch system family delivers the highest networking bandwidth per port to enable the next generation of high-performance computing, cloud infrastructures and enterprise data centers. The new switch solutions reduces network congestion and the number of network cables by a factor of three, providing customers with the optimal combination of cost-effective, proven performance and efficiency enhancements to address next-generation, Petascale computing demands.
This switch is actually getting some air time on the show floor this year as the hardware enabling the 120 Gbps IB network that exhibitors can connect to as part of SCinet. Mellanox’s John Monson told me in a conversation ahead of the announcement that the switch itself is ready, but won’t be out in general availability until Q1 of next year to give them time to develop the ecosystem of products that go with it.
As I was talking to Monson, we got sidetracked into a discussion of where Mellanox’s business is, and I was fascinated to learn that China alone was 40% of Mellanox’s revenue last quarter (the recently announced Tianhe, for example, is Mellanox end-to-end according to Monson). Russia is also significant in terms of revenue, a fact that provides an external indicator of the degree to which Russia’s stated interest in HPC is turning into action. Spotting a pattern, I asked about the other half of the BRICs — India and Brazil. India is “on the radar,” according to Monson, but Brazil is still developing.
MPI communication offloads
The other significant technology move the company announced from SC09 this week is application offloading as part of their ConnectX-2 InfiniBand adapters, announced in August of this year.
Of course you are probably familiar with the idea of offloading network protocol overhead onto adapter cards. An example of this is TCP/IP offload engines (TOEs) that move the protocol management — adding headers, forming packets, and so on — away from the processor onto the network card itself, freeing up the processor to do more application work. Mellanox’s IB cards already do this. “Application offload” is the same idea, only now extended to things like collective operations with MPI.
Broadly speaking, applications do two kinds of work: compute and communicate. In MPI applications, process-to-process communication starts in the CPU, where the MPI messages are packed up and passed down to the NIC, which then uses its protocol (IB in this case) to send them to the receiving process. On the receiving side, once the NIC reconstructs the data stream it passes that data back up to the processor, where it is reassembled into MPI messages and handed off to the parallel application. Having the sending and receiving processors so intimately involved in the processing of MPI data for inter-process communication introduces noise and jitter, and reduces the ability to create a fully synchronized system. All of this hurts application scalability in the 20-40% range, according to the company.
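A toy model helps show why jitter becomes a scaling problem. The sketch below is not Mellanox's measurement or method: it simply assumes every rank does the same amount of work plus a small random delay (exponentially distributed, which is my assumption), and that a synchronizing step ends only when the slowest rank arrives. All names and parameter values are illustrative.

```python
import random

def mean_step_time(num_ranks, trials=500, compute=1.0, mean_jitter=0.01, seed=0):
    """Average duration of one compute-then-synchronize step.

    Each rank needs `compute` seconds plus a random delay drawn from an
    exponential distribution with mean `mean_jitter` (an assumption).
    The step ends when the slowest rank reaches the synchronization point.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        slowest = max(compute + rng.expovariate(1.0 / mean_jitter)
                      for _ in range(num_ranks))
        total += slowest
    return total / trials

for ranks in (1, 64, 256):
    print(ranks, round(mean_step_time(ranks), 3))
```

In this model the average step time grows with the rank count even though the per-rank noise is unchanged, because every step waits on the unluckiest rank. For exponential noise the expected worst-case delay grows roughly with the logarithm of the number of ranks.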
Mellanox was part of a team funded by the Department of Energy to find a solution for this problem. The result is application offload, which moves the MPI processing part of application communication down to the NIC as well, leaving the CPU to do only the application compute cycles.
From the release:
Mellanox ConnectX-2 InfiniBand adapters introduce a new offloading architecture that provides the capability to offload application communications frequently used by scientific simulation for data broadcast, global synchronization and data collection. By offloading these collectives communication, ConnectX-2 adapters help to reduce simulation completion by accelerating the synchronization process and freeing up CPU cycles to work on the simulation, and enable greater scalability by eliminating system jitter and noise — the biggest issues for performance at scale.
The technology was developed in collaboration with Oak Ridge (being recognized this week with the inaugural insideHPC HPC Community Leadership Award), and is a firmware change that is supported only in the ConnectX-2 line of adapters. According to the company, they expect beta users in Q1 of next year.
Live from the show floor with Cycle Computing
On Monday while the exhibit floor was still under construction I stopped by Cycle Computing’s booth to talk with Jason Stowe, the CEO of Cycle Computing. To be honest, I went to the Cycle Computing booth thinking that there wasn’t going to be much there of interest to me. But in one of those great surprises that keep me coming back to SC each year, I came away thinking there was a lot to the company’s technology. With customers from the very large (Lockheed Martin and Johnson & Johnson, among others) to very small start-ups, Cycle is helping customers take advantage of computers they already have for HPC, as well as facilitating a move to the cloud for HPC use cases.
Listen to the interview [audio:http://insidehpc.com/media/SC09/CycleComputing11162009.mp3]
Download the audio file.
Live from the show floor with Microsoft
On Monday while the exhibit floor was still under construction I stopped by Microsoft’s booth, recorder in hand, to talk with Kyril Faenov, General Manager of Microsoft’s Technical Computing Group. It was an interesting chance to talk not only about the beta release of the latest version of Microsoft HPC Server, but also to get a walk-through of Microsoft’s strategy for HPC — from the desktop to Top10 systems and everything in between — including a petascale GPU system that the company is helping to deploy in the near future.
In the interview Kyril does touch on new software that Microsoft is announcing from the show:
Today at Supercomputing 2009, Microsoft Corp. announced the immediate availability of betas for Windows HPC Server 2008 R2 and distributed Microsoft Office Excel 2010 for the cluster. Together with the recently announced Microsoft Visual Studio 2010 Beta, which helps simplify parallel programming, these advances make it possible for more users to access supercomputing power through familiar technologies and tools such as Microsoft Office Excel, Windows Server and Visual Studio.
More on these announcements in the release linked above.
Listen to the interview [audio:http://insidehpc.com/media/SC09/MSoft.mp3]
Download the audio file.
MathWorks expands support for parallelism, announces TeraGrid tie-in
At SC09 this week The MathWorks made a couple of announcements related to MATLAB and Simulink, the company’s flagship computation products. Many of you will be familiar with MATLAB as a very popular high-level language and interactive environment that lets you perform computationally intensive tasks without needing to manage all the details that lower-level languages like FORTRAN and C require. With 2,000 employees in offices all over the world, The MathWorks reports over 1,000,000 users in more than 175 countries in industries ranging from aerospace and defense to education and electronics.
Over the years the company has expanded its offerings to support serious computation, including built-in support for multicore parallelism and mechanisms to allow for distributed computation via libraries like MPI on a large scale. The announcements this week build on those developments.
Enhancements in the Parallel Computing Toolbox
The MathWorks has announced a new version of the Parallel Computing Toolbox — the collection of routines that allows MATLAB programs to run on distributed environments, ranging from multiprocessor computers to clusters and grids with just a few changes to the serial programs. This new version provides an improved distributed array construct to enable MATLAB users to directly access large datasets distributed over many cores, sockets, or nodes in a large compute array.
When I talked with Silvina Grad-Freilich, manager of parallel computing and application deployment marketing at The MathWorks, about this new release of the PCT, she used an example application from the Max Planck Institute’s cancer research program to illustrate the advantages of the changes in the software. Researchers there are working on discovering new cancer therapies, and are generating high quality 3D images of proteins that require millions of projections. With the new PCT they are seeing improvements of 30x in a pool of 64 MATLAB workers — not exactly ideal scaling, but the point is that the performance improvement is significant enough to dramatically improve their throughput with no time or resources lost to developing specific expertise in parallel programming. Not the right solution for everyone, but for many in MATLAB’s target audience, I bet this is a great fit.
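To put "not exactly ideal scaling" in numbers, the sketch below computes the parallel efficiency of the quoted result and, assuming Amdahl's law is a fair model (my assumption; nothing in the conversation says why scaling falls short of linear), the serial fraction that would produce it. The function names are illustrative.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / workers

def amdahl_serial_fraction(speedup, workers):
    """Serial fraction f that solves speedup = 1 / (f + (1 - f) / workers)."""
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)

print(round(parallel_efficiency(30, 64), 2))      # 0.47
print(round(amdahl_serial_fraction(30, 64), 3))   # 0.018
```

Under that model, a 30x speedup on 64 workers corresponds to a code that is a little under 2% serial, which is a respectable result for users who have not hand-tuned anything.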
This release also features better parallel performance for algorithms in the Statistics and Communications Toolboxes that rely on the Parallel Computing Toolbox. This adds to existing functionality in the Bioinformatics, Optimization, and Genetic Algorithms Toolboxes. Users of these Toolboxes don’t need to make any changes to their codes to take advantage of multiple processors. You simply point MATLAB or Simulink at a processor pool (a set of resources defined in the application that could include multiple sockets on your machine, machines on your local network, or a remote cluster) that includes multiple processors, and the Toolboxes automatically distribute your computation across the whole set of resources.
MATLAB plus the TeraGrid
Cornell University also announced this week that the Cornell Center for Advanced Computing (CAC), in partnership with Purdue University, has been funded by the NSF to bring MATLAB to the TeraGrid as an experimental computing resource. A statement from Robert Buhrman, Cornell University vice provost of research, makes it clear that this is again about expanding the applicability of HPC resources to those without deep skills in this arena: “MATLAB on the TeraGrid will help enable a broader class of researchers who are well-versed in MATLAB to reduce the time to solution in a scalable manner without having to become parallel programming experts.” TeraGrid is following on the heels of the Enabling Grids for E-sciencE (EGEE) team in Europe, where MATLAB has been supported since October of 2008.
The Cornell announcement will make MATLAB available to remote desktop and Science Gateway users, and includes support by industry partners Dell and Microsoft along with The MathWorks. The software will be hosted on a 512-core Dell PowerEdge HPC cluster at the Ithaca, NY campus of Cornell running Windows HPC Server 2008. The two use models initially envisioned are the standard single-user interactive MATLAB use you would have using the program on your own system, and use as an engine driving Science Gateways such as nanoHUB.org.
While there is a production aspect to this deployment, the NSF funding suggests that there is also a strong research aspect: beginning to understand the challenges and opportunities in deploying software as a service across a large user base that is not necessarily familiar with HPC technologies.
Inaugural HPC Community Leadership Award winners
In early October insideHPC announced voting for the inaugural HPC Community Leadership awards in two categories: Individual and Organization.
As regular readers know, we don’t just report on HPC — we live and work in this community. And we believe strongly in the power of recognizing the people and organizations that make a difference in HPC. The award recognizes the people and organizations who have persevered through technology, budget or organizational challenges to place innovative HPC solutions in the hands of users in business, engineering, technology, and science.
A select panel of HPC rock stars (from both sides of the Atlantic) recommended an impressive slate of nominees. I was enthusiastic about the idea, but even I was taken aback by the tremendous response we got from our readers. And with nearly 1,000 votes cast, the top two in each category were very close. A testament to the quality of the nominees and the work of the nominating committee.
During the Opening Gala at SC09 Monday night, I presented the inaugural HPC Community Leadership awards to Oak Ridge National Laboratory (ORNL) and the University of Illinois’ Bill Gropp.
Organizational Leadership Award Winner: Oak Ridge
ORNL has been one of the highest profile supercomputer centers of recent years — in political, scientific, technology, and media arenas — and has globally raised the profile and value of high end HPC. ORNL has led the way with services, support and research that help science — not just Top500 ratings.
“On behalf of hundreds of Oak Ridge National Laboratory computer scientists, computational scientists and applied mathematicians, as well as the much larger community of scientists and engineers worldwide who work with us, we are honored by the HPC community’s recognition of our accomplishments,” said Jeff Nichols, ORNL’s associate laboratory director for computing and computational sciences. ORNL houses Jaguar, the first petascale supercomputer dedicated to open science. “Dramatic scientific breakthroughs have already been enabled by the remarkably balanced system that combines unsurpassed speed, memory and I/O bandwidth, and we look forward to continued scientific advancements with our partners as we tackle challenges in energy, climate change, advanced materials, and neutron sciences.”
The award was presented to Jeff Nichols, ORNL’s associate laboratory director for computing and computational sciences, in the Oak Ridge Booth Monday night.
Individual Leadership Award Winner: Bill Gropp
Bill Gropp’s best known legacy to the HPC community is the MPI standard. Many people suggest that node-level parallelism will run out of sensible programming paradigms long before inter-node parallelism does, largely because MPI has scaled well beyond the resources that were around at the time of its introduction. Gropp can regularly be heard arguing how MPI can evolve to keep our millions of lines of “legacy” applications scaling to systems with millions of cores, and he has made major contributions in hierarchical numerical methods for the solution of partial differential equations. He has also been a familiar presence at some of our community’s most high-profile events, and this year he led the SC09 technical program and was a major contributor to several other conference technical programs.
“Bill’s insights into scientific applications is keen, and his knowledge of scientific computing broad. His contributions to the HPC community as well as to the University of Illinois’ extreme-scale computing efforts are invaluable. The Blue Waters sustained-petascale computing project and Illinois’ Institute for Advanced Computing Applications and Technologies would not be the success that they are without him. The HPC Community Leadership award is a well deserved honor,” said Thom Dunning, who leads Illinois’ National Center for Supercomputing Applications and Institute for Advanced Computing Applications and Technologies.
“Those of us who have worked with Bill for many years have always known that he exemplifies true leadership,” said Michael Heath, interim head of the University of Illinois department of computer science. “Not only has he played a major role in advancing high performance parallel computing, but he has done so with particular emphasis on its role in scientific computing. His ability to work both sides of the equation have enabled him to make vital contributions that solve some of the most pressing issues in science and computing.”
The award was presented to Bill in the NCSA booth Monday night.
A thank you from insideHPC
Congratulations to this year’s winners. As I look forward to next year’s awards, I want to take a moment to thank all of you for the tremendous response to this effort. I believe that recognizing leadership is the single best way to ensure that we attract outstanding new talent to our field, and ensure that the supercomputing of tomorrow is even more innovative and vibrant than it is today.
SGI finally announces make-or-break HPC platform
SGI has been working on its next generation shared memory platform, codenamed Ultraviolet, for a long time. The basic idea was to take the NUMAlink-based shared memory of the Altix 4700 line and do that with x86 chips instead of Itaniums. Among other salutary effects, this would reduce the price premium customers have to be prepared to pay for hardware-supported shared memory.
The platform has been in development since well before the most recent bankruptcy and purchase, and the company has never publicly wavered on its commitment. In one of CEO Mark Barrenechea’s first interviews following the acquisition, he restated it:
“If you don’t believe in UV, you would not have brought the two companies together,” says Barrenechea. “We are fully committed to UV, and it is paramount to our future.”
Today the company announced the realization of that determination: the Altix UV line of x86-based shared memory supercomputers, with first orders shipping in Q2 of 2010 to customers who have already signed up.
QLogic gets into servers from IBM, HP, Dell, and SGI
Today QLogic announced that it has inked new or expanded distribution deals for its InfiniBand products with Dell, IBM, HP, and SGI. QLogic is an interesting company in the IB space as they control the entire experience — from silicon (Voltaire buys silicon from rival Mellanox) up through switches, adapters, architecture, and application integration. Their switches are also big: at 648 ports, the QLogic 12000 Series Director 40 Gbps switch packs double the ports of the Mellanox top end gear (Voltaire offers a 648-port variant of their 4700 that uses double density boards).
In my conversation with them, QLogic showed me a slide (image at right, click for larger view) comparing Mellanox’s ConnectX QDR adapter performance (which notably features offload support) with their own TrueScale QDR adapter. The slide shows the performance advantage growing from 5% to 22% as the number of cores in the benchmarked application grows from 64 to 256. By way of explanation, QLogic’s Jesse Parker (GM and VP Network Storage Group) said that offload is helpful at lower core counts, but as core counts increase QLogic’s approach of relying on the cores to do the processing that Mellanox offloads to the cards means that processing resources scale as network demand grows. I suspect that this is greatly influenced by the architecture of the machine: how many cores per socket, sockets per node, cards per node, and so on. But it was an interesting stat nonetheless (the machine benchmarked used 2.93 GHz Xeon 5570s).
IBM already had a deal for switches with the company, but this agreement expands that to include QLogic’s QDR adapters. From the release:
Expanding on its successful OEM agreement with IBM for quad data rate (QDR) InfiniBand switches, QLogic today announced IBM has integrated QLogic’s performance-leading 7300 Series 40Gb/sec QDR InfiniBand host channel adapters into its IBM System x servers for high performance computing (HPC) applications. IBM is the first tier-one OEM to integrate QLogic QDR adapters and will offer them as part of IBM’s latest IBM System Cluster 1350.
The deal with HP is all new, and HP will be offering QLogic’s full line of QDR IB products with HP ProLiant and BladeSystem c-Class servers as part of the HP Unified Cluster Portfolio. From the release:
The HP Unified Cluster Portfolio makes standard-based clusters easy to configure and manage with breakthrough technologies such as QLogic QDR InfiniBand. The QLogic QDR host channel adapters (HCAs), directors and edge switches coupled with HP’s expertise, help organizations achieve successful results with HPC applications such as weather modeling, high energy physics, reservoir simulation, scalable visualization, computer aided engineering impact analysis and computational chemistry.
Dell has grown its offering to include both the QDR switches and adapters (it previously sold QLogic’s DDR switch), and SGI will be using QLogic QDR IB switches and directors with its CloudRack products.
An interesting stat on the company? According to Parker, QLogic has been profitable for 57 straight quarters, and has $340M in cash on hand.
NVIDIA announces Fermi products for 2010 ship
This morning NVIDIA followed up on the recent announcement of its next-generation GPU product, codenamed Fermi, with news of the first set of specific products and shipping time frames.
The pro line of Fermi products will come to market as the “20 series,” which you may recall adds a lot of new features aimed specifically at HPC, including ECC memory, L1 and L2 caches, much faster double precision support, up to 1 TB of memory, concurrent kernel execution, and fast context switching.
The new Fermi “Personal Supercomputer” products are the C2050 and C2070, complementing the older C1060 card (for reference: 933 GFLOPS single precision, 78 GFLOPS double, 4 GB memory). The C2050 is rated at 630 GFLOPS double precision, has 3 GB of ECC memory, and will retail for $2,499 when it ships in Q2 of 2010. The bigger C2070 ups that to 6 GB of ECC memory and will retail for $3,999 when it ships in Q3.
NVIDIA is also plugging Fermi into the datacenter line, complementing the S1070 (for reference: 4.14 TFLOPS single precision, 345 GFLOPS double, 4 GB memory/GPU). The S2050 is rated between 2.1 and 2.5 TFLOPS double precision, has 3 GB of ECC memory/GPU, and will retail for $12,995 when it ships in Q2 of 2010. Big brother S2070 again ups the memory to 6 GB of ECC memory/GPU and will retail for $18,995 when it ships in Q3.
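For a sense of what "much faster double precision" means, the quick calculation below compares the double-precision ratings quoted for the old and new parts. It uses only the figures given in this post; the variable names are mine, and the price-per-GFLOPS line uses list price against the rated peak, not measured application performance.

```python
# Double-precision ratings quoted in this post, in GFLOPS.
c1060_dp, c2050_dp = 78, 630
s1070_dp, s2050_dp_low, s2050_dp_high = 345, 2100, 2500

# Generation-over-generation double-precision ratios.
print(round(c2050_dp / c1060_dp, 1))        # 8.1
print(round(s2050_dp_low / s1070_dp, 1))    # 6.1
print(round(s2050_dp_high / s1070_dp, 1))   # 7.2

# List price in dollars per rated double-precision GFLOPS for the C2050.
print(round(2499 / c2050_dp, 2))            # 3.97
```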
According to Andy Keane, general manager of Tesla business at NVIDIA, the new Fermi products won’t replace the previous-generation products in either the personal or datacenter lines, in order to accommodate customers who need time to revalidate their applications on the new platform. Keane also told us that these are general availability dates; early availability and customer trials are expected before the end of the year.
First-time exhibitor Tycrid aims GPU technology at bioinformatics market
insideHPC sat down with Chris Heier, president of Tycrid Platform Technologies, a first-time SC09 exhibitor based in Canada, to learn more about their purpose-built GPU-based solutions and their focus on the bioinformatics space.
insideHPC: First of all Chris, welcome to SC09. It’s great to see so many first-time exhibitors — including Tycrid of course. Why don’t we start off with some background for our readers. When was Tycrid founded and why? What opportunity did the founders see that brought you to this particular solution?
Chris Heier: Tim Davies, our co-founder, and I founded Tycrid in September of 2007. Our backgrounds over the past seven years of working together have been in synthetic aperture imaging and real-time seismic image processing. Working in technical disciplines like these, you find really quickly that there are major limitations in normal computing architectures. Not enough processing power, bandwidth, etc. We had developed some pretty innovative solutions around FPGAs, but these came with the issues of time and expertise required to utilize them.
We decided to do something that we thought would be really cool — attempt to build the most powerful workstation in the world. For years prior to incorporating our business, we had looked deeply into GPU computing, working with companies like PeakStream (now owned by Google), as well as Rapidmind to push into multi-GPU computing. It was difficult at the time, and utilizing multiple GPUs seemed to be very difficult from an end-user perspective. Fast forward to CUDA when the GeForce 8 series rolled out, and suddenly multi-GPU started to look very feasible and seamless to end users.
We had built a workstation using 6 GPUs, originally GeForce, but eventually moved to Tesla. It was tough at first as the BIOS we were working with would fail to boot with more than 4 GPUs, but time and effort prevailed. When we got it working, we had benchmarked with VMD, and had a 58x speedup over what would have been considered a top of the line workstation at the time.
I guess as a summary, we started the company with the desire to bring technology to market that could have a significant impact on scientific discovery. With myself liking fast hardware, and Tim being involved with some of the most computationally intensive sciences, we saw this as a great opportunity to not just supply researchers, but to collaborate with them for the advancement of science. We saw the GPU as a technology that could make this a reality — in an acceptable timeframe.
insideHPC: So is that still your direction, or how has that vision changed over the past year?
Heier: There has definitely been some change in how we intend to move forward. Narrowing our focus has really been the big thing. On one hand, you have a great technology that can be applied to so many things, and on the other, a team that has many great ideas as to how to use it.
Bioinformatics is an area where we feel this technology can really have a positive impact. It is a research area that I believe has true potential in making a big difference in the world. Genomics in particular is where I really see some fantastic new science coming into play. GPU based computing platforms will have a big impact in shaping the future of genomics.
Moving forward, our vision is to build the right team, and develop the right purpose built appliances to establish Tycrid as the leading custom solutions provider in this domain.
insideHPC: So, here in the final months of 2009, this industry seems to have GPU fever. I have to ask you this one: Is Tycrid just one of many new companies trying to find a niche for GPU-based computing?
Heier: No. While we are one of the few companies that decided to focus solely on GPU computing, it is still simply selling commodity hardware. We have the skills to put together some very innovative solutions, but when more well-established companies are getting heavily involved in the space, it doesn’t make sense for us to be just another GPU company.
Our focus is what sets us apart. I love hardware. Always will. But there is really a bigger problem. Anyone can make and sell commodity hardware. Few companies really make it easy for the potential end customer, and even fewer wish to take the initiative to advance science in a very specific direction. We’re not simply talking about hardware anymore, but a complete philosophy that drives everything we do at Tycrid. By developing a strong community with a singular goal, I feel that we can begin to intimately understand the needs of the genomics community, and really create something truly unique that solves many of the domain challenges in the upcoming future.
We see GPUs as being the next evolution in computing technology, a disruptive force, that will allow for the enablement of upcoming science that needs to happen. In genomics, sequencers are going to be coming online that process the genome at unprecedented speeds. The GPU has provided a great opportunity to begin to meet these future demands.
insideHPC: What is Tycrid doing that other companies are not doing with the Tesla GPU? And will your strategy keep you tightly aligned with NVIDIA?
Heier: What we are choosing not to do is to take the easy way out. Selling white boxes and trying to be everything to everyone. That is not our game. It is also an approach that I see as counterintuitive to what actually needs to be happening. NVIDIA is doing an excellent job in really pushing GPUs as a computational engine, and that is something that we have been on-board with before CUDA. Without them, I don’t think the landscape on accelerator technologies would be as intriguing as it is today. It is truly a disruptive technology.
Our strategy moving forward is 80% collaboration and 20% integration. By collaborating closely with the research community, we can better serve their needs with a purpose built turnkey solution. Our focus on the genomics sector is critical. There is simply too much that needs to be done, and not enough of a collective effort to drive the development of a proper solution to address the future market need. It’s more of a long term strategy, but I believe the efforts we put in to making this a reality will pay off in the end.
insideHPC: How long has Tycrid been shipping systems — and who are some of your customers?
Heier: We started shipping systems earlier this year. We have about a dozen systems installed at some leading research and academic institutions, but at this time, we are not at liberty to discuss the applications they have been working on. I can say that throughout the next year, we will begin the development of a truly revolutionary platform that will be available on the CANARIE research network. There are also several very exciting collaborations we will be entering into for applications porting and algorithm development.
insideHPC: So what is the next big thing for Tycrid?
Heier: We have quite a few activities and milestones coming up this next year so I think I can confidently say you will be hearing quite a bit about us in 2010. I’m very excited about our founding role in the Prometheus Alliance which was just announced this past week. The Alliance is something I truly feel will evolve into something else. Seriously. I’m a young guy, and being able to spearhead an alliance as important as I believe Prometheus will be is something I will always look back on with pride. It is something that has to happen, and now is the right time to make it happen. There are just so many great things happening in genomics that will affect all of our lives for the better, and the alliance will be the vehicle to drive the innovation needed to make these things happen.
Convey announces high profile customers, two dozen units shipped
The last time we spoke to the folks at Convey, they were grinning ear to ear over the $24 million in Series B funding that they had successfully brought in house. According to Convey CEO Bruce Toal, one of the first orders of business with the influx in funding was to tool up and begin shipping production units to customers.
This was not what we had expected to hear at the time. Convey officially unveiled their technology at SC08 to quiet fanfare. At this stage in their lifetime, typical startups would be heavily focused on developing internal engineering talent, building marketing collateral, and fine-tuning their product features. But then, Convey is not the typical tech startup.
First customers: how about Stanford, LBNL, and ORNL?
Fast on the heels of several exciting press releases today, Convey pulled us aside and gave us the low-down on several customer deliveries of production units. We figured we would hear some reasonable but small customer success stories, as one usually does from a startup at this stage. Again, not Convey.
These folks are coming out swinging. Convey told us today that they have added Stanford, Lawrence Berkeley National Laboratory and Oak Ridge National Laboratory as HC-1 customers. “These high-profile customer installations underscore the traction that Convey is gaining in the high-performance computing market,” said Bruce Toal, CEO and president of Convey Computer Corp. “The decision by three of the world’s most recognized scientific think-tanks to utilize Convey’s HC-1 computers as they undertake some of today’s most advanced research projects is exciting and gratifying for our young company and supports our hybrid-core technology.”
Stanford Center of Computational Earth and Environmental Science
First out of the gate is the Stanford Center of Computational Earth and Environmental Science (CEES). The researchers at CEES are planning to use their HC-1 to develop new seismic-imaging and reservoir simulation algorithms. The research is a part of a new consortium called the Stanford Earth Sciences Algorithms and Architectures Initiative. One of the group’s main goals is to evaluate modern HPC architectures for applied Earth Sciences algorithms.
“High performance computing has entered a period of rapid change that brings opportunities for huge performance gains,” said Dr. Biondo Biondi, co-director of the Stanford Exploration Project and one of the Initiative’s principal investigators. “Convey’s hybrid-core computing shows promise of achieving impressive performance using high-level programming languages and standard programming environment. We are looking forward to working with and testing this innovative system.”
Lawrence Berkeley National Lab
Lawrence Berkeley National Lab will utilize the HC-1 in simulating new computer architectures and approaches to developing more energy-efficient systems. Despite their “energy” charter, the DOE has also been deeply involved in many aspects of next-generation climate science. Much of the architecture research performed on the HC-1 will be specifically geared towards climate applications at unprecedented resolutions. The LBNL researchers are also studying how the HC-1 can assist in solving bioinformatics workloads such as graph algorithms for gene cluster analyses.
“Energy efficiency has become a first-order design constraint for future systems. We really don’t see the current path of scaling up conventional hardware as sustainable either in terms of the initial hardware cost or the price of powering such systems over its lifetime,” said Dr. John Shalf, head of Berkeley Lab’s Science-Driven Systems Architecture team. “The HC-1 presents an intriguing alternative approach to achieving energy-efficient computing using an architecture that can adapt to the requirements of the science problem. We are looking forward to getting our hands on the system to assess all aspects of its scientific computing capability.”
Oak Ridge National Laboratory
The final, late-breaking customer announcement was Oak Ridge National Laboratory. ORNL has emerged as a real user-land superpower in the HPC universe. They are planning to utilize the HC-1 for a variety of mission-critical programs. Given the scope of ORNL’s research, the workloads will span the full gamut of science, including nuclear energy, climate modeling, national security and infrastructure. This is quite a win for Convey. Not only does ORNL have champions in specific scientific disciplines, but they also have the computational architecture talent to squeeze every ounce of performance out of any given platform. They are great folks to have banging on your system.
“We chose the HC-1 as a lead development platform for many of the elements expected to take us into the next decade in focused performance, power-efficient systems and productivity of proposed future systems. The team backing the HC-1 has a proven track record in innovation and bringing ease of use to the broader HPC community. The system is designed to have a very modular suite of reconfigurable components allowing the HC-1 system(s) to act as specialized components of an overall larger design. We will be able to evaluate new algorithms, optimize old algorithms and design new systems and architectures from the first principles point of view. The HC-1 will be an integrated part of the newly formed Hybrid Multi-Core Consortium,” said Dr. Jeffrey Nichols, associate laboratory director for Computing and Computational Sciences at Oak Ridge National Laboratory.
(Roughly) 25 machines shipped. What’s next?
According to Toal, the company is going strong following the Series B funding. They’ve shipped roughly 25 HC-1 machines to around 10 different customers. He specifically noted that several of their customers asked to remain anonymous due to the significant speedups they were achieving in their respective areas of expertise.
The Series B has helped them begin to develop a global sales and service footprint. The Richardson, Texas office is bustling with the 48 employees (up from 35 the last time we spoke) that now make up the operations at Convey. They have officially deployed HC-1 Personalities in life sciences, speech recognition, electronic design automation and financial services, with several more planned for early 2010.
The future? The Toal-Brewer-Wallach trifecta is tight-lipped about any further customer deliveries and future architecture plans. However, they did in two years what the industry at large has failed to do in a decade: deliver a pure hybrid computing platform. Definitely a company to watch.
Based on InfiniScale IV, Mellanox’s 4th generation of InfiniBand switch silicon, the IS5000 switch system family delivers the highest networking bandwidth per port to enable the next generation of high-performance computing, cloud infrastructures and enterprise data centers. The new switch solutions reduce network congestion and the number of network cables by a factor of three, providing customers with the optimal combination of cost-effective, proven performance and efficiency enhancements to address next-generation, Petascale computing demands.


