Q&A with SiCortex co-founder Matt Reilly on what happened, and what it means for HPC

The SiCortex story seems to be on everyone’s mind right now; in particular many people I’ve talked with are concerned that the failure of this company along with so many others may mean that the HPC ecosystem is headed into a time when it simply cannot sustain a broadly diverse group of vendors.

We caught up with one of the founders of the company, Matt Reilly, and asked him to share his thoughts about what went right, what went wrong, and what it all means. Matt’s responses are perceptive, and powerful in their honesty. You’ll want to read all the way to the end, but here’s a takeaway: innovative companies can still compete in a market dominated by big players, but it takes strong execution.

insideHPC: Matt, thanks for agreeing to talk with us and share your perspective on what happened at SiCortex. You were one of the original three founders of the company (along with John Mucci and Jud Leonard) and so your take on things will be especially valuable. Do you want to lay out any ground rules for your answers?

Matt Reilly: I can only speak from my own perspective. It is important to note that I left the company at the beginning of the year, so most of the information I have is from prior to that time. And, of course, I have a peculiar perspective as one of the three founders that may not be right. For that matter, I’m sure Jud and John have completely different perspectives.

insideHPC: On your blog you wrote:

“SiCortex failed for the same reasons lots of businesses fail: they ran out of money. The reasons had nothing to do with the product concept and everything to do with the execution and timing: May of 2009 is not a good time to be raising money: success takes a mix of luck, skill, and determination that just didn’t come together this time around.”

Was money the determining factor in SiCortex’s ultimate failure as a business? It seems a terrible shame that the small sum of $30M is all that stood between you and a viable business. If funding were to arrive before June 30, could/would SiCortex continue, or is it past that point?

Matt: As to the June 30 timeframe, I can’t say. I can say that there are several of us who know what we would do with that kind of funding (or lower amounts as well).

As for a viable business, I don’t have enough information to say whether $30M would do it. A lot depends on how the company is managed. We built the first prototype machines with the A round funding. That was substantially less than $30M.

And $30M is not a small sum. We get inured to all this, but $30M is still a lot of money. Getting investors to part with $30M at any time takes hard work, exhaustive understanding of the market and the product, and a dogged determination.

insideHPC: Was the suspension of operations a surprise to SiCortex’s employees?

Matt: I think many folks knew that the burn rate vs. the cash flow meant that a new investor had to come in by the end of May. That has been clear for a long time. We had all hoped that either the current investors or a new investor would step up. (Though I have no quarrel with the investors’ decision: business is business and the VCs are basically spending other people’s money.)

insideHPC: There have been some great discussions on the internet around SiCortex, its role in the ecosystem, and what its demise means for the future of what I’ll call “independent technologies” — outside of the Intel/AMD/Ethernet zone — in our community. Woven has now failed, and Quadrics has failed as well. Is there no place for non-commodity compute platforms in supercomputing? If no, what needs to change (acquisition, funding, etc.) in order to change this?

Matt: I wouldn’t take SiCortex and Quadrics as proof that you can’t compete against Intel/AMD/whoever. There is a place in the ecosystem for some biodiversity. For all the time I spent on the road, I rarely saw evidence that a non-x86 implementation was a real problem. Granted, SiCortex chose market segments that tended to be non-ISV centric, but even the x86 ISVs will port to a platform if they’re paid for it or if they see a promising market.

But competing against a dominant model (even if it is broken or suboptimal) requires luck, skill, and resources. There isn’t much room for operational mistakes, and you better have a hit with your first product, because the big guys may be slow, but they aren’t stupid. Companies like SiCortex aren’t a threat to the big guys, but they can become an irritant. It is inevitable that BigCo will crank up the marketing machine in response to any irritation. They may even do something about their products. The newcomer needs to make the first product a hit — second chances are very expensive.

And let’s watch that use of the word “commodity.” It implies that x86 processors are a commodity. This is a triumph of marketing. How many sources are there for general purpose x86 processors? Commodity should mean that anyone can make it.

The real commodities in the computing market are DRAMs and wafers. SiCortex had 50% gross margins. That’s not too bad given the competition. Had it achieved 2% market penetration in high performance technical computing, the cost of engineering the product would have been “in the noise” and so the business would have come down to the cost of wafers at TSMC (pretty low: TSMC is a manufacturing wonder) and the cost of DRAMs (where everybody pays the same price). So this wasn’t SiCortex vs. commodity, it was SiCortex vs. PCs or CoMmodity vs. commodity.

insideHPC: Some analysis has pointed out that although SiCortex hardware was very well integrated and had a lot of convenience features for system owners, memory bandwidth was a problem relative to mainstream processors (less than 1/10th of Nehalem today, though the Nehalem comparison isn’t quite fair since your Gen 2 product would have improved this; the 5400 would be a better comparison). This analysis points out that the system really seemed suited to applications that were small message limited, or for those who were infrastructure constrained. Also, the advent of QDR IB erased the communications advantage in your gen 1 product. All of this adds up, in that line of commenting, to an observation that it was really that the design cycle for you guys was longer than x86 processor and IB interconnects, and that this is what ultimately made you not viable. How do you respond to this?

Matt: Last part first. There is no way in hell that the design cycle for a SiCortex product was even remotely as long as an x86. When I left Intel in 2002 they were talking about Nehalem. Now that chip went through several project changes under the same name, but the design cycle was much longer than anything SiCortex did. The first SiCortex chip taped out 21 months after we hired the initial hardware team. We were working on getting that down to 18 months for subsequent projects.

The low memory bandwidth was the number one technical problem with the first generation machine. And it was my fault. As the technical project leader, I made a decision to get the chip out the door rather than fix what we knew to be a problem. At 385 MB/s, the Stream TRIAD score was about half what it should have been. In retrospect, we should have fixed that.
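For readers unfamiliar with the benchmark Matt cites, the following is a minimal sketch of the STREAM TRIAD kernel and the way its bandwidth figure is conventionally computed (two reads and one write of an 8-byte value per element, reported in units of 10^6 bytes per second). The function names and array size here are illustrative assumptions, and pure Python is far too slow to say anything about real hardware; the sketch only shows what the number measures, not how SiCortex obtained its result.

```python
import time


def triad(a, b, c, scalar):
    """STREAM TRIAD kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth_mb_s(n, seconds, bytes_per_element=8):
    """Bandwidth in MB/s (10**6 bytes), counting two reads and one write per element."""
    return 3 * bytes_per_element * n / seconds / 1e6


def measure(n=200_000, scalar=3.0):
    """Time one TRIAD pass over n elements (illustrative only, not a real measurement)."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    return a, triad_bandwidth_mb_s(n, elapsed)


if __name__ == "__main__":
    _, mb_s = measure()
    print(f"TRIAD (pure Python, illustrative): {mb_s:.1f} MB/s")
```

Because the kernel does almost no arithmetic per byte moved, its score is dominated by how fast the memory system can feed the processor, which is why it is a common shorthand for per-core memory bandwidth.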

That said, there are a number of applications that didn’t seem to mind the low memory bandwidth. They tended to be communications dominated, but not necessarily short message. (I think too much is made of zero-length message performance. Yes, it is a component of the problem, but for many applications the important performance is around 1KB to 16KB messages.) The machine was an absolute demon at sorting (10 Billion 100B records in 19.7 seconds) and very very fast at 3D FFT. It did well on Kirchhoff migration. It was very good at seismic tomography. It is the fastest GUPS engine per pound, per cubic foot, per megadollar on the planet. It runs shmem like a dream. So, how did SiCortex fail again? I’m confused….

insideHPC: SiCortex spent a lot of time communicating a message around being a green alternative, but several comments have been made to me to the effect that there was no convincing value proposition on a performance-per-watt basis compared to other alternatives because of the simplicity of the MIPS instruction set and lower clock frequency. Was the green message the right one for SiCortex? Looking back do you see more powerful or convincing messages that might have been more effective? (I know, Monday morning quarterbacking.)

Matt: Personally, I think the green thing took on a life of its own. Frankly, I’m a little irritated that almost all the press was about “SiCortex the green alternative” without mentioning “SiCortex, the folks who got communication right” or “SiCortex, the folks who put 1,458 processors in the hands of graduate students.” But green makes a better t-shirt and is a much simpler and more fashionable message. The executive team apparently felt that pushing the green message was important to the company profile and investor interest.

insideHPC: Big picture question: Do you think HPC can sustain future growth in the HPC vendor/provider pool? What does the SiCortex arc have to say about the future of new vendors in the community?

Matt: The SiCortex arc says a few things, I think:

  1. Get it right the first time. The initial SiCortex implementation worked, was reliable, and did all that it was designed to do. But the memory bandwidth was lacking because of an architectural/management/schedule/funding tradeoff that was done wrong. So, lesson 1 is “do everything right.”
  2. Get the second generation out quickly. SiCortex spent a lot of time after gen 1 (the chip was called “Ice9”) tweaking and twiddling. The company should have started on the second generation immediately with a very small team. It didn’t, we spent those resources elsewhere, perhaps to good effect, but the delay in the second generation was a problem.
  3. When I started, I assumed that contracting out parts of the effort was a bad idea: nobody is as interested in my success as I am. I was wrong. There are many really good sources of help in the industry. The key is to find the providers who have as strong an interest in pleasing their customer (you) as you have in producing a working device. We got absolutely great IP help from a number of companies both big and small. I also have to say that the folks at Cadence were spectacular.
  4. On the other hand, I believe that management’s decision to outsource the C round fundraising to a Wall Street firm was a mistake. The third party’s stake in the success of the effort was minuscule, and they knew almost nothing about the computing market in general, never mind the high performance market. They failed miserably, having raised not a single dime, and they wasted valuable time.

Outsourcing the fundraising ignored the key lesson from the early days of SiCortex: Entrepreneurial ventures are fundamentally romantic exercises in optimism. If the entire venture is decided on the outcome of an Excel spreadsheet, you’re probably screwed. The number of opportunities for success is small; the number of paths to failure is infinite.

So the decision to invest has to be driven to some degree by visceral, emotional, human response. That means getting people excited about a vision for the company and about creating a new place in the market. It is about doing something and saying something different. That gets completely lost when we turn it over to B-school suits and paper salesmen.

When Jud Leonard, John Mucci and I were going around to VCs, John drilled this into us. I didn’t get it at first — I’d arrive at each meeting in a coat and tie. After the third meeting, John took me aside and said: “chuck the coat and tie, they’re paying for a geek, you need to show them a geek.” And he was right. Yes I could dress like a grown-up, but what we needed to project was the deeper truth — that Jud and John and I were excited about creating this new business and we had the energy and enthusiasm and optimism to carry it out.

insideHPC: The business might not have worked out, but it seems that SiCortex’s presence had a tremendous impact on the community. What is your sense of where SiCortex had what may turn out to be lasting impact?

Matt: I would hope that the greatest impact would be to encourage other folks to assemble a great team, some funding, and do something that challenges the computer industry’s assumptions.

I suspect that the longest lasting impact (in the computer industry, that could be days or even weeks!) will be in the power efficiency discussion. The GCPI (green computing performance index) is a straw horse proposal that has its problems, but in general points in the right direction: performance vs. power should be expressible both as a vector (a set of benchmark figures vs. power consumed) and a scalar (the weighted/normalized sum of the vector components). Hmmm… that’s probably too complex a statement to have any lasting impact. We’ll have to figure out how to reduce it to a t-shirt slogan…
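As a rough illustration of the vector-and-scalar idea described above, the sketch below computes a performance-per-watt vector from a set of benchmark scores and then collapses it into a single weighted, normalized number. This is not the actual GCPI definition: the benchmark names, the scores, the reference system used for normalization, and the weights are all hypothetical, and every benchmark is assumed to be higher-is-better.

```python
def perf_per_watt_vector(scores, watts):
    """Vector form: each benchmark score divided by the power consumed."""
    return {name: score / watts for name, score in scores.items()}


def scalar_index(vector, reference, weights):
    """Scalar form: weighted sum of components, each normalized to a reference system."""
    total_weight = sum(weights[name] for name in vector)
    weighted = sum(weights[name] * vector[name] / reference[name] for name in vector)
    return weighted / total_weight


if __name__ == "__main__":
    # Hypothetical scores for a system drawing 10 kW (numbers invented for illustration).
    scores = {"bench_a": 100.0, "bench_b": 50.0}
    vector = perf_per_watt_vector(scores, watts=10_000.0)

    # Hypothetical reference system's per-watt figures and equal weights.
    reference = {"bench_a": 0.005, "bench_b": 0.005}
    weights = {"bench_a": 1.0, "bench_b": 1.0}

    print(vector)
    print(scalar_index(vector, reference, weights))
```

The vector keeps the per-benchmark detail that lets a buyer weigh the workloads they care about, while the scalar trades that detail for a single comparable figure whose meaning depends entirely on the chosen weights and reference.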

What I originally wanted the long term impact to be was quite different from either of these goals. The really exciting thing about the SiCortex approach was that the architecture put very large processor-count clusters in the hands of folks who couldn’t afford 1000 cores of some duct-tape PC configuration along with a reasonably efficient interconnect. Ethernet clusters run out of steam at pretty low processor counts, and the fancier interconnects just cost too much.

Comments

  1. Jim Tuccillo says:

    I think you may have quoted a few of my comments from LinkedIn. You make a good point about comparing Intel 5400 series processors instead of 5500 series (aka Nehalem). In that case the difference is only about 4x on the per core STREAM number assuming you have the right chip set – there are some slower Intel chip sets. The SiCortex clock crank may have shrunk that number. Clearly, stride-1 access isn’t the only issue but I don’t have a feel for the memory latency so I can’t comment on that. The issue with design cycles really comes down to the fact that Intel/AMD seem to have something significant to release every 2 years. The same time frame applies, by and large, to IB fabrics, perhaps 3 years for DDR to QDR?

  2. Jim – yes, you and several people made great observations on that LinkedIn thread I started that were most helpful in gauging the general sentiment about SiCortex. I’m very grateful for your insight.
