

Video: 1 Year Later, the HPE Spaceborne Supercomputer Keeps on Trucking

Mark Fernandez, HPE

In this video from the HPC User Forum in Detroit, Mark Fernandez from HPE presents: Spaceborne Computer Team, Overview, and Update.

It’s been almost 12 months since HPE and NASA sent the first-ever commercial off-the-shelf (COTS) computer system, the Spaceborne Computer, to the International Space Station (ISS). The experiment tests whether the system can operate seamlessly in the harsh conditions of space for one year, about the amount of time it will take to travel to Mars. In the process, Spaceborne is literally going faster and farther than any other COTS supercomputer has gone before.

After more than 11 months in space, roughly 5,000 orbits at about 250 miles above Earth, the system has operated virtually problem-free since it was installed and powered on aboard the ISS in September 2017. Spaceborne has become the first COTS system, as opposed to the traditionally hardened computers that are more than 10 years old, to run at over one teraFLOP (more than one trillion floating-point calculations per second) onboard the ISS. It has done so in zero gravity, with unique power and cooling conditions, and while subjected to unpredictable levels of radiation.

It has laid the groundwork for performing compute-intensive experiments without aid from Earth, which will be necessary to advance space exploration on journeys millions of miles away from our home planet. More importantly, this experiment lets us apply what we learn to Earth-bound technologies and further increase their reliability and robustness. That’s good news for us and for our partner NASA, as it gives us confidence that such a system not only changes the way we compute in space, but also defies the odds by computing faster, at higher capacities, and for longer than anyone expected.

Upon its planned return to Earth later this year, Spaceborne will undergo a battery of standard product failure analysis (PFA) tests with our parts suppliers. In the meantime, however, we can safely state that our theory has proven true: a COTS system “hardened” with software, as opposed to physical hardening that requires more resources, can maintain consistent, high-level performance.

While immensely successful, the mission has not come without a few important lessons and reminders along the way. Here are a few of note:

  • Designing software for Earth vs. space. Writing software is like any other form of communication: it has a language and guidelines, but it can be subjective. When writing software on Earth, we make certain assumptions that factor into the code, e.g. that network connectivity will be rock solid with only occasional, minor interruptions. Space is the exact opposite. In general, you don’t have a solid network, and loss of signal (LOS) is much more frequent, so it is best to design as if you have only occasional connectivity. Assuming consistent acquisition of signal (AOS) was an Earthly bias that crept into our software design. In the future, we plan to design our space-bound (or remote) software stack differently to account for the much more frequent network anomalies.
  • Not all astronauts are IT experts. We’re used to writing instructions for customer replaceable units (CRUs) that let IT-savvy customers resolve issues using a provided replacement part. Astronauts are experts in a lot of things, but IT isn’t always one of them, and working in zero gravity makes the job harder still. Standard CRU guidelines are woefully inadequate to hand to astronauts. In a fairly extensive process, we developed detailed instructions for customers who aren’t trained in IT and tailored them for the space environment instead of standard Earthly data center conditions. We’ve dubbed these astronaut replaceable units (ARUs)!
  • Unpredictable, dynamic radiation environments of space. In traditional hardening, engineers anticipate a variety of different radiation fields and possible events, then spend lots of time and money trying to design protection against them. Since we can’t predict what the radiation environment in space will look like minute by minute, we’ve taken an upside-down approach: we monitor all of the components. If we suspect a component is out of parameters, we hunker down into a safe mode and stay in that safe idle configuration until the event has passed. We then execute a health check to ensure everything is performing well before resuming operation.
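
The monitor, safe-mode, and health-check cycle described in the last lesson can be pictured as a small state machine. The Python sketch below is only an illustration under stated assumptions and is not HPE's actual software: the class names, the sensor names, and the numeric limits are all hypothetical, and a real system would read its telemetry from hardware rather than from a dictionary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    RUNNING = "running"
    SAFE_IDLE = "safe_idle"


@dataclass
class Limits:
    """Acceptable operating range for one monitored component (hypothetical)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


class SafeModeController:
    """Drop to a safe idle state on any suspect reading; resume only after
    readings are back in range AND a health check passes."""

    def __init__(self, limits: Dict[str, Limits], health_check: Callable[[], bool]):
        self.limits = limits
        self.health_check = health_check
        self.mode = Mode.RUNNING

    def suspects(self, readings: Dict[str, float]) -> List[str]:
        # A missing reading is treated as suspect, same as an out-of-range one.
        return [
            name
            for name, lim in self.limits.items()
            if name not in readings or not lim.contains(readings[name])
        ]

    def step(self, readings: Dict[str, float]) -> Mode:
        if self.suspects(readings):
            # Something looks out of parameters: hunker down.
            self.mode = Mode.SAFE_IDLE
        elif self.mode is Mode.SAFE_IDLE and self.health_check():
            # The event appears to have passed and the system checks out.
            self.mode = Mode.RUNNING
        return self.mode


if __name__ == "__main__":
    # Hypothetical sensors and limits, for illustration only.
    limits = {
        "cpu_temp_c": Limits(10.0, 85.0),
        "supply_volts": Limits(11.4, 12.6),
    }
    controller = SafeModeController(limits, health_check=lambda: True)
    for sample in (
        {"cpu_temp_c": 55.0, "supply_volts": 12.0},
        {"cpu_temp_c": 55.0, "supply_volts": 13.4},
        {"cpu_temp_c": 55.0, "supply_volts": 12.1},
    ):
        print(controller.step(sample).value)
```

The point of the design is that the controller never tries to model the radiation itself. It reacts only to what the components report, which mirrors the "upside-down" approach described above.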

With Spaceborne having run exceptionally well for almost a year in conditions far different from those on Earth, we wonder: what can we bring to space next? A private cloud on the ISS to bring intelligence to the edge in space? Planning for multiple launches and landings from Earth to the Moon to Mars opens the door to an entirely new set of challenges, from weight and cost to even greater longevity and higher performance from a computing system.

In the meantime, we need to continue exploring the use of the latest COTS supercomputers and work with our component manufacturers to improve durability without adding weight. Only time will tell where we will go from here. But for now, we await the homecoming of our beloved Spaceborne Computer so we can learn more from it and prepare for future innovation. We remain in first place in a new kind of space race and set a new record each and every day.

The next HPC User Forum takes place Oct. 1-2, 2018 in Stuttgart, Germany.

