Good Science is Repeatable - The Recomputation Manifesto

July 15, 2013 by Doug Black

Over at the Software Sustainability Institute, Ian Gent from the University of St Andrews writes that computational experiments should be recomputable for all time.

Although every scientific primer says that replication of scientific experiments is key, to quote this tweet, you’ll need luck if you wish to replicate experiments in computational science. There has been significant pressure for scientists to make their code open, but this is not enough. Even if I hired the only postdoc who can get the code to work, she might have forgotten the exact details of how an experiment was run. Or she might not know about a critical dependency on an obsolete version of a library. The current state of experimental reproducibility in computer science is lamentable. The result is inevitable: experimental results enter the literature which are just wrong. I don’t mean that the results don’t generalise. I mean that an algorithm which was claimed to do something just does not do that thing: for example, if the original implementation was bugged and was in fact a different algorithm. I suspect this problem is common, and I know for certain that it has happened.

Recomputation Manifesto

1. Computational experiments should be recomputable for all time

2. Recomputation of recomputable experiments should be very easy

3. It should be easier to make experiments recomputable than not to

4. Tools and repositories can help recomputation become standard

5. The only way to ensure recomputability is to provide virtual machines

6. Runtime performance is a secondary issue

Read the Full Story or download the Full Manifesto (PDF).

Comments

Joel Malard says

July 16, 2013 at 8:42 am

I got the feeling that the authors came up with a solution first and then a hit list, but maybe that is just me.

Point 1 is unavoidable, but then brick-and-mortar experiments never give exactly the same measurements, whereas people often expect software to give the same numbers over and over, up to 16 digits, lest they declare it buggy.

Point 6 is kind of obvious if one remembers that the purpose of science is to find out the truth, esp. to weed out what doesn’t work from what does.

Point 2 makes no sense: simulations that take years to run (QCD anyone?) have huge political and financial components that will never be easy, let alone very easy.

Point 3, we all wish everything went from easy to easier but I don’t think this is a key issue.

Point 4: some tools and some repositories might help, but the proof is in the pudding, e.g. some accounting practices make things easy while other practices make them nearly impossible.

Point 5: well, an alternative is to do as physicists do and think statistically. That Terry Winograd’s code no longer runs is sad, but how is that different from a single sighting of Bigfoot? There is actually value in replicating a computer simulation using several unrelated codes; that practice is like wearing a life vest when coding derivatives by hand.

All that said, the paper is definitely a good start.

my 2c.

Michael Tobis says

August 1, 2013 at 1:33 pm

Discussion of the topic in the context of climate models here:

http://scienceblogs.com/stoat/2013/07/29/repeatability-of-large-computations/
