Here are things I have heard working scientists say:
- I made the plot in Excel
- I played with the plot in Excel until it looked good
- I’m not sure how I got from the raw data to the plot
- I’m not sure which data collection run gave me that data file
- My collaborator sent me that data file in an email attachment; I fixed a problem and sent it back to him in an email attachment
I have the impression that many of the papers I look at present results that are not reproducible. If one of those papers claimed a major discovery with great practical consequence, it would be examined very closely and might be shown to be unreproducible.
The Wikipedia article on Cold Fusion has an instructive and sadly amusing blow-by-blow description of the circus that followed the 1989 announcement that Fleischmann and Pons had found the solution to the world’s energy problems with cold fusion. This was not a typical boring and pointless paper, so people all over the world tried to reproduce their experiment. It turns out there was nothing there, although, interestingly, some groups proved that wishful thinking can be quite powerful.
Much has been written about Fleischmann and Pons, but I have not found out whether their problem was a lack of reproducibility (so they found a plot that looked good and just focused on that) or whether something darker was at work. (Does anyone know the inner story on this?)
But what is quite clear is that if your bit of research is important then it will be scrutinized and you had better have a clear reproducible trail leading from the experiment (or simulation), through the various stages of calibration adjustment and analysis, up to the plots in the paper. If you do not have an automatic way of doing all that, then you will be embarrassed by this scrutiny. If your collected data cannot be automatically linked (probably via metadata) to the exact instrument configuration at the time of collection, then you do not really have a scientific result: you have something suggestive but not convincing.
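One way to make that link between collected data and instrument configuration concrete is to write a metadata "sidecar" file next to every raw data file at acquisition time. Here is a minimal sketch in Python; the function name and the shape of the configuration dictionary are my own invention, not a standard:

```python
import hashlib
import json
import time
from pathlib import Path

def save_with_provenance(raw_bytes, data_path, instrument_config):
    """Write a raw data file together with a JSON sidecar recording a
    checksum, the acquisition time, and the instrument configuration,
    so the file can later be traced to the exact collection run."""
    data_path = Path(data_path)
    data_path.write_bytes(raw_bytes)
    sidecar = {
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "acquired_unix_time": time.time(),
        # whatever describes the instrument state: gain, sample rate, run id...
        "instrument_config": instrument_config,
    }
    Path(str(data_path) + ".meta.json").write_text(json.dumps(sidecar, indent=2))
```

The checksum lets you verify later that the file you are analyzing is byte-for-byte the one that was collected, and the sidecar answers "which run gave me this file?" without relying on memory or email threads.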
A good example I have seen of this rigor on a large-scale project is Andy Fraser’s book on Hidden Markov Models. You can download the book, and what you get is a source code archive: you run the compilation scripts and they build the book, including running all the programs which generate the data for the plots, and then generating the plots themselves. Dependencies are tracked: if the time stamp on a file is changed, everything that requires that file’s information will be rebuilt. (Yes, it uses “make”.)
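The timestamp rule that make applies can be sketched in a few lines of Python; the file names in the usage below are hypothetical:

```python
import os

def needs_rebuild(target, dependencies):
    """Return True if the target file is missing or is older than any of
    its dependencies -- make's rebuild criterion."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

So if `plot.pdf` depends on `data.csv` and you touch `data.csv`, `needs_rebuild("plot.pdf", ["data.csv"])` becomes true and the plot gets regenerated; nothing is rebuilt when it doesn't need to be.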
In an experimental setting this is even more important: raw data is taken once, but the information used to process that raw data might be updated, at which point the raw data has to be turned into finished data products automatically, without human intervention. This is often not done, or is left until later (i.e., never).
I think that the solution to this problem involves a cocktail of the following ingredients, which mesh with each other:
- availability of raw data
- availability of all processing codes
- version control
- software pipelines
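To show how these ingredients fit together, here is a minimal pipeline sketch: it runs a chain of processing stages over a raw data file and writes a provenance record alongside the finished product. The function, the stage interface (bytes in, bytes out), and the `code_version` string (e.g. a git commit hash) are assumptions for illustration, not a real library:

```python
import hashlib
import json
from pathlib import Path

def run_pipeline(raw_path, out_dir, stages, code_version):
    """Apply a sequence of (name, function) processing stages to a raw
    data file, write the finished product, and record a provenance file
    linking the product to the exact raw data and code version."""
    raw_bytes = Path(raw_path).read_bytes()
    data = raw_bytes
    for name, stage in stages:
        data = stage(data)  # each stage maps bytes -> bytes
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "product.dat").write_bytes(data)
    provenance = {
        "raw_sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "stages": [name for name, _ in stages],
        "code_version": code_version,  # e.g. output of `git rev-parse HEAD`
    }
    (out_dir / "provenance.json").write_text(json.dumps(provenance, indent=2))
    return data
```

With the raw data and the processing code both published and under version control, anyone can rerun this and check that the provenance record, and hence the plots, match what the paper claims.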
I plan to discuss how these ideas play into good reproducible science and how one should program to guarantee reproducibility.
… some vaguely related links:
Carlo Graziani’s article on Ed Fenimore and honesty in science (in particular his link on the Ginga lines)
Blas Cabrera’s possible detection of a magnetic monopole in 1982
The Atlantic Monthly’s article Lies, Damned Lies, and Medical Science