
Reproducible Science


Reproducibility is a major concern in modern science, and there is something of a crisis at the moment, following several high-profile reports indicating that most published findings may be false and that many published results cannot be reproduced. There are many reasons for this crisis. Chief among these are:

  1. The use of small samples with low statistical power, which can increase the chance of false findings for several reasons (discussed in the material below).
  2. P-hacking; i.e., running every possible analysis until something comes out with p<.05, and then reporting only that result. Naturally this creates a massive multiple comparisons problem and means that the ‘significant’ result is likely false (a short simulation illustrating this appears after this list).
  3. Distorted incentives for scientists; i.e., the publish-or-perish culture, the emphasis of prestige journals on nice, clean stories, the reduced emphasis on methodological detail in these journals, and so on.
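
To make points 1 and 2 concrete, here is a minimal simulation (not part of the original material; the sample size and number of tests are arbitrary illustrative choices) showing how many small-sample comparisons yield ‘significant’ p-values even when no true effect exists:

```python
# Illustrative sketch: small groups and many uncorrected tests produce
# 'significant' p-values purely by chance. The sample size and number of
# tests are arbitrary choices for this example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, n_tests = 15, 100   # small groups, many exploratory comparisons

p_values = []
for _ in range(n_tests):
    # Both groups come from the same distribution, so the null is true.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    p_values.append(stats.ttest_ind(a, b).pvalue)

n_sig = sum(p < 0.05 for p in p_values)
print(f"Tests with p < .05 despite no real effect: {n_sig} / {n_tests}")
# Roughly 5% of tests come out 'significant' by chance; reporting only these
# is exactly the p-hacking / multiple comparisons problem described above.
```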

In response to these trends, there is a growing movement to promote reproducible science, and there are some simple things you can do to help ensure your results are robust.

  1. Independent validation. There are now many open MRI datasets that allow you to replicate (most) results in an independent sample. It is worth thinking about incorporating these datasets as replication sets in your own work. Examples include:

  2. If you use exploratory data analysis, replicate the key finding in an independent sample, or use train-test cross-validation procedures, as typically used in machine learning (see the sketch after this list).

  3. Get big samples.
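
For point 2 above, here is a minimal sketch of the train-test / cross-validation idea using scikit-learn with synthetic data; the dataset, model, and number of folds are placeholders rather than part of the original material:

```python
# Minimal sketch: evaluate a model on data that was not used to fit it,
# either with a single held-out test set or with k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold out an independent test set so the key finding is evaluated on data
# that played no part in the exploratory analysis.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))

# Alternatively, k-fold cross-validation uses all of the data while keeping
# each evaluation fold out of the corresponding training step.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold cross-validation accuracy:", scores.mean())
```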

There are many steps that should be taken to ensure that your science is reproducible. The problem can be summed up in the following quote:

"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” — David Donaho

In general, you can use open code repositories such as GitHub (https://github.com) to share your code and make sure the data are accessible. It is also possible to share data using services such as Figshare (https://figshare.com), but make sure you have ethics approval to do so first.
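
As one small, concrete step toward sharing the “complete software development environment” described in the quote above (this sketch is not from the original page; the file name and the packages logged are arbitrary examples), an analysis script can record the package versions and random seed used to generate its figures:

```python
# Illustrative sketch: save the software environment and random seed alongside
# the results so the figures can be regenerated later.
import json
import platform
import random
import sys

import numpy as np

SEED = 42  # fix the seed so stochastic steps are repeatable
random.seed(SEED)
np.random.seed(SEED)

environment = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "seed": SEED,
}

# Keep this file under version control alongside the analysis code.
with open("environment.json", "w") as f:
    json.dump(environment, f, indent=2)
```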

More problems, some solutions

More details and some solutions to the problem are below:

The replication crisis

The problem of insufficient power

Problems with null hypothesis significance testing

Best practices for data sharing in neuroimaging
