
Reproducible Science


Reproducibility is a major concern in modern science, and there is something of a crisis at the moment, following several high-profile reports indicating that most published findings may be false and that many published results cannot be reproduced. There are many reasons for this crisis. Chief among these are:

  1. The use of small samples with low statistical power, which can increase the chance of false findings for several reasons (discussed in the material below).
  2. P-hacking; i.e., running every possible analysis until something comes out with p<.05, and then reporting only that result. Naturally this creates a massive multiple comparisons problem and means that the ‘significant’ result is likely false (a short simulation illustrating this appears after this list).
  3. Distorted incentives for scientists; i.e., the publish-or-perish culture, the emphasis of prestige journals on nice, clean stories, the reduced emphasis on methodological detail in these journals, and so on.
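
To make points 1 and 2 concrete, here is a minimal simulation (not part of the original material; the sample size and number of tests are arbitrary illustrative choices) showing how many small-sample comparisons yield ‘significant’ p-values even when no true effect exists:

```python
# Illustrative sketch: small groups and many uncorrected tests produce
# 'significant' p-values purely by chance. The sample size and number of
# tests are arbitrary choices for this example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, n_tests = 15, 100   # small groups, many exploratory comparisons

p_values = []
for _ in range(n_tests):
    # Both groups come from the same distribution, so the null is true.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    p_values.append(stats.ttest_ind(a, b).pvalue)

n_sig = sum(p < 0.05 for p in p_values)
print(f"Tests with p < .05 despite no real effect: {n_sig} / {n_tests}")
# Roughly 5% of tests come out 'significant' by chance; reporting only these
# is exactly the p-hacking / multiple comparisons problem described above.
```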

In response to these trends, there is a growing movement to promote reproducible science, and there are some simple things you can do to help ensure your results are robust.

  1. Independent validation. There are now many open MRI datasets that allow you to replicate (most) results in an independent sample. It is worth thinking about incorporating these datasets as replication sets in your own work. Examples include:

  2. If you use exploratory data analysis, replicate the key finding in an independent sample, or use train-test cross-validation procedures, as typically used in machine learning (see the sketch after this list).

  3. Get big samples.
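
For point 2 above, here is a minimal sketch of the train-test / cross-validation idea using scikit-learn with synthetic data; the dataset, model, and number of folds are placeholders rather than part of the original material:

```python
# Minimal sketch: evaluate a model on data that was not used to fit it,
# either with a single held-out test set or with k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold out an independent test set so the key finding is evaluated on data
# that played no part in the exploratory analysis.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))

# Alternatively, k-fold cross-validation uses all of the data while keeping
# each evaluation fold out of the corresponding training step.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold cross-validation accuracy:", scores.mean())
```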

There are many steps that should be taken to ensure that your science is reproducible. The problem can be summed up in the following quote:

"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” — David Donaho

In general, you can use open code repositories such as GitHub (https://github.com) to share your code and make sure the data are accessible. It is also possible to share data using services such as Figshare (https://figshare.com), but make sure you have ethics approval to do so first.
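
As one small, concrete step toward sharing the “complete software development environment” described in the quote above (this sketch is not from the original page; the file name and the packages logged are arbitrary examples), an analysis script can record the package versions and random seed used to generate its figures:

```python
# Illustrative sketch: save the software environment and random seed alongside
# the results so the figures can be regenerated later.
import json
import platform
import random
import sys

import numpy as np

SEED = 42  # fix the seed so stochastic steps are repeatable
random.seed(SEED)
np.random.seed(SEED)

environment = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "seed": SEED,
}

# Keep this file under version control alongside the analysis code.
with open("environment.json", "w") as f:
    json.dump(environment, f, indent=2)
```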

More problems, some solutions

More details and some solutions to the problem are below:

The replication crisis

The problem of insufficient power

Problems with null hypothesis significance testing

Best practices for data sharing in neuroimaging
