The statistical crisis (also known as the replication crisis) refers to the fact that statistical significance does not necessarily provide a strong signal in favour of scientific claims.
One major challenge for researchers is accepting that one's own claims can be spurious. Mistakes in statistical analysis and interpretation should be acknowledged and learned from rather than resisted. Criticism and replication are essential steps in the scientific process: scientific claims should not be accepted at face value, nor should they be treated as 100% true. Once an idea is integrated into the literature, it is very difficult to disprove, even when the evidence supports the rebuttal attempts. Researchers should therefore remain highly critical of their own work and, if possible, replicate their own studies.
When the data analysis is selected after the data have been collected, p-values cannot be taken at face value. Published results should be examined in the context of their data, methods and theoretical support, yet assessing the strength of evidence remains difficult for most researchers. Because of the researcher degrees of freedom (e.g. the choice of statistical analyses), researchers have little difficulty finding statistically significant results that can be construed as part of the general constellation of findings consistent with their theory, as the sketch below illustrates.
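A minimal simulation sketch of this point (the two correlated outcomes and all numbers are invented for illustration, not taken from the text): even when there is no true effect, an analyst who can choose among several plausible analyses and report whichever is significant exceeds the nominal 5% false positive rate.

```python
# Sketch: researcher degrees of freedom inflate the false positive rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 10_000, 30
false_positives = 0

for _ in range(n_sims):
    # Two correlated outcomes per subject, identical in both groups (null).
    g1 = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], n)
    g2 = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], n)
    # Researcher degrees of freedom: outcome 1, outcome 2, or their mean.
    candidates = [
        stats.ttest_ind(g1[:, 0], g2[:, 0]).pvalue,
        stats.ttest_ind(g1[:, 1], g2[:, 1]).pvalue,
        stats.ttest_ind(g1.mean(1), g2.mean(1)).pvalue,
    ]
    false_positives += min(candidates) < 0.05

print(f"False positive rate: {false_positives / n_sims:.3f}")  # ~0.10, not 0.05
```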
Statistical significance is less meaningful than originally thought because of the researcher degrees of freedom (1) and because statistically significant comparisons systematically overestimate effect sizes (2). A Type M error refers to this overestimation of the effect size conditional on a result being statistically significant.
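A minimal simulation sketch of a Type M error (the true effect d = 0.2 and the group size n = 25 are assumed for illustration): among studies that happen to reach p < .05, the average estimated effect is several times larger than the true effect.

```python
# Sketch: significant results from underpowered studies exaggerate effects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n, n_sims = 0.2, 25, 20_000
significant_estimates = []

for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        significant_estimates.append(treated.mean() - control.mean())

exaggeration = np.mean(np.abs(significant_estimates)) / true_d
print(f"Power: {len(significant_estimates) / n_sims:.2f}")      # roughly 0.10
print(f"Exaggeration ratio (Type M): {exaggeration:.1f}x")      # roughly 3x
```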
There have been several ideas to resolve the replication crisis:
- Science communication
This entails not restricting publication to statistically significant results, but also publishing replication attempts. Furthermore, there should be open communication between disagreeing researchers, and method sections should be detailed enough that a study can be replicated.
- Design and data collection
This entails focusing on preregistration (1), design analysis using prior estimates of effect sizes (2), more attention to accurate measurement (3) and replication plans included in the original designs (4).
- Data analysis
This includes making use of Bayesian inference (1), hierarchical modelling of outcomes (2), meta-analysis (3) and control of error rates (4).
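As a small illustration of one item on this list, a fixed-effect inverse-variance meta-analysis can be sketched in a few lines (the effect estimates and standard errors below are made up for illustration):

```python
# Sketch: pool several study estimates, weighting by inverse variance.
import numpy as np

effects = np.array([0.42, 0.18, 0.30, 0.05])   # per-study effect estimates
ses = np.array([0.20, 0.12, 0.15, 0.10])       # their standard errors

weights = 1.0 / ses**2                         # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"Pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")
```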
The replication crisis appears to be the result of a flawed scientific paradigm rather than the result of a set of individual errors.
The problem of multiple comparisons should be taken into account, and an a priori power analysis should be conducted. However, when the power analysis is based on previously published effect size estimates, caution is needed, as such effect sizes are often overestimated.
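A minimal sketch of this caution (all numbers are hypothetical), using the standard normal-approximation sample size formula for a two-sample comparison, n per group ~ 2 * (z_(1-alpha/2) + z_beta)^2 / d^2: planning around a published d = 0.5 when the true effect is d = 0.25 yields a badly underpowered study.

```python
# Sketch: power analysis based on an inflated published effect size.
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 / d ** 2

def achieved_power(d, n, alpha=0.05):
    z_a = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_a - d * (n / 2) ** 0.5)

n_planned = n_per_group(0.5)                   # planned with published d = 0.5
print(f"Planned n per group: {n_planned:.0f}")             # ~63
print(f"Power if the true d is 0.25: "
      f"{achieved_power(0.25, n_planned):.2f}")            # ~0.28, not 0.80
```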
Knowledge translation is a general issue in science: it is not clear whether case studies can be generalized to the population, nor whether an effect observed in group studies is sufficiently large to have clinical implications for each individual in a specific group.
There are five major challenges in assessing experimental evidence within clinical neuropsychology:
- Clinical tests can include a wide range of outcomes
A test with a large number of dependent variables might produce several significant results by chance alone (see the first sketch after this list). This can distort interpretation and need not lead to a valid conclusion, since the non-significant results might be ignored. Focusing on a subset of outcomes should be guided by a strong prior theoretical argument; it is therefore also important to distinguish between exploratory and confirmatory research.
- Replicating findings in specific and sufficiently large patient groups
Replication is difficult in rare diseases and other specific groups. However, this can be aided by replicating a small part of another study within one's own study, so that researchers help each other.
- Determining whether a finding is robust and should be included in clinical practice
It is difficult to determine whether a finding is robust (e.g. through replication) and whether it should be implemented in clinical practice. The latter could be decided by performing a decision analysis, which specifies the costs, risks and benefits involved (see the second sketch after this list).
- Determining whether an earlier finding can be disregarded
It is difficult to determine whether there is enough evidence to disregard earlier conclusions that are widespread in clinical practice.
- Translating group findings to individual patients
There is a risk of overstating what is learned from a single study. It is difficult to determine whether group differences have clinical implications for the individual. Besides that, it is also not clear whether an individual pattern of strengths and weaknesses is in line with research findings that apply to groups of patients.
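First sketch, for the wide-range-of-outcomes challenge: with k independent outcomes and no true effects, the probability of at least one significant result at alpha = .05 is 1 - (1 - .05)^k.

```python
# Sketch: family-wise chance of a spurious finding grows with the number
# of outcomes tested (assuming independent outcomes, alpha = .05).
for k in (1, 5, 10, 20):
    print(f"{k:2d} outcomes -> P(at least one p < .05) = {1 - 0.95**k:.2f}")
```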
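Second sketch, for the decision-analysis suggestion (all probabilities, utilities and costs below are invented for illustration): the expected utility of adopting a new clinical test can be compared against keeping current practice.

```python
# Sketch: a toy decision analysis weighing benefits, harms and costs.
p_benefit = 0.30    # assumed chance the new test improves a clinical decision
u_benefit = 10.0    # assumed utility gain when it does
p_harm = 0.05       # assumed chance of a harmful misclassification
u_harm = -30.0      # assumed utility loss when it does
cost = 1.0          # assumed fixed cost of administering the test

eu_adopt = p_benefit * u_benefit + p_harm * u_harm - cost
eu_current = 0.0    # baseline: keep current practice
print(f"Adopt new test: {eu_adopt:+.1f} vs current practice: {eu_current:+.1f}")
```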