Measures of clinical significance - summary of an article by Kraemer et al. (2003)
Measures of clinical significance.
Kraemer, Morgan, Leech, Gliner, Vaske & Harmon (2003)
Journal of the American Academy of Child & Adolescent Psychiatry
Introduction
Behavioural scientists are interested in answering three questions when examining the relationships between variables:
- Statistical significance: is an observed result real or should it be attributed to chance?
- Effect size: if the result is real, how large is it?
- Clinical or practical significance: is the result large enough to be meaningful and useful?
Researchers suggest that using one of three types of effect size measures assists in interpreting clinical significance:
- r family effect size measures: the strength of association between variables
- d family effect size measures: the magnitude of the difference between treatment and comparison groups
- Measures of risk potency:
  - Odds ratio
  - Risk ratio
  - Relative risk reduction
  - Risk difference
  - Number needed to treat
Problems with statistical significance
A statistically significant outcome indicates that there is likely to be at least some relationship between the variables. The p value indicates the probability that an outcome this extreme could happen if the null hypothesis is true. It doesn’t provide information about the strength of the relationship or whether it is meaningful.
It is possible, with a large sample, to have a statistically significant result from a weak relationship between variables. Outcomes with lower p values are sometimes misinterpreted as having stronger effects than those with higher p values.
Non-statistically significant results do not ‘prove’ the null hypothesis. They might instead be due to factors that lower statistical power.
The presence or absence of statistical significance does not give information about the size or importance of the outcome. This makes it critical to know the effect size.
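The point about sample size can be illustrated with a small sketch. It assumes two equal-sized groups and a normal (z) approximation for the test, and the effect size and sample sizes are hypothetical values chosen for illustration, not figures from the article.

```python
import math

def two_sided_p(d, n_per_group):
    """Two-sided p value for a standardized mean difference d between two
    equal-sized groups, using a normal (z) approximation (an assumption)."""
    # The standard error of d is approximately sqrt(2 / n) for equal groups.
    z = d * math.sqrt(n_per_group / 2)
    # erfc(|z| / sqrt(2)) equals P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))

weak_effect = 0.05  # hypothetical, very small standardized difference
p_small_sample = two_sided_p(weak_effect, 50)
p_large_sample = two_sided_p(weak_effect, 10_000)

print(f"n = 50 per group:    p = {p_small_sample:.3f}")
print(f"n = 10000 per group: p = {p_large_sample:.5f}")
```

The same weak effect is far from significant in the small sample and clearly significant in the large one, so the p value alone says little about the size of the effect.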
Effect size measures
The r family
One method of expressing effect sizes is in terms of strength of association. This can be done with statistics such as the Pearson product moment correlation coefficient, r, used when both the independent and the dependent measures are ordered. Such effect sizes vary between -1.0 and +1.0, with 0 representing no effect.
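As a minimal sketch, the Pearson correlation can be computed directly from its definition. The variable names and data below are hypothetical and serve only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: more sessions go with lower symptom scores.
sessions = [1, 2, 3, 4, 5]
symptom_score = [10, 9, 7, 6, 3]

r = pearson_r(sessions, symptom_score)
print(f"r = {r:.2f}")
```

A value close to -1 or +1 indicates a strong association, and the sign gives its direction.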
The d family
The d family is used when the independent variable is binary (dichotomous) and the dependent variable is ordered.
When comparing two groups, the effect size d can be computed by subtracting the mean of the second group from the mean of the first group and dividing by the pooled standard deviation of both groups.
d expresses the mean difference in standard deviation units. It ranges from minus infinity to plus infinity, and 0 indicates no effect.
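The calculation of d can be sketched as follows. The pooled standard deviation here uses the common formula that weights each group's variance by its degrees of freedom, which is an assumption, since the summary does not spell out the pooling formula. The scores are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Mean difference (group1 minus group2) in pooled standard deviation units."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: sample variances weighted by degrees of freedom (assumed formula).
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

# Hypothetical outcome scores, not data from the article.
treatment = [12, 14, 16, 18, 20]
comparison = [10, 12, 14, 16, 18]

d = cohens_d(treatment, comparison)
print(f"d = {d:.2f}")
```

Swapping the two groups flips the sign of d, so it matters which group is treated as the first one.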
Measures of risk potency
These measures are used when both the independent and the dependent variable are binary.
Five common measures are:
- Odds ratios
- Risk ratio
- Relative risk reduction
- Risk difference
- Number needed to treat
Odds ratios and risk ratios vary from 0 to infinity, with 1 indicating no effect. Relative risk reduction and risk difference range from -1 to 1, with zero indicating no effect. NNT ranges from 1 to plus infinity, with very large values indicating no treatment effect.
AUC
AUC can be used when the independent variable is binary and the dependent variable is either binary or ordered. It ranges from 0% to 100%, with 50% indicating no effect.
Issues about effect size measures
There is little agreement about which effect size to use for each situation.
Effect sizes should always be reported for primary results.
Interpreting d and r effect sizes
The d and r effect sizes are relatively abstract and consequently may not be meaningful to patients and clinicians. They were not intended to be indexes of clinical significance and are not interpretable in terms of how much individuals are affected by treatment.
Clinical significance
The clinical significance of a treatment is based on external standards provided by clinicians, patients and/or researchers. There is little consensus about the criteria for these efficacy standards.
Clinical significance is a change to normal functioning due to therapy.
Interpreting measures of risk potency
Clinicians must make categorical decisions about whether or not to use a treatment and the outcomes are often binary.
The phi coefficient, which applies the formula for the Pearson or Spearman correlation coefficient to 2x2 data, is sometimes used. The more different the distributions of the two binary variables, the more restricted the range of phi. It is difficult, if not impossible, to extract clinical meaning from phi.
All the measures below are used when researchers and clinicians have a 2x2 contingency table to express the risk of clinical level outcomes. In some cases, such a table results when initially continuous outcome data are dichotomized. Dichotomizing loses information. Whatever effect size is used, it can also produce inconsistent and arbitrary effect size indexes, because different cut-points or thresholds for failure can be chosen.
Odds ratio
The odds ratio is determined by first computing the odds, that is, the ratio of the percentage judged to fail (failure rate) to the percentage judged as successes (success rate), within both the comparison and intervention groups. The odds ratio is then obtained by dividing the comparison group odds of failure by those of the intervention group.
A limitation of the odds ratio as an effect size index is that its magnitude may approach infinity if the outcome is rare or very common, even when the association is near random. The magnitude of the odds ratio also varies strongly with the choice of cut-point. Odds ratios can therefore be misleading.
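The computation described above can be sketched in a few lines. The failure rates are hypothetical and chosen only to contrast a moderate outcome with a rare one.

```python
def odds(failure_rate):
    """Odds of failure: failure rate divided by success rate."""
    return failure_rate / (1.0 - failure_rate)

def odds_ratio(comparison_failure, treatment_failure):
    """Comparison group odds of failure divided by treatment group odds of failure."""
    return odds(comparison_failure) / odds(treatment_failure)

# Hypothetical failure rates, not data from the article.
moderate_or = odds_ratio(0.40, 0.20)

# A rare outcome: the two failure rates differ by less than one
# percentage point, yet the odds ratio is large.
rare_or = odds_ratio(0.010, 0.001)

print(f"moderate outcome: odds ratio = {moderate_or:.2f}")
print(f"rare outcome:     odds ratio = {rare_or:.2f}")
```

In the rare-outcome case almost nobody fails in either group, but the odds ratio is roughly four times as large as in the moderate case.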
Risk ratio
Risk ratio can be determined by dividing the failure rate of the comparison group by the failure rate of the treatment group, or by dividing the success rate of the treatment group by that of the comparison group.
The odds ratio is the product of these two risk ratios, so neither risk ratio exceeds the odds ratio when the treatment group does better than the comparison group.
Both the choice of cut-point and the choice of which risk ratio to report change the magnitude of the risk ratio, making it hard to interpret. Because the risk ratio may approach infinity when the risk in the denominator approaches zero, there can be no agreed-upon standards for assessing the magnitude of the risk ratio.
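A short sketch of both risk ratios, using hypothetical failure rates, also checks that their product equals the odds ratio computed from the odds of failure.

```python
def risk_ratios(comparison_failure, treatment_failure):
    """Return the (failure-based, success-based) risk ratios."""
    failure_rr = comparison_failure / treatment_failure
    success_rr = (1.0 - treatment_failure) / (1.0 - comparison_failure)
    return failure_rr, success_rr

# Hypothetical failure rates, not data from the article.
comparison_failure, treatment_failure = 0.40, 0.20
failure_rr, success_rr = risk_ratios(comparison_failure, treatment_failure)

# Odds ratio computed directly from the odds of failure in each group.
direct_odds_ratio = (comparison_failure / (1.0 - comparison_failure)) / (
    treatment_failure / (1.0 - treatment_failure)
)

print(f"failure risk ratio: {failure_rr:.2f}")
print(f"success risk ratio: {success_rr:.2f}")
print(f"product:            {failure_rr * success_rr:.2f}")
print(f"odds ratio:         {direct_odds_ratio:.2f}")
```

The two risk ratios describe the same table but differ in size, which is part of why a single reported risk ratio is hard to interpret.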
Relative risk reduction
Relative risk reduction is computed by subtracting the treatment group failure rate from the comparison group failure rate and dividing by the latter, or by subtracting the comparison group success rate from the treatment group success rate and dividing by the former. It can vary between 0 and 1.0.
Because ‘failure’ relative risk reduction may be very small when the ‘success’ relative risk reduction is large, relative risk reduction is difficult to interpret in terms of clinical significance. There are no agreed-upon standards for judging its magnitude.
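The failure-based version of relative risk reduction can be sketched as below. Only the failure-based formula is shown, and the rates are hypothetical. The second case is added as an illustration that the same relative reduction can arise from very different absolute rates.

```python
def relative_risk_reduction(comparison_failure, treatment_failure):
    """Failure-based relative risk reduction:
    (comparison failure rate - treatment failure rate) / comparison failure rate."""
    return (comparison_failure - treatment_failure) / comparison_failure

# Hypothetical failure rates, not data from the article.
common_outcome = relative_risk_reduction(0.40, 0.20)
rare_outcome = relative_risk_reduction(0.04, 0.02)

print(f"failure rates 40% vs 20%: RRR = {common_outcome:.2f}")
print(f"failure rates  4% vs  2%: RRR = {rare_outcome:.2f}")
```

Both cases give the same relative risk reduction, although the absolute drop in failures is twenty percentage points in the first and two in the second.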
Risk difference
Risk difference is computed by subtracting the percentage of failures in the treatment group from the percentage in the comparison group, or by subtracting the percentage of successes in the comparison group from that in the treatment group.
The risk difference varies from 0% to 100%. When the success or failure rates are extreme, the risk difference is likely to be near 0%. In such cases the odds ratio and one of the risk ratios can be very large while the other risk ratio is near 1 and the risk difference is near zero, which indicates nearly random association.
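The contrast between the risk difference and the risk ratios at extreme rates can be sketched with hypothetical failure rates:

```python
def risk_difference(comparison_failure, treatment_failure):
    """Comparison group failure rate minus treatment group failure rate."""
    return comparison_failure - treatment_failure

# Hypothetical failure rates (comparison, treatment), not data from the article.
cases = {
    "moderate rates": (0.40, 0.20),
    "extreme rates": (0.010, 0.001),
}

for label, (comparison_failure, treatment_failure) in cases.items():
    rd = risk_difference(comparison_failure, treatment_failure)
    failure_rr = comparison_failure / treatment_failure
    success_rr = (1.0 - treatment_failure) / (1.0 - comparison_failure)
    print(f"{label}: risk difference = {rd:.3f}, "
          f"failure risk ratio = {failure_rr:.2f}, "
          f"success risk ratio = {success_rr:.3f}")

moderate_rd = risk_difference(0.40, 0.20)
extreme_rd = risk_difference(0.010, 0.001)
```

With the extreme rates, the failure risk ratio is large, the success risk ratio is close to 1, and the risk difference is under one percentage point.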
NNT
NNT is the number of patients who must be treated to generate one more success or one less failure than would have resulted had all persons been given the comparison treatment. A result of 1.0 means the treatment is perfect. The larger the NNT, the less effective the treatment relative to the comparison.
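Since the risk difference equals 1/NNT, NNT can be sketched as the reciprocal of the risk difference. The failure rates are hypothetical, and the sketch assumes the treatment is no worse than the comparison.

```python
import math

def number_needed_to_treat(comparison_failure, treatment_failure):
    """NNT as the reciprocal of the risk difference.
    Assumes the treatment failure rate does not exceed the comparison rate."""
    risk_diff = comparison_failure - treatment_failure
    if risk_diff == 0:
        return math.inf  # no difference: no number of patients yields an extra success
    return 1.0 / risk_diff

# Hypothetical failure rates (comparison, treatment), not data from the article.
perfect = number_needed_to_treat(1.00, 0.00)
moderate = number_needed_to_treat(0.40, 0.20)
weak = number_needed_to_treat(0.30, 0.29)

print(f"perfect treatment:  NNT = {perfect:.0f}")
print(f"moderate treatment: NNT = {moderate:.0f}")
print(f"weak treatment:     NNT = {weak:.0f}")
```

As the two failure rates move closer together, the NNT grows without bound.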
AUC
AUC represents the probability that a randomly selected subject in the treatment group has a better result than one in the comparison group.
In situations where d might be used, AUC = Φ(d/√2), where Φ is the cumulative standard normal distribution function. Here d/√2 is the z value and Φ(d/√2) is the area under the normal curve up to that z value.
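The conversion from d to AUC can be sketched with the standard normal distribution from Python's standard library. The d values are arbitrary illustrations.

```python
import math
from statistics import NormalDist

def auc_from_d(d):
    """AUC as the standard normal cumulative probability at z = d / sqrt(2)."""
    z = d / math.sqrt(2)
    return NormalDist().cdf(z)

# Arbitrary illustrative values of d.
for d in (0.0, 0.2, 0.5, 0.8):
    print(f"d = {d:.1f} -> AUC = {auc_from_d(d):.3f}")
```

A d of 0 maps to an AUC of 50%, which is the no-effect value, and larger positive values of d map to AUCs closer to 100%.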
In situations where measures of risk potency are used, risk difference = 1/NNT = 2AUC - 1.
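For a binary outcome, this relation can be checked with a small sketch. The success rates are hypothetical, and the pairwise calculation assumes that a tie between a treated and a comparison subject counts as half a win, which is an assumption not stated in the summary.

```python
def auc_from_risk_difference(risk_diff):
    """Rearranges risk difference = 2 * AUC - 1."""
    return (risk_diff + 1.0) / 2.0

def auc_by_pairs(comparison_success, treatment_success):
    """Probability that a random treated subject does better than a random
    comparison subject, counting ties as one half (assumption)."""
    better = treatment_success * (1.0 - comparison_success)
    tie = (treatment_success * comparison_success
           + (1.0 - treatment_success) * (1.0 - comparison_success))
    return better + 0.5 * tie

# Hypothetical success rates, not data from the article.
comparison_success, treatment_success = 0.60, 0.80
risk_diff = treatment_success - comparison_success

auc = auc_from_risk_difference(risk_diff)
nnt = 1.0 / risk_diff

print(f"risk difference = {risk_diff:.2f}")
print(f"NNT             = {nnt:.1f}")
print(f"AUC (formula)   = {auc:.2f}")
print(f"AUC (pairwise)  = {auc_by_pairs(comparison_success, treatment_success):.2f}")
```

Under that tie rule, the formula and the pairwise calculation give the same AUC.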
AUC can be computed based on clinical judgments alone. It has a special appeal as a measure of clinical significance.
AUC helps us understand the problem of cut-points. When one imposes dichotomization on an ordered response, there is a tacit declaration that all variation of response above the cut-point and all variation below the cut-point have no clinical relevance.