Measures of clinical significance - summary of an article by Kraemer et al. (2003)

Measures of clinical significance.
Kraemer, Morgan, Leech, Gliner, Vaske & Harmon (2003)
Journal of the American Academy of Child & Adolescent Psychiatry

Introduction
Problems with statistical significance
Effect size measures
Issues about effect size measures
Interpreting d and r effect sizes
Clinical significance
Interpreting measures of risk potency

Introduction

Behavioural scientists are interested in answering three questions when examining the relationships between variables:

Statistical significance
Is an observed result real or should it be attributed to chance?
Effect size
If the result is real, how large is it?
Clinical or practical significance
Is the result large enough to be meaningful and useful?

Researchers suggest that using one of three types of effect size measures assists in interpreting clinical significance:

r family effect size measures
The strength of association between variables
d family effect size measures
The magnitude of the difference between treatment and comparison groups
Measures of risk potency
- Odds ratio
- Risk ratio
- Relative risk reduction
- Risk difference
- Number needed to treat (NNT)

Problems with statistical significance

A statistically significant outcome indicates that there is likely to be at least some relationship between the variables. The p value indicates the probability that an outcome at least this extreme could happen if the null hypothesis is true. It does not provide information about the strength of the relationship or whether it is meaningful.

It is possible, with a large sample, to have a statistically significant result from a weak relationship between variables. Outcomes with lower p values are sometimes misinterpreted as having stronger effects than those with higher p values.
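The point about sample size can be illustrated with a minimal sketch. It assumes a Pearson correlation tested against a null of zero, and it uses a normal approximation to the t distribution, which is only reasonable for large samples. The correlation of 0.05 and the two sample sizes are hypothetical values chosen for illustration.

```python
import math


def two_sided_p_for_r(r: float, n: int) -> float:
    """Approximate two-sided p value for a Pearson r under H0: rho = 0.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and treats t as standard
    normal, an approximation that is only reasonable for large n.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return math.erfc(abs(t) / math.sqrt(2.0))


# The same weak correlation evaluated at two hypothetical sample sizes.
p_small = two_sided_p_for_r(0.05, 100)
p_large = two_sided_p_for_r(0.05, 10_000)
print(f"n = 100:    p = {p_small:.3f}")
print(f"n = 10000:  p = {p_large:.2e}")
```

The effect size is identical in both cases; only the p value changes with the sample size.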

Non-statistically significant results do not ‘prove’ the null hypothesis. They may instead reflect low statistical power, for example because of a small sample.

The presence or absence of statistical significance does not give information about the size or importance of the outcome. This makes it critical to know the effect size.

Effect size measures

The r family

One method of expressing effect sizes is in terms of strength of association. This can be done with statistics such as the Pearson product-moment correlation coefficient, r, used when both the independent and the dependent measures are ordered. Such effect sizes vary between -1.0 and +1.0, with 0 representing no effect.
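As a minimal sketch of the r family, the Pearson correlation can be computed directly from its definition. The function name and the small data set are hypothetical and serve only as illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ordered scores on two measures.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r = pearson_r(x_scores, y_scores)
print(f"r = {r:.3f}")
```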

The d family

Used when the independent variable is binary (dichotomous) and the dependent variable is ordered.

When comparing two groups, the effect size d can be computed by subtracting the mean of the second group from the mean of the first group and dividing by the pooled standard deviation of both groups.

d expresses the mean difference in standard deviation units. It ranges from minus infinity to plus infinity, and 0 indicates no effect.
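The computation of d can be sketched as follows. The summary does not say how the standard deviations are pooled; this sketch assumes the common version that weights each group's sample variance by its degrees of freedom. The scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Mean difference (group1 - group2) in pooled standard deviation units.

    Assumption: the pooled variance weights each sample variance by
    its degrees of freedom (n - 1).
    """
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


# Hypothetical outcome scores for a treatment and a comparison group.
treatment = [12, 14, 15, 17, 17]
comparison = [10, 11, 13, 13, 13]
d = cohens_d(treatment, comparison)
print(f"d = {d:.2f}")
```

Swapping the two arguments flips the sign of d, so it matters which group is treated as the first one.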

Measures of risk potency

These measures are used when both the independent and dependent variables are binary.
Five common ones are:

Odds ratio
Risk ratio
Relative risk reduction
Risk difference
Number needed to treat

Odds ratios and risk ratios vary from 0 to infinity, with 1 indicating no effect. Relative risk reduction and risk difference range from -1 to 1, with zero indicating no effect. NNT ranges from 1 to plus infinity, with very large values indicating no treatment effect.

AUC

AUC (area under the curve) can be used when the independent variable is binary and the dependent variable is either binary or ordered. It ranges from 0% to 100%, with 50% indicating no effect.

Issues about effect size measures

There is little agreement about which effect size to use for each situation.

Effect sizes should always be reported for primary results.

Interpreting d and r effect sizes

The d and r effect sizes are relatively abstract and consequently may not be meaningful to patients and clinicians. They were not intended to be indexes of clinical significance and are not interpretable in terms of how much individuals are affected by treatment.

Clinical significance

The clinical significance of a treatment is based on external standards provided by clinicians, patients and/or researchers. There is little consensus about the criteria for these efficacy standards.

Clinical significance is a change to normal functioning due to therapy.

Interpreting measures of risk potency

Clinicians must make categorical decisions about whether or not to use a treatment and the outcomes are often binary.

The phi coefficient, which applies the formula for the Pearson or Spearman correlation coefficient to 2 x 2 data, is sometimes used. The more the distributions of the two binary variables differ, the more restricted the range of phi. It is difficult, if not impossible, to extract clinical meaning from phi.

All the measures below are used when researchers and clinicians have a 2 x 2 contingency table to express the risk of clinical level outcomes. In some cases, such a table results when initially continuous outcome data are dichotomized. This results in a loss of information and can also produce inconsistent and arbitrary effect size indexes, because different choices of the cut-point or threshold for failure give different values, whatever effect size is used.

Odds ratio

The odds ratio is determined by first computing the odds, that is, the ratio of the percentage judged to fail (failure rate) to the percentage judged as successes (success rate), within both the comparison and intervention groups. The odds ratio is then obtained by dividing the comparison group odds of failure by those of the intervention group.

A limitation of the odds ratio as an effect size index is that its magnitude may approach infinity if the outcome is rare or very common, even when the association is near random. The magnitude of the odds ratio also varies strongly with the choice of cut-point. Odds ratios can therefore be misleading.
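The odds ratio computation described above can be sketched as follows. The function names and the failure rates are hypothetical, and rates are expressed as proportions between 0 and 1 rather than percentages.

```python
def odds_of_failure(failure_rate: float) -> float:
    """Odds of failure: failure rate divided by success rate."""
    return failure_rate / (1.0 - failure_rate)


def odds_ratio(comparison_failure: float, treatment_failure: float) -> float:
    """Comparison-group odds of failure divided by treatment-group odds."""
    return odds_of_failure(comparison_failure) / odds_of_failure(treatment_failure)


# Hypothetical failure rates: 50% in the comparison group, 20% with treatment.
or_value = odds_ratio(0.5, 0.2)
print(f"odds ratio = {or_value:.2f}")
```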

Risk ratio

The risk ratio can be determined by dividing the failure rate of the comparison group by the failure rate of the treatment group, or by dividing the success rate of the treatment group by that of the comparison group.

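The odds ratio is the product of the two risk ratios (the failure-based one and the success-based one). When the treatment outperforms the comparison, both risk ratios are at least 1, so neither can exceed the odds ratio.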
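A minimal sketch of the two risk ratios and their relation to the odds ratio follows. The function names and failure rates are hypothetical, and rates are proportions between 0 and 1.

```python
def failure_risk_ratio(comparison_failure: float, treatment_failure: float) -> float:
    """Comparison failure rate divided by treatment failure rate."""
    return comparison_failure / treatment_failure


def success_risk_ratio(comparison_failure: float, treatment_failure: float) -> float:
    """Treatment success rate divided by comparison success rate."""
    return (1.0 - treatment_failure) / (1.0 - comparison_failure)


# Hypothetical failure rates: 50% in the comparison group, 20% with treatment.
cf, tf = 0.5, 0.2
rr_failure = failure_risk_ratio(cf, tf)
rr_success = success_risk_ratio(cf, tf)

# Odds ratio computed directly from the odds of failure in each group.
direct_odds_ratio = (cf / (1.0 - cf)) / (tf / (1.0 - tf))

print(f"failure risk ratio = {rr_failure:.2f}")
print(f"success risk ratio = {rr_success:.2f}")
print(f"product            = {rr_failure * rr_success:.2f}")
print(f"odds ratio         = {direct_odds_ratio:.2f}")
```

The two risk ratios generally differ for the same data, which is why it matters which one is reported.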

Both the choice of cut-point and the choice of which risk ratio to report change the magnitude of the risk ratio, making it hard to interpret. Because the risk ratio may approach infinity when the risk in the denominator approaches zero, there can be no agreed-upon standards for assessing the magnitude of a risk ratio.

Relative risk reduction

Relative risk reduction is computed by subtracting the treatment group failure rate from the comparison group failure rate and dividing by the comparison group failure rate, or by subtracting the comparison group success rate from the treatment group success rate and dividing by the comparison group success rate. It can vary between 0 and 1.0.

Because ‘failure’ relative risk reduction may be very small when the ‘success’ relative risk reduction is large, relative risk reduction is difficult to interpret in terms of clinical significance. There are no agreed-upon standards for judging its magnitude.
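The contrast between the two versions of relative risk reduction can be sketched with hypothetical rates in which failure is very common in both groups. Function names are illustrative, and rates are proportions between 0 and 1.

```python
def failure_rrr(comparison_failure: float, treatment_failure: float) -> float:
    """Reduction in failure rate relative to the comparison failure rate."""
    return (comparison_failure - treatment_failure) / comparison_failure


def success_rrr(comparison_failure: float, treatment_failure: float) -> float:
    """Gain in success rate relative to the comparison success rate."""
    comparison_success = 1.0 - comparison_failure
    treatment_success = 1.0 - treatment_failure
    return (treatment_success - comparison_success) / comparison_success


# Hypothetical rates: 95% failure in the comparison group, 90% with treatment.
rrr_fail = failure_rrr(0.95, 0.90)
rrr_success = success_rrr(0.95, 0.90)
print(f"failure RRR = {rrr_fail:.3f}")
print(f"success RRR = {rrr_success:.3f}")
```

The same data yield a failure-based value near zero and a success-based value near one, which is the interpretive difficulty described above.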

Risk difference

Risk difference is computed by subtracting the percentage of failures in the treatment group from the percentage in the comparison group, or by subtracting the percentage of successes in the comparison group from that in the treatment group.

The risk difference varies from 0% to 100%. When the success or failure rates are extreme, the risk difference is likely to be near 0%. In that situation the odds ratio and one of the risk ratios can be very large while the other risk ratio is near 1 and the risk difference is near zero, which indicates nearly random association.
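The behaviour with extreme rates can be sketched with a hypothetical rare outcome. The function name and the rates are illustrative, and rates are proportions between 0 and 1.

```python
def risk_measures(comparison_failure: float, treatment_failure: float) -> dict:
    """Risk difference, both risk ratios and the odds ratio for two failure rates."""
    comparison_success = 1.0 - comparison_failure
    treatment_success = 1.0 - treatment_failure
    return {
        "risk_difference": comparison_failure - treatment_failure,
        "failure_risk_ratio": comparison_failure / treatment_failure,
        "success_risk_ratio": treatment_success / comparison_success,
        "odds_ratio": (comparison_failure / comparison_success)
        / (treatment_failure / treatment_success),
    }


# Hypothetical rare outcome: 2% failure in comparison, 0.1% with treatment.
rare = risk_measures(0.02, 0.001)
for name, value in rare.items():
    print(f"{name:>20} = {value:.3f}")
```

With these rates the odds ratio and the failure risk ratio are both large, while the success risk ratio stays close to 1 and the risk difference stays close to zero.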

NNT

NNT is the number of patients who must be treated to generate one more success or one fewer failure than would have resulted had all persons been given the comparison treatment. A result of 1.0 means the treatment is perfect. The larger the NNT, the less effective the treatment relative to the comparison.
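A minimal sketch of NNT follows. It assumes NNT is the reciprocal of the risk difference, which is consistent with the relation "risk difference = 1/NNT" given in the AUC section of this summary. The function name and the failure rates are hypothetical.

```python
import math


def number_needed_to_treat(comparison_failure: float, treatment_failure: float) -> float:
    """NNT as the reciprocal of the risk difference (rates as proportions)."""
    risk_difference = comparison_failure - treatment_failure
    if risk_difference == 0:
        # No difference between groups: no number of patients suffices.
        return math.inf
    return 1.0 / risk_difference


# Hypothetical cases: a moderate effect, a perfect treatment, and no effect.
nnt_moderate = number_needed_to_treat(0.5, 0.2)
nnt_perfect = number_needed_to_treat(1.0, 0.0)
nnt_none = number_needed_to_treat(0.4, 0.4)
print(f"moderate effect: NNT = {nnt_moderate:.2f}")
print(f"perfect:         NNT = {nnt_perfect:.2f}")
print(f"no effect:       NNT = {nnt_none}")
```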

AUC

AUC represents the probability that a randomly selected subject in the treatment group has a better result than one in the comparison group.

In situations where d might be used, AUC = Φ(d/√2), where Φ is the cumulative standard normal distribution function. d/√2 is the z value and Φ(d/√2) is the area under the normal curve up to that z value.
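The conversion from d to AUC can be sketched using the error function to evaluate the cumulative standard normal distribution. The function names and the example values of d are hypothetical.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Cumulative standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def auc_from_d(d: float) -> float:
    """AUC as the standard normal CDF evaluated at d / sqrt(2)."""
    return standard_normal_cdf(d / math.sqrt(2.0))


# Hypothetical effect sizes; d = 0 should give the no-effect value of 50%.
for d in (0.0, 0.2, 0.5, 0.8):
    print(f"d = {d:.1f}  ->  AUC = {auc_from_d(d):.1%}")
```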

In situations where measures of risk potency are used, risk difference = 1/NNT = 2 × AUC - 1.
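The relation between the three quantities can be checked numerically. This sketch rearranges the stated identity to obtain AUC as (risk difference + 1) / 2. The function name and failure rates are hypothetical, and rates are proportions between 0 and 1.

```python
def auc_from_risk_difference(risk_difference: float) -> float:
    """Rearranges risk difference = 2 * AUC - 1 to solve for AUC."""
    return (risk_difference + 1.0) / 2.0


# Hypothetical failure rates: 50% in the comparison group, 20% with treatment.
comparison_failure, treatment_failure = 0.5, 0.2
risk_difference = comparison_failure - treatment_failure
nnt = 1.0 / risk_difference
auc = auc_from_risk_difference(risk_difference)

print(f"risk difference = {risk_difference:.2f}")
print(f"1 / NNT         = {1.0 / nnt:.2f}")
print(f"2 * AUC - 1     = {2.0 * auc - 1.0:.2f}")
```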

AUC can be computed based on clinical judgments alone. It has a special appeal as a measure of clinical significance.

AUC helps us understand the problem of cut-points. When one imposes dichotomization on an ordered response, there is a tacit declaration that all variation of response above the cut-point and all variation below the cut-point have no clinical relevance.
