Comparing several independent means - summary of chapter 12 of Statistics by A. Field (5th edition)

Using a linear model to compare several means

ANOVA: analysis of variance.
ANOVA is the same thing as the linear model (regression).

In designs in which the group sizes are unequal, it is important that the baseline category contains a large number of cases to ensure that the estimates of the b-values are reliable.

When we are predicting an outcome from group membership, predicted values from the model are the group means.
If the group means are meaningfully different, then using the group means should be an effective way to predict scores.

Predictioni = b0 + b1X + b2Y + εi

Control = b0
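A minimal sketch of this idea (with illustrative scores and ordinary least squares via NumPy, not any particular software from the book): when group membership is dummy coded with the control group as baseline, b0 equals the control-group mean and the other bs equal the differences between each group mean and the control mean; the two dummy columns play the role of X and Y in the equation above.

```python
import numpy as np

# Illustrative scores for a control group and two experimental groups (A, B)
scores = np.array([3, 2, 1, 1, 4,    # control
                   5, 2, 4, 2, 3,    # group A
                   7, 4, 5, 3, 6],   # group B
                  dtype=float)
group = np.repeat(["control", "A", "B"], 5)

# Dummy coding with control as the baseline: intercept, dummy for A, dummy for B
X = np.column_stack([np.ones(scores.size),
                     (group == "A").astype(float),
                     (group == "B").astype(float)])

b, *_ = np.linalg.lstsq(X, scores, rcond=None)
print("b0 =", b[0], "(control mean)")
print("b1 =", b[1], "(mean of A minus control mean)")
print("b2 =", b[2], "(mean of B minus control mean)")
```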

Dummy coding is only one of many ways to code dummy variables.

  • an alternative is contrast coding, in which you code the dummy variables in such a way that the b-values represent differences between groups that you specifically hypothesized before collecting data.

The F-test is an overall test that doesn’t identify differences between specific means. But the model parameters do.

Logic of the F-statistic

The F-statistic tests the overall fit of a linear model to a set of observed data.
F is the ratio of how good the model is compared to how bad it is.
When the model is based on group means, our predictions from the model are those means.

  • if the group means are the same then our ability to predict the observed data will be poor (F will be small)
  • if the means differ we will be able to better discriminate between cases from different groups (F will be large).

F tells us whether the group means are significantly different.

The same logic as for any linear model:

  • the model that represents ‘no effect’ or ‘no relationship between the predictor variable and the outcome’ is one where the predicted value of the outcome is always the grand mean
  • we can fit a different model to the data that represents our alternative hypotheses. We compare the fit of this model to the fit of the null model
  • the intercept and one or more parameters (b) describe the model
  • the parameters determine the shape of the model that we have fitted.
  • in experimental research the parameters (b) represent the differences between group means. The bigger the differences between group means, the greater the difference between the model and the null model (grand mean)
  • if the differences between group means are large enough, then the resulting model will be a better fit to the data than the null model
  • if this is the case we can infer that our model is better than not using a model: the group means are significantly different.

The F-statistic is the ratio of the explained to the unexplained variation.
We calculate this variation using the sum of squares.

Total sum of squares (SST)

To find the total amount of variation within our data we calculate the difference between each observed data point and the grand mean. We square these differences and add them up to give us the total sum of squares (SST).

The variance and the sums of squares are related such that the variance s2 = SS/(N-1).
We can calculate the total sum of squares from the variance of all observations (the grand variance) by rearranging this relationship: SS = s2(N-1).
The grand variance: the variation between all scores, regardless of the group from which the scores come.
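A minimal sketch of this relationship (the scores are illustrative only, and NumPy is assumed):

```python
import numpy as np

# Illustrative scores from three groups of five, pooled together
scores = np.array([3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6], dtype=float)

sst_direct   = np.sum((scores - scores.mean()) ** 2)   # squared deviations from the grand mean
sst_from_var = scores.var(ddof=1) * (scores.size - 1)  # grand variance * (N - 1)
print(sst_direct, sst_from_var)                        # both routes give the same SST
```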

Model sum of squares (SSM)

We need to know how much of the variation the model can explain.
The model sum of squares tells us how much of the total variation in the outcome can be explained by the fact that different scores come from entities in different conditions.

The model sum of squares is calculated by taking the difference between the values predicted by the model and the grand mean.
When making predictions from group membership, the values predicted by the model are the group means.
It is the sum of the squared distances between what the model predicts for each data point and the grand mean of the outcome.

  • calculate the difference between the mean of each group and the grand mean
  • square each of these differences
  • multiply each result by the number of participants within that group
  • add the values for each group together

The degrees of freedom (dfM) are one less than the number of things used to calculate the SS; here that is the number of group means, so dfM = k – 1.
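A minimal sketch of these steps (illustrative scores, NumPy assumed):

```python
import numpy as np

# Illustrative scores: three groups of five
groups = {"control": np.array([3, 2, 1, 1, 4], dtype=float),
          "A":       np.array([5, 2, 4, 2, 3], dtype=float),
          "B":       np.array([7, 4, 5, 3, 6], dtype=float)}

grand_mean = np.concatenate(list(groups.values())).mean()

# For each group: (group mean - grand mean)^2, weighted by that group's n, then summed
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
df_m = len(groups) - 1        # one less than the number of group means used
print(ss_m, df_m)
```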

Residual sum of squares (SSR)

The residual sum of squares tells us how much of the variation cannot be explained by the model.
The amount of variation created by things that we haven’t measured.

The simplest way to calculate SSR is to subtract SSM from SST

SSR = SST – SSM

or:
The sum of squares for each group is the sum of the squared differences between each participant’s score in that group and the group mean. Repeat this calculation for each group and add the results together.
Equivalently, multiply the variance of each group by one less than the number of people in that group, and then add the results for the groups together.

dfR = dfT – dfM

or

dfR = N – k

N is the sample size

k is the number of groups
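A minimal sketch of both routes to SSR (same illustrative scores as the earlier snippets, NumPy assumed):

```python
import numpy as np

# Illustrative scores: three groups of five
groups = [np.array([3, 2, 1, 1, 4], dtype=float),
          np.array([5, 2, 4, 2, 3], dtype=float),
          np.array([7, 4, 5, 3, 6], dtype=float)]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_t = np.sum((all_scores - grand_mean) ** 2)
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

ss_r_by_subtraction = ss_t - ss_m                                        # SSR = SST - SSM
ss_r_within_groups  = sum(g.var(ddof=1) * (len(g) - 1) for g in groups)  # variance * (n - 1) per group
df_r = all_scores.size - len(groups)                                     # N - k
print(ss_r_by_subtraction, ss_r_within_groups, df_r)                     # both routes agree
```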

Mean squares

Mean squares: average sum of squares
We divide by the degrees of freedom because we are trying to extrapolate to a population and some parameters within that population will be held constant.

MSM = SSM/dfM

MSM represents the average amount of variation explained by the model.

MSR = SSR/dfR

MSR is a gauge of the average amount of variation explained by unmeasured variables (unsystematic variation)

The F-statistic

The F-statistic is a measure of the ratio of the variation explained by the model to the variation attributable to unsystematic factors.
It is the ratio of how good the model is to how bad it is.

F = MSM / MSR

The F-statistic is a signal-to-noise ratio.

  • if it is less than 1, it means that MSR is greater than MSM, and that there is more unsystematic than systematic variance. The experimental manipulation has been unsuccessful. F will be non-significant.
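Putting the pieces together, here is a minimal sketch (same illustrative scores as the earlier snippets, NumPy and SciPy assumed) that computes the mean squares, F, and the p-value from the F-distribution with (dfM, dfR) degrees of freedom:

```python
import numpy as np
from scipy import stats

# Illustrative scores: three groups of five
groups = [np.array([3, 2, 1, 1, 4], dtype=float),
          np.array([5, 2, 4, 2, 3], dtype=float),
          np.array([7, 4, 5, 3, 6], dtype=float)]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, n = len(groups), all_scores.size

ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_r = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_m = ss_m / (k - 1)          # average variation explained by the model
ms_r = ss_r / (n - k)          # average unsystematic variation
f_stat = ms_m / ms_r           # the signal-to-noise ratio
p_value = stats.f.sf(f_stat, k - 1, n - k)
print(f_stat, p_value)
```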

Interpreting F

When the model is one that predicts an outcome from group means, F evaluates whether ‘overall’ there are differences between means. It does not provide specific information about which groups differ.
It is an omnibus test.

The reason why the F-test is useful is that as a single test it controls the Type I error rate.
Having established that overall group means differ we can use the parameters of the model to tell us where the differences lie.

Assumptions when comparing means

Homogeneity of variance

We assume that the variance of the outcome is stable as the predictor changes; in other words, the group variances are roughly equal.

Is ANOVA robust?

Robust test: it doesn’t matter much if we break the assumptions, F will still be accurate.

Two issues to consider around the significance of F

  • does the F control the Type I error rate or is it significant even when there are no differences between means?
  • does F have enough power?

ANOVA is not robust.

What to do when assumptions are violated

  • Welch’s F
  • bootstrap parameter estimates
  • Kruskal-Wallis test (sketched below)
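As one illustration of these alternatives (illustrative scores), the Kruskal-Wallis test is available directly in SciPy; Welch’s F and bootstrapped estimates are usually run through dedicated packages (for example statsmodels or pingouin) and are not shown here.

```python
from scipy import stats

# Illustrative scores for three groups
control = [3, 2, 1, 1, 4]
group_a = [5, 2, 4, 2, 3]
group_b = [7, 4, 5, 3, 6]

# Kruskal-Wallis compares the groups using ranks, so it does not rely on
# normality or homogeneity of variance in the way the F-test does
h_stat, p_value = stats.kruskal(control, group_a, group_b)
print(h_stat, p_value)
```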

Planned contrast (contrast coding)

Trouble: with two dummy variables we end up with two t-tests, which inflates the familywise error rate.

  • also, the dummy variables might not make all the comparisons that we want to make.

Solutions:

  • use contrast coding rather than dummy coding.
    Contrast coding: a way of assigning weights to groups in dummy variables to carry out planned contrasts. Weights are assigned in such a way that the contrasts are independent, which means that the overall Type I error rate is controlled.
  • compare every group mean to all others
    Post hoc tests.

Typically, planned contrasts are done to test specific hypotheses; post hoc tests are used when there are no specific hypotheses.

Choosing which contrasts to do

Planned contrasts break down the variation due to the model/experiment into component parts.
The exact contrasts will depend upon the hypotheses you want to test.

Three rules:

  • if you have a control group, this is usually because you want to compare it against the other groups.
  • each contrast must compare only two ‘chunks’ of variation
  • once a group has been singled out in a contrast it can’t be used in another contrast

We break down one chunk of variation into smaller independent chunks.
This independence matters for controlling the Type I error rate.

When we carry out a planned contrast, we compare ‘chunks’ of variance, and these chunks often consist of several groups.
When you design a contrast that compares several groups to one other group, you are comparing the means of the groups in one chunk with the mean of the group in the other chunk.

Defining contrasts using weights

To carry out contrasts we need to code our dummy variables in a way that results in bs that compare the ‘chunks’ that we set out in our contrasts.

Basic rules for assigning values to the dummy variables to obtain the contrasts you want:

  • choose sensible contrasts
  • groups coded with positive weights will be compared against groups coded with negative weights. So assign one chunk of variation positive weights and the opposite chunk negative weights
  • if you add up the weights for a given contrast the result should be zero
  • if a group is not involved in a contrast, assign it a weight of zero, which automatically eliminates it from that contrast
  • for a given contrast, the weights assigned to the group(s) in one chunk of variation should be equal to the number of groups in the opposite chunk of variation

It is important that the weights for a contrast sum to zero because it ensures that you are comparing two unique chunks of variation.
Therefore, a t-statistic can be used.
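A minimal sketch of these rules for three groups (control, A, B): contrast 1 compares the control group with the two experimental groups, and contrast 2 compares the two experimental groups with each other. The checks at the end show that each set of weights sums to zero and that the two contrasts are independent (their dot product is zero). The group labels and weights are illustrative; NumPy is assumed.

```python
import numpy as np

#                      control   A    B
contrast1 = np.array([     -2,   1,   1])   # control (1 group) vs. A + B (2 groups)
contrast2 = np.array([      0,   1,  -1])   # A vs. B; the control group gets weight 0

print(contrast1.sum(), contrast2.sum())     # both 0: valid contrasts
print(np.dot(contrast1, contrast2))         # 0: the contrasts are orthogonal
```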

Non-orthogonal contrasts

Contrasts don’t have to be orthogonal.
Non-orthogonal contrasts: contrasts that are related.
This disobeys the rule that once a group has been singled out in a contrast it can’t be used in another contrast: the control group is singled out in the first contrast but used again in the second contrast.

There is nothing intrinsically wrong with non-orthogonal contrasts, but you must be careful about how you interpret them because the contrasts are related and so the resulting test statistics and p-values will be correlated to some degree.
The Type I error rate isn’t controlled.

Planned contrasts

  • if the F for the overall model is significant you need to find out which groups differ
  • when you have generated specific hypotheses before the experiment, use planned contrasts
  • each contrast compares two ‘chunks’ of variance (a chunk can contain one or more groups)
  • the first contrast will usually be experimental groups against control groups
  • the next contrast will be to take one of the chunks that contained more than one group (if there were any) and divide it into two chunks
  • you repeat this process: if there are any chunks in previous contrasts that contained more than one group that haven’t already been broken down into smaller chunks, then create new contrasts that break them down into smaller chunks.
  • carry on creating contrasts until each group has appeared in a chunk on its own in one of your contrasts
  • the number of contrasts you end up with should be one less than the number of experimental conditions.
  • in each contrast assign a ‘weight’ to each group that is the value of the number of groups in the opposite chunk in that contrast
  • for a given contrast, randomly select one chunk, and for the groups in that chunk change their weights to be negative numbers
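As a sketch of this step-by-step procedure with four hypothetical conditions (one control group and three experimental groups E1, E2, E3): contrast 1 splits the control group from the experimental chunk, contrast 2 splits that chunk into E1 versus E2 + E3, and contrast 3 splits the remaining chunk into E2 versus E3. With four conditions we end up with three contrasts, each summing to zero and mutually orthogonal. The weights are illustrative; NumPy is assumed.

```python
import numpy as np

#                      control  E1   E2   E3
contrast1 = np.array([     -3,   1,   1,   1])   # control vs. the three experimental groups
contrast2 = np.array([      0,   2,  -1,  -1])   # E1 vs. E2 + E3 (control already singled out, so weight 0)
contrast3 = np.array([      0,   0,   1,  -1])   # E2 vs. E3

contrasts = np.vstack([contrast1, contrast2, contrast3])
print(contrasts.sum(axis=1))        # each contrast sums to zero
print(contrasts @ contrasts.T)      # off-diagonal zeros: the contrasts are orthogonal
```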

Polynomial contrasts: trend analysis

Polynomial contrast: tests for trends in data, and in its most basic form it looks for a linear trend.
There are also other trends that can be examined.

  • Quadratic trend: there is a curve in the line
  • Cubic trend: there are two changes in the direction of the trend
  • Quartic trend: has three changes in direction
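A minimal sketch of the standard orthogonal polynomial weights for four equally spaced groups (textbook coefficients, not tied to any example in this chapter); each row sums to zero and the rows are mutually orthogonal. NumPy is assumed.

```python
import numpy as np

trend_weights = np.array([
    [-3, -1,  1,  3],   # linear: a steady rise (or fall) across the ordered groups
    [ 1, -1, -1,  1],   # quadratic: one bend in the line
    [-1,  3, -3,  1],   # cubic: two changes of direction
])

print(trend_weights.sum(axis=1))        # each row sums to zero
print(trend_weights @ trend_weights.T)  # off-diagonal zeros: the trends are orthogonal
```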

Post hoc procedures

Often people have no specific a priori predictions about the data they have collected and instead they rummage around the data looking for any differences between means that they can find.

Post hoc tests consist of pairwise comparisons that are designed to compare all different combinations of the treatment groups.
This means taking each pair of groups and performing a separate test on each pair.

Pairwise comparisons control familywise error by correcting the level of significance for each test such that the overall Type I error rate across all comparisons remains at 0.05.
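As one hedged illustration (illustrative scores, and assuming SciPy 1.8 or later, which added tukey_hsd), Tukey’s procedure compares every pair of group means while keeping the familywise error rate at 0.05:

```python
from scipy import stats

# Illustrative scores for three groups
control = [3, 2, 1, 1, 4]
group_a = [5, 2, 4, 2, 3]
group_b = [7, 4, 5, 3, 6]

result = stats.tukey_hsd(control, group_a, group_b)
print(result)   # pairwise mean differences with adjusted p-values and confidence intervals
```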

Type I and Type II error rates for post hoc tests

It is important that multiple comparison procedures control the Type I error rate but without a substantial loss in power.

The Least-significant difference (LSD) pairwise comparison makes no attempt to control the Type I error and is equivalent to performing multiple tests on the data.
But it requires the overall ANOVA to be significant.

The Studentized Newman-Keuls (SNK) procedure lacks control over the familywise error rate.

Bonferroni’s and Tukey’s tests both control the Type I error rate but are conservative (they lack statistical power).

The Ryan, Einot, Gabriel, and Welsch Q (REGWQ) has good power and tight control of the Type I error rate.

Are post hoc procedures robust?

No

Summary of post hoc procedures

  • When you have no specific hypotheses before the experiment, follow the model with post hoc tests
  • When you have equal sample sizes and group variances are similar use REGWQ or Tukey
  • If you want guaranteed control over the Type I error rate use Bonferroni
  • If sample sizes are slightly different then use Gabriel’s, but if sample sizes are very different use Hochberg’s GT2
  • If there is any doubt that group variances are equal then use the Games-Howell procedure.

Comparing several means

Output for the main analysis

The table is divided into between-group effects (effects due to the model, i.e. the experimental effects) and within-group effects (the unsystematic variation in the data).
The between-group effect is further broken down into a linear and quadratic component.
The between-group effect labelled combined is the overall experimental effect.

The final column labelled sig tells us the probability of getting an F at least this big if there wasn’t a difference between means in the population.

One-way independent ANOVA

  • One-way independent ANOVA compares several means, when those means have come from different groups of people. It is a special case of the linear model
  • When you have generated specific hypotheses before the experiment use planned contrasts, but if you don’t have specific hypotheses use post hoc tests.
  • There are lots of different post hoc tests.
  • You can test for homogeneity of variance using Levene’s test, but consider using a robust test in all situations (the Welch or Brown-Forsythe F) or Wilcox’s t1way() function (see the sketch after this list)
  • Locate the p-value. If the value is less than 0.05 then scientists typically interpret this as the group means being significantly different
  • For contrasts and post hoc tests, again look at the columns labelled Sig. to discover whether your comparisons are significant.
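A minimal sketch of that workflow (illustrative scores): Levene’s test for homogeneity of variance followed by the classic one-way ANOVA, both from SciPy. The Welch and Brown-Forsythe versions of F and the t1way() function live in other packages and are not shown here.

```python
from scipy import stats

# Illustrative scores for three independent groups
control = [3, 2, 1, 1, 4]
group_a = [5, 2, 4, 2, 3]
group_b = [7, 4, 5, 3, 6]

print(stats.levene(control, group_a, group_b))    # homogeneity-of-variance check
print(stats.f_oneway(control, group_a, group_b))  # one-way ANOVA: F-statistic and p-value
```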

Calculating the effect size

R2 = SSM/SST

Eta squared, η2: the same as R2.
The square root of this value is the effect size.

Omega squared, ω2: an effect size estimate that uses the variance explained by the model and the average error variance.

ω2 = (SSM – dfM·MSR) / (SST + MSR)

ω2 is a more accurate measure.

rcontrast = √(t2 / (t2 + df))
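A minimal sketch of these effect-size formulas (same illustrative scores as the earlier snippets, with a hypothetical t-statistic for the contrast; NumPy assumed):

```python
import math
import numpy as np

# Illustrative scores: three groups of five
groups = [np.array([3, 2, 1, 1, 4], dtype=float),
          np.array([5, 2, 4, 2, 3], dtype=float),
          np.array([7, 4, 5, 3, 6], dtype=float)]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, n = len(groups), all_scores.size

ss_t = np.sum((all_scores - grand_mean) ** 2)
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_r = (ss_t - ss_m) / (n - k)

eta_sq   = ss_m / ss_t                                 # eta squared = R^2
omega_sq = (ss_m - (k - 1) * ms_r) / (ss_t + ms_r)     # omega squared
print(eta_sq, omega_sq)

# r for a contrast, from its t-statistic and degrees of freedom (hypothetical t)
t, df = 2.5, n - k
r_contrast = math.sqrt(t ** 2 / (t ** 2 + df))
print(r_contrast)
```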

Reporting results from one-way independent ANOVA

We report the F-statistic and the degrees of freedom associated with it.
Also include an effect size estimate and the p-value.
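For example (illustrative numbers only): ‘There was a significant effect of the treatment on scores, F(2, 12) = 5.12, p = .025, ω2 = .35.’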
