Multivariate analysis of variance (MANOVA) - summary of chapter 17 of Statistics by A. Field (5th edition)

Introducing MANOVA
Introducing matrices
The theory behind MANOVA
Practical issues when conducting MANOVA
Summary
Reporting results from MANOVA

Introducing MANOVA

Multivariate analysis of variance (MANOVA) is used when we are interested in several outcomes.

The principles of the linear model extend to MANOVA: we can use it when there is one independent variable or several, we can look at interactions between independent variables, and we can use contrasts to see which groups differ.

Univariate: the model when we have only one outcome variable.
Multivariate: the model when we include several outcome variables simultaneously.

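We shouldn’t fit separate linear models to each outcome variable, because every additional test inflates the familywise Type I error rate and separate models ignore the relationship between the outcome variables.

Separate models can tell us only whether groups differ along a single dimension, whereas MANOVA has the power to detect whether groups differ along a combination of dimensions.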

Choosing outcomes

It is a bad idea to lump outcome measures together in a MANOVA unless you have a good theoretical or empirical basis for doing so.
Where there is a good theoretical basis for including some, but not all, of your outcome measures, fit separate models: one for the outcomes being tested on a heuristic basis and one for the theoretically meaningful outcomes.

The point here is not to include lots of outcome variables in a MANOVA just because you measured them.

Introducing matrices

A matrix: a grid of numbers arranged in columns and rows.
A matrix can have many columns and rows, and we specify its dimensions using numbers.
For example: a 2 x 3 matrix is a matrix with two rows and three columns.

The values within a matrix are components or elements.
The rows and columns are vectors.

A square matrix: a matrix with an equal number of columns and rows.

An identity matrix: a square matrix in which the diagonal elements are 1 and the off-diagonal elements are 0.
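
As a minimal sketch of these definitions (NumPy is an assumption here, it is not used in the chapter, and the numbers are arbitrary):

```python
import numpy as np

# A 2 x 3 matrix: two rows and three columns (arbitrary illustrative values).
m = np.array([[1, 2, 3],
              [4, 5, 6]])
n_rows, n_cols = m.shape      # dimensions are given as rows x columns

first_row = m[0, :]           # a row vector with three elements
second_column = m[:, 1]       # a column vector with two elements

identity = np.eye(3)          # 3 x 3 identity: 1s on the diagonal, 0s elsewhere
is_square = identity.shape[0] == identity.shape[1]

print(n_rows, n_cols)         # 2 3
print(is_square)              # True
```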

The matrix that represents the systematic variance (or the model sum of squares for all variables) is denoted by the letter H and is called the hypothesis sum of squares and cross-products matrix (or hypothesis SSCP).

The matrix that represents the unsystematic variance (the residual sums of squares for all variables) is denoted by the letter E and called the error sum of squares and cross-products matrix (or error SSCP).

The matrix that represents the total amount of variance present for each outcome variable is denoted by T and is called the total sum of squares and cross-products matrix (or total SSCP).

Cross-products represent a total value for the combined error between two variables.
Whereas the sum of squares of a variable is the total squared difference between the observed values and the mean value, the cross-product is the total combined error between two variables.

The theory behind MANOVA

The relationship between outcomes: cross-products

If we want a measure of the relationship that is comparable to a sum of squares, then we need something that quantifies the total relationship. The cross-product does this job.

There are three relevant cross-products that correspond to the three sums of squares (SS_T, SS_M, SS_R):

the total cross-product
the cross-product for the model
a residual cross-product.

The cross-product is the difference between the scores and the mean for one variable multiplied by the difference between the scores and the mean for another variable.

The total cross-product (CP_T)

In the case of the total cross-product, the mean of interest is the grand mean for each outcome variable.

$$CP_T = \sum_{i=1}^{n} \left(x_{i(a)} - \bar{x}_{\text{grand}(a)}\right)\left(x_{i(b)} - \bar{x}_{\text{grand}(b)}\right)$$

For each outcome variable you take each score and subtract from it the grand mean for that variable. This leaves you with two deviations per participant (one for each outcome variable), which are multiplied together to get the cross-product for each participant.
The total can then be found by adding the cross-products of all participants.

The total cross-product is a gauge of the overall relationship between two variables.
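
A minimal sketch of this calculation, assuming NumPy and made-up scores for nine participants on two outcomes (the names `a` and `b` are illustrative only):

```python
import numpy as np

# Hypothetical scores for nine participants on two outcome variables.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
b = np.array([2, 1, 3, 6, 5, 7, 3, 5, 4], dtype=float)

# Deviation of every score from the grand mean of its own outcome variable.
dev_a = a - a.mean()
dev_b = b - b.mean()

# Multiply the two deviations for each participant, then add them up.
cp_total = np.sum(dev_a * dev_b)

# For comparison: a sum of squares multiplies one variable's deviations by themselves.
ss_total_a = np.sum(dev_a ** 2)

print(cp_total)    # 21.0
print(ss_total_a)  # 60.0
```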

Model cross-product (CP_M)

How the relationship between the outcome variables is influenced by our experimental manipulation.

$$CP_M = \sum_{g=1}^{k} n_g \left(\bar{x}_{g(a)} - \bar{x}_{\text{grand}(a)}\right)\left(\bar{x}_{g(b)} - \bar{x}_{\text{grand}(b)}\right)$$

First, the difference between each group mean and the grand mean is calculated for each outcome variable.
The cross-product for each group is calculated by multiplying these two differences.
Each product is then multiplied by the number of scores within the group, and the results are added up across groups.
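
These steps can be sketched as follows, assuming NumPy, made-up scores and three hypothetical groups of three participants each:

```python
import numpy as np

# Hypothetical scores on two outcomes, with a group label per participant.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
b = np.array([2, 1, 3, 6, 5, 7, 3, 5, 4], dtype=float)
group = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

cp_model = 0.0
for g in np.unique(group):
    in_group = group == g
    n_g = in_group.sum()                     # number of scores in this group
    diff_a = a[in_group].mean() - a.mean()   # group mean minus grand mean, outcome a
    diff_b = b[in_group].mean() - b.mean()   # group mean minus grand mean, outcome b
    cp_model += n_g * diff_a * diff_b        # weight the product by group size

print(cp_model)  # 18.0
```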

Residual cross-product (CP_R)

How the relationship between the two outcome variables is influenced by individual differences (or unmeasured variables).

$$CP_R = \sum_{i=1}^{n} \left(x_{i(a)} - \bar{x}_{\text{group}(a)}\right)\left(x_{i(b)} - \bar{x}_{\text{group}(b)}\right)$$

The group means are used rather than the grand mean.
To calculate each of the difference scores, take each score and subtract from it the mean of the group to which it belongs.

It can also be calculated as:

$$CP_R = CP_T - CP_M$$
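
A minimal sketch of both routes to the residual cross-product, assuming NumPy, made-up scores and three hypothetical groups:

```python
import numpy as np

# Hypothetical scores on two outcomes, with a group label per participant.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
b = np.array([2, 1, 3, 6, 5, 7, 3, 5, 4], dtype=float)
group = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Mean of the group each participant belongs to (one value per participant).
group_mean_a = np.array([a[group == g].mean() for g in group])
group_mean_b = np.array([b[group == g].mean() for g in group])

# Direct route: deviations from the group means, multiplied and summed.
cp_residual = np.sum((a - group_mean_a) * (b - group_mean_b))

# Indirect route: total cross-product minus model cross-product.
cp_total = np.sum((a - a.mean()) * (b - b.mean()))
cp_model = np.sum((group_mean_a - a.mean()) * (group_mean_b - b.mean()))

print(cp_residual)          # 3.0
print(cp_total - cp_model)  # 3.0
```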

Each of the cross-products tells us something about the relationship between the two outcome variables.

The total SSCP matrix (T)

SSCP matrices are square.
The total SSCP matrix, T, contains the total sum of squares for each outcome variable and the total cross-product between the outcome variables.

The residual SSCP matrix (E)

The residual (or error) sum of squares and cross-products matrix, E, contains the residual sum of squares of each outcome variable and the residual cross-product between the two outcome variables.

The residual SSCP represents both the unsystematic variation that exists for each outcome variable and the co-dependence between the outcome variables that is due to unmeasured factors.

The model SSCP matrix (H)

The model (or hypothesis) sum of squares and cross-product matrix, H, contains the model sum of squares for each outcome variable and the model cross-product between the two outcome variables.

The model SSCP represents both the systematic variation that exists for each outcome variable and the co-dependence between the outcome variables that is due to the model.

Matrices are additive: you can add (or subtract) two matrices by adding (or subtracting) their corresponding components. For the SSCP matrices this means that T = H + E.
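
As a rough sketch of how the three SSCP matrices fit together, assuming NumPy, made-up scores on two outcomes and three hypothetical groups:

```python
import numpy as np

# Hypothetical data: rows are participants, columns are the two outcomes.
scores = np.array([[1, 2], [2, 1], [3, 3],
                   [4, 6], [5, 5], [6, 7],
                   [7, 3], [8, 5], [9, 4]], dtype=float)
group = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

grand_mean = scores.mean(axis=0)
# One row of group means per participant.
group_means = np.vstack([scores[group == g].mean(axis=0) for g in group])

total_dev = scores - grand_mean        # score minus grand mean
model_dev = group_means - grand_mean   # group mean minus grand mean
resid_dev = scores - group_means       # score minus group mean

# D.T @ D puts sums of squares on the diagonal and cross-products off it.
T = total_dev.T @ total_dev   # total SSCP: [[60, 21], [21, 30]]
H = model_dev.T @ model_dev   # model SSCP: [[54, 18], [18, 24]]
E = resid_dev.T @ resid_dev   # residual SSCP: [[6, 3], [3, 6]]

print(np.allclose(T, H + E))  # True
```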

HE^-1: an analogue of F

The univariate F is the ratio of systematic variance to unsystematic variance.
The conceptual equivalent would therefore be to divide the matrix H by the matrix E.
The matrix equivalent to division is to multiply by what’s known as the inverse of a matrix.

So an analogue of F would be to divide H by E, or in matrix terms to multiply H by the inverse of E (denoted as E^-1).
The resulting matrix is called HE^-1 and is a multivariate analogue of the univariate F.

HE^-1 represents the ratio of systematic variance in the model to unsystematic variance in the model.

Unlike the univariate F, which is a single value, HE^-1 is a matrix and so contains several values.
It will always contain p² values, where p is the number of outcome variables.
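
A minimal sketch of this step, assuming NumPy and two made-up SSCP matrices for p = 2 outcome variables:

```python
import numpy as np

# Hypothetical model (H) and residual (E) SSCP matrices for two outcomes.
H = np.array([[54.0, 18.0],
              [18.0, 24.0]])
E = np.array([[6.0, 3.0],
              [3.0, 6.0]])

# "Dividing" H by E means multiplying H by the inverse of E.
HE_inv = H @ np.linalg.inv(E)

p = H.shape[0]               # number of outcome variables
print(HE_inv.size == p ** 2) # True: the result holds p squared values
```

Note that HE^-1 is generally not symmetric, even though H and E are.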

Discriminant function variables

The problem of having several values with which to assess statistical significance can be simplified by converting the outcome variables into underlying dimensions or factors.

It is possible to calculate underlying linear dimensions of the outcome variables known as variates (or components). In this context, we want to use these linear variates to predict to which group a person belongs.
Because we are using them to discriminate groups of people/cases these variates are called: discriminant functions or discriminant function variates.

To discover these discriminant functions we use a mathematical procedure of maximization, such that the first discriminant function (V₁) is the linear combination of outcome variables that maximizes the differences between groups.
It follows from this that the ratio of systematic to unsystematic variance (SS_M/SS_R) will be maximized for this first variate, but subsequent variates will have smaller values of this ratio.

We obtain the maximum possible value of the F-statistic when we look at the first discriminant function.
This variate can be described in terms of a linear model equation because it is a linear combination of the outcome variables:

$$y_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$

$$V_{1i} = b_0 + b_1 \text{Outcome}_{1i} + b_2 \text{Outcome}_{2i}$$

The b-values in the equation are weights that tell us something about the contribution of each outcome variable to the variate in question.
The values of b for the discriminant functions are obtained from the eigenvectors of HE^-1.
We can ignore b₀ because it serves only to locate the variate in geometric space, which isn’t necessary when we’re using the variate to discriminate groups.

In a situation where there are only two outcome variables and two groups to predict, there will be only one variate.

Eigenvectors are the vectors associated with a given matrix that are unchanged by transformation of that matrix to a diagonal matrix.

In a diagonal matrix the off-diagonal elements are zero, so by changing HE^-1 into a diagonal matrix we eliminate all of the off-diagonal elements.
Therefore, by calculating the eigenvectors and eigenvalues, we still end up with values that represent the ratio of systematic to unsystematic variance (because they are unchanged by the transformation) but there are fewer of them.

The variates extracted are independent dimensions constructed from a linear combination of the outcome variables that were measured.
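
A rough sketch of the reduction from p² values to one eigenvalue per variate, assuming NumPy and made-up SSCP matrices. It stops at the eigenvalues; turning eigenvectors into discriminant weights depends on scaling conventions that are not covered here.

```python
import numpy as np

# Hypothetical model (H) and residual (E) SSCP matrices for two outcomes.
H = np.array([[54.0, 18.0],
              [18.0, 24.0]])
E = np.array([[6.0, 3.0],
              [3.0, 6.0]])

HE_inv = H @ np.linalg.inv(E)

# Eigenvalues of HE^-1, largest first: one per variate.
eigenvalues = np.sort(np.linalg.eigvals(HE_inv).real)[::-1]

# The diagonal matrix that carries the same information in fewer values.
diagonal = np.diag(eigenvalues)

# The transformation leaves the trace and the determinant unchanged.
same_trace = np.isclose(np.trace(HE_inv), np.trace(diagonal))
same_det = np.isclose(np.linalg.det(HE_inv), np.linalg.det(diagonal))

print(HE_inv.size, eigenvalues.size)  # 4 2
print(same_trace, same_det)           # True True
```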

The eigenvalues are conceptually equivalent to the univariate F-statistic and so the final step is to assess how large these values are compared to what we would expect if there were no effect in the population.
There are four common ways to do this:

The Pillai-Bartlett trace (V)

$$V = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i}$$

λ represents the eigenvalues for each of the discriminant variates
s represents the number of variates.

Pillai’s trace is the sum of the proportion of explained variance on the discriminant functions.
It is similar to the ratio SS_M/SS_T, which is known as R².

Hotelling’s T²

The Hotelling-Lawley trace (or Hotelling’s T²) is the sum of eigenvalues for each variate.

$$T = \sum_{i=1}^{s} \lambda_i$$

This test statistic is the sum of SS_M/SS_R for each of the variates and so it compares directly to the univariate F-statistic.

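Wilks’s lambda (Λ)

Wilks’s lambda is the product of the unexplained variance on each of the variates.

$$\Lambda = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i}$$

The symbol ∏ means that the terms are multiplied together (in the same way that Σ means they are added).
Large eigenvalues (which in themselves represent large experimental effects) lead to small values of Wilks’s lambda.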

Roy’s largest root

The eigenvalue (or root) of the first variate.

$$\Theta = \lambda_{\text{largest}}$$

Roy’s largest root represents the ratio of explained variance to unexplained variance (SS_M/SS_R) for the first discriminant function.
It represents the maximum possible between-group difference given the data collected.
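
The four statistics can be sketched from the formulas above. This assumes NumPy and two made-up eigenvalues chosen to keep the arithmetic easy; they are not taken from any real analysis.

```python
import numpy as np

# Hypothetical eigenvalues of HE^-1, one per discriminant variate.
eigenvalues = np.array([1.0, 0.25])

# Pillai-Bartlett trace: sum of the proportions of explained variance.
pillai = np.sum(eigenvalues / (1 + eigenvalues))

# Hotelling-Lawley trace: sum of the eigenvalues.
hotelling = np.sum(eigenvalues)

# Wilks's lambda: product of the unexplained variance on each variate.
wilks = np.prod(1 / (1 + eigenvalues))

# Roy's largest root: the eigenvalue of the first variate.
roy = np.max(eigenvalues)

print(round(pillai, 3), round(hotelling, 3), round(wilks, 3), round(roy, 3))
# 0.7 1.25 0.4 1.0
```

Larger eigenvalues push Pillai's trace, Hotelling's trace and Roy's root up, but push Wilks's lambda down.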

Practical issues when conducting MANOVA

Assumptions

MANOVA has similar assumptions to all the models in this book but extended to the multivariate case:

Independence: residuals should be statistically independent
Random sampling: data should be randomly sampled from the population of interest and measured at an interval level
Multivariate normality: in univariate models we assume that the residuals are normally distributed; in MANOVA we assume that the residuals have multivariate normality.
Homogeneity of covariance matrices: in univariate models we assume that the variances in each group are roughly equal. In MANOVA we assume this is true for each outcome variable, but also that the correlation between any two outcome variables is the same in all groups.

The effect of violating the assumption of equality of covariance matrices is unclear, except that Hotelling’s T² is robust in the two-group situation when sample sizes are equal.
The assumption can be tested using Box’s test, which should be non-significant if the matrices are similar.
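
As a rough illustration of what Box's test measures, the sketch below computes the M statistic only, using its usual textbook definition (an assumption here, since the formula is not given in this summary). It assumes NumPy and made-up data, and it does not compute the significance value.

```python
import numpy as np

def box_m(groups):
    """Box's M for a list of arrays (rows = cases, columns = outcomes)."""
    dof = np.array([g.shape[0] - 1 for g in groups])            # n_i - 1 per group
    covs = [np.cov(g, rowvar=False) for g in groups]            # group covariance matrices
    pooled = sum(d * c for d, c in zip(dof, covs)) / dof.sum()  # pooled covariance matrix

    def log_det(matrix):
        return np.linalg.slogdet(matrix)[1]                     # log of the determinant

    return dof.sum() * log_det(pooled) - sum(d * log_det(c) for d, c in zip(dof, covs))

# Hypothetical scores on two outcomes for three groups.
g1 = np.array([[1, 2], [2, 1], [3, 3]], dtype=float)
g2 = np.array([[4, 6], [5, 5], [6, 7]], dtype=float)
g3 = np.array([[7, 3], [8, 5], [9, 4]], dtype=float)

m_similar = box_m([g1, g2, g3])        # identical covariance matrices, so M is about 0
m_different = box_m([g1, g2, 2 * g3])  # third group more spread out, so M is larger
```

The further the group covariance matrices are from the pooled matrix, the larger M becomes.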

The variance-covariance matrices for samples should be inspected to assess whether the printed probabilities for the multivariate test statistics are likely to be conservative or liberal:

If the larger samples produce greater variances and covariances, then the probability values will be conservative (and so significant findings can be trusted).
If it is the smaller samples that produce the larger variances and covariances, then the probability values will be liberal, so significant differences should be treated with caution (although non-significant effects can be trusted).

In the event that you cannot trust the printed probabilities, there is little you can do except equalize the samples by randomly deleting cases in the larger groups.
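
Random deletion could look something like this. It is a sketch assuming NumPy, made-up group labels and an arbitrary seed; the kept indices would then be used to subset the real data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed so the deletion is reproducible

# Hypothetical group labels with unequal group sizes (8, 5 and 6 cases).
group = np.array(["a"] * 8 + ["b"] * 5 + ["c"] * 6)

labels = np.unique(group)
smallest = min(np.sum(group == g) for g in labels)  # size of the smallest group

# Randomly keep `smallest` cases from every group and drop the rest.
keep = np.sort(np.concatenate([
    rng.choice(np.flatnonzero(group == g), size=smallest, replace=False)
    for g in labels
]))

balanced_group = group[keep]  # every group now has the same number of cases
```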

Choosing a test statistic

For small and moderate sample sizes the four statistics differ little in terms of power.
If group differences are concentrated on the first variate, Roy’s statistic should have the most power, followed by Hotelling’s trace, Wilks’s lambda and Pillai’s trace.
When groups differ along more than one variate, this power ordering is reversed (Pillai’s trace is most powerful and Roy’s root least).
Unless sample sizes are large, it’s probably wise to use fewer than 10 outcome variables.

All four test statistics are relatively robust to violations of multivariate normality.

Roy’s root is not robust when the homogeneity of covariance matrix assumption is untenable.
When sample sizes are equal, the Pillai-Bartlett trace is the most robust to violations of assumptions.

Follow-up analysis

The traditional approach is to follow a significant MANOVA with separate univariate models (ANOVA) on each of the outcome variables.
If you do use univariate Fs then you ought to apply a Bonferroni correction.
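
A minimal sketch of the Bonferroni correction for the follow-up tests; the outcome names and p-values are invented for illustration:

```python
alpha = 0.05

# Hypothetical p-values from the univariate F-tests, one per outcome variable.
p_values = {"outcome_1": 0.012, "outcome_2": 0.041}

# Bonferroni: divide alpha by the number of follow-up tests.
corrected_alpha = alpha / len(p_values)

significant = {name: p < corrected_alpha for name, p in p_values.items()}

print(corrected_alpha)  # 0.025
print(significant)      # {'outcome_1': True, 'outcome_2': False}
```

Here the second outcome would be significant at 0.05 but not at the corrected criterion.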

Summary

MANOVA is used to test the difference between groups across several outcome variables simultaneously.
Box’s test looks at the assumption of equal covariance matrices. This test can be ignored when sample sizes are equal because when they are, some MANOVA test statistics are robust to violations of this assumption. If group sizes differ this test should be inspected. If the value of sig is less than 0.001 then the results of the analysis should not be trusted.
The table labelled Multivariate Tests gives us four test statistics. I recommend using Pillai’s trace. If the value of sig for this statistic is less than 0.05 then the groups differ significantly with respect to a linear combination of the outcome variables.
Univariate F-statistics can be used to follow up the MANOVA (a different F-statistic for each outcome variable). The results of these are listed in the table entitled Tests of Between-Subjects Effects. These F-statistics can in turn be followed up using contrasts. Personally I recommend discriminant function analysis over this approach.

Reporting results from MANOVA

The multivariate tests are converted into approximate Fs, and people just report these Fs in the usual way.
The multivariate test statistic should be quoted as well.
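
As a sketch of where the approximate F comes from, the function below converts Pillai's trace using the commonly given approximation. The formula is an assumption, since it is not given in this summary, and the function name and input values are hypothetical: V = 0.7 with two outcomes, three groups and 30 participants.

```python
def pillai_to_f(v, p, df_hypothesis, df_error):
    """Approximate F and its degrees of freedom for Pillai's trace V."""
    s = min(p, df_hypothesis)              # number of variates
    m = (abs(p - df_hypothesis) - 1) / 2
    n = (df_error - p - 1) / 2
    f = ((2 * n + s + 1) / (2 * m + s + 1)) * (v / (s - v))
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    return f, df1, df2

# Hypothetical design: 3 groups (df = 2), 30 participants (error df = 27), 2 outcomes.
v = 0.7
f, df1, df2 = pillai_to_f(v, p=2, df_hypothesis=2, df_error=27)

# Quote the multivariate statistic alongside the approximate F.
report = f"Pillai's trace, V = {v:.2f}, F({df1:.0f}, {df2:.0f}) = {f:.2f}"
print(report)  # Pillai's trace, V = 0.70, F(4, 54) = 7.27
```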