Summary of Discovering statistics using IBM SPSS statistics by Field - 5th edition
Statistics
Chapter 17
Multivariate analysis of variance (MANOVA)
Multivariate analysis of variance (MANOVA) is used when we are interested in several outcomes.
The principles of the linear model extend to MANOVA: we can use it when there is one independent variable or several, we can look at interactions between independent variables, and we can do contrasts to see which groups differ.
Univariate: the model when we have only one outcome variable.
Multivariate: the model when we include several outcome variables simultaneously.
We shouldn’t fit separate linear models to each outcome variable.
Separate models can tell us only whether groups differ along a single dimension, whereas MANOVA has the power to detect whether groups differ along a combination of dimensions.
Choosing outcomes
It is a bad idea to lump outcome measures together in a MANOVA unless you have a good theoretical or empirical basis for doing so.
Where there is a good theoretical basis for including some, but not all, of your outcome measures, then fit separate models: one for the outcomes being tested on a heuristic basis and one for the theoretically meaningful outcomes.
The point here is not to include lots of outcome variables in a MANOVA just because you measured them.
A matrix: a grid of numbers arranged in columns and rows.
A matrix can have many columns and rows, and we specify its dimensions using numbers.
For example: a 2 x 3 matrix is a matrix with two rows and three columns.
The values within a matrix are components or elements.
The rows and columns are vectors.
A square matrix: a matrix with an equal number of columns and rows.
An identity matrix: a square matrix in which the diagonal elements are 1 and the off-diagonal elements are 0.
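As a small illustration of these definitions, the sketch below stores a matrix as a list of rows in plain Python. The helper names `dimensions`, `is_square` and `identity` are made up for this example and do not come from the book.

```python
# A 2 x 3 matrix: two rows, three columns. Each inner list is a row vector.
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

def dimensions(m):
    """Return (number of rows, number of columns)."""
    return len(m), len(m[0])

def is_square(m):
    """A square matrix has as many rows as columns."""
    rows, cols = dimensions(m)
    return rows == cols

def identity(size):
    """Square matrix with 1 on the diagonal and 0 everywhere else."""
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]

print(dimensions(matrix))  # (2, 3)
print(is_square(matrix))   # False
print(identity(2))         # [[1, 0], [0, 1]]
```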
The matrix that represents the systematic variance (or the model sum of squares for all variables) is denoted by the letter H and is called the hypothesis sum of squares and cross-products matrix (or hypothesis SSCP).
The matrix that represents the unsystematic variance (the residual sums of squares for all variables) is denoted by the letter E and called the error sum of squares and cross-products matrix (or error SSCP).
The matrix that represents the total amount of variance present for each outcome variable is denoted by T and is called the total sum of squares and cross-products matrix (or total SSCP).
Cross-products represent a total value for the combined error between two variables.
Whereas the sum of squares of a variable is the total squared difference between the observed values and the mean value, the cross-product is the total combined error between two variables.
The relationship between outcomes: cross-products
If we want a measure of the relationship that is comparable to a sum of squares, then we need something that quantifies the total relationship between two variables. The cross-product does this job.
There are three relevant cross-products that correspond to the three sums of squares (SST, SSM, SSR).
The cross-product is the difference between the scores and the mean for one variable multiplied by the difference between the scores and the mean for another variable.
The total cross-product (CPT)
In the case of the total cross-product, the mean of interest is the grand mean for each outcome variable.
$$\mathrm{CP_T} = \sum_{i=1}^{n} \left(x_{i(a)} - \bar{x}_{\text{Grand}(a)}\right)\left(x_{i(b)} - \bar{x}_{\text{Grand}(b)}\right)$$
For each outcome variable you take each score and subtract from it the grand mean for that variable. This leaves you with two difference scores per participant, which are multiplied together to get the cross-product for each participant.
The total can then be found by adding the cross-products of all participants.
The total cross-product is a gauge of the overall relationship between two variables.
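The calculation of the total cross-product can be sketched in a few lines of plain Python. The scores in `a` and `b` are invented for illustration (nine participants, two outcomes), and `total_cross_product` is a hypothetical helper name.

```python
# Hypothetical scores for nine participants on two outcomes, a and b.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [2, 4, 3, 7, 6, 8, 4, 6, 5]

def mean(values):
    return sum(values) / len(values)

def total_cross_product(x, y):
    """Sum over participants of (x_i - grand mean of x) * (y_i - grand mean of y)."""
    grand_x, grand_y = mean(x), mean(y)
    return sum((xi - grand_x) * (yi - grand_y) for xi, yi in zip(x, y))

cp_t = total_cross_product(a, b)
print(cp_t)  # 21.0
```

Passing the same variable twice, as in `total_cross_product(a, a)`, gives the total sum of squares for that variable, which shows how the cross-product generalizes the sum of squares.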
Model cross-product (CPM)
How the relationship between the outcome variables is influenced by our experimental manipulation.
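$$\mathrm{CP_M} = \sum_{g=1}^{k} n_g \left[\left(\bar{x}_{g(a)} - \bar{x}_{\text{Grand}(a)}\right)\left(\bar{x}_{g(b)} - \bar{x}_{\text{Grand}(b)}\right)\right]$$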
First, the difference between each group mean and the grand mean is calculated for each outcome variable.
The cross-product is calculated by multiplying the differences found in each group.
Each product is then multiplied by the number of scores within the group.
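These three steps can be sketched as follows. The data are invented (three groups of three participants, two outcomes labelled `a` and `b`), so the numbers are illustrative only.

```python
# Hypothetical data: three groups of three participants, two outcomes (a and b).
groups = [
    {"a": [1, 2, 3], "b": [2, 4, 3]},
    {"a": [4, 5, 6], "b": [7, 6, 8]},
    {"a": [7, 8, 9], "b": [4, 6, 5]},
]

def mean(values):
    return sum(values) / len(values)

# Grand means are taken over every score, ignoring group membership.
grand_a = mean([x for g in groups for x in g["a"]])
grand_b = mean([x for g in groups for x in g["b"]])

cp_m = 0.0
for g in groups:
    n_g = len(g["a"])                # number of scores in the group
    diff_a = mean(g["a"]) - grand_a  # group mean minus grand mean, outcome a
    diff_b = mean(g["b"]) - grand_b  # group mean minus grand mean, outcome b
    cp_m += n_g * diff_a * diff_b

print(cp_m)  # 18.0
```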
Residual cross-product (CPR)
How the relationship between the two outcome variables is influenced by individual differences/ unmeasured variables.
$$\mathrm{CP_R} = \sum_{i=1}^{n} \left(x_{i(a)} - \bar{x}_{\text{Group}(a)}\right)\left(x_{i(b)} - \bar{x}_{\text{Group}(b)}\right)$$
The group means are used rather than the grand mean.
To calculate each of the difference scores, take each score and subtract from it the mean of the group to which it belongs.
The residual cross-product can also be calculated as:
$$\mathrm{CP_R} = \mathrm{CP_T} - \mathrm{CP_M}$$
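Both routes to the residual cross-product can be checked against each other. The sketch below uses invented data (three groups of three participants) and a hypothetical helper `cross_product`. It computes the residual cross-product directly from the group means and then compares it with the total minus the model cross-product.

```python
# Hypothetical data: three groups of three participants, two outcomes (a and b).
groups = [
    {"a": [1, 2, 3], "b": [2, 4, 3]},
    {"a": [4, 5, 6], "b": [7, 6, 8]},
    {"a": [7, 8, 9], "b": [4, 6, 5]},
]

def mean(values):
    return sum(values) / len(values)

def cross_product(dev_x, dev_y):
    """Sum of the products of paired deviations."""
    return sum(dx * dy for dx, dy in zip(dev_x, dev_y))

all_a = [x for g in groups for x in g["a"]]
all_b = [x for g in groups for x in g["b"]]
grand_a, grand_b = mean(all_a), mean(all_b)

# Residual: each score minus the mean of its own group.
res_a = [x - mean(g["a"]) for g in groups for x in g["a"]]
res_b = [x - mean(g["b"]) for g in groups for x in g["b"]]
cp_r = cross_product(res_a, res_b)

# Total: each score minus the grand mean.
cp_t = cross_product([x - grand_a for x in all_a], [x - grand_b for x in all_b])

# Model: group mean minus grand mean, counted once per score in the group.
mod_a = [mean(g["a"]) - grand_a for g in groups for _ in g["a"]]
mod_b = [mean(g["b"]) - grand_b for g in groups for _ in g["b"]]
cp_m = cross_product(mod_a, mod_b)

print(cp_r, cp_t - cp_m)  # 3.0 3.0
```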
Each of the cross-products tells us something about the relationship between the two outcome variables.
The total SSCP matrix (T)
SSCP matrices are square.
The total SSCP matrix, T, contains the total sum of squares for each outcome variable and the total cross-product between the outcome variables.
The residual SSCP matrix (E)
The residual (or error) sum of squares and cross-products matrix, E, contains the residual sum of squares of each outcome variable and the residual cross-product between the two outcome variables.
The residual SSCP represents both the unsystematic variation that exists for each outcome variable and the co-dependence between the outcome variables that is due to unmeasured factors.
The model SSCP matrix (H)
The model (or hypothesis) sum of squares and cross-product matrix, H, contains the model sum of squares for each outcome variable and the model cross-product between the two outcome variables.
The model SSCP represents both the systematic variation that exists for each outcome variable and the co-dependence between the outcome variables that is due to the model.
Matrices are additive: you can add (or subtract) two matrices of the same dimensions by adding (or subtracting) corresponding components, so T = H + E.
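To make the three SSCP matrices concrete, the sketch below builds them for an invented data set with two outcomes and three groups, and then checks that adding H and E component by component reproduces T. The function names `sscp` and `add` are hypothetical helpers for this example.

```python
# Hypothetical data: three groups of three participants, two outcomes (a and b).
groups = [
    {"a": [1, 2, 3], "b": [2, 4, 3]},
    {"a": [4, 5, 6], "b": [7, 6, 8]},
    {"a": [7, 8, 9], "b": [4, 6, 5]},
]

def mean(values):
    return sum(values) / len(values)

def sscp(dev_a, dev_b):
    """2 x 2 SSCP matrix: sums of squares on the diagonal, cross-product off it."""
    ss_a = sum(d * d for d in dev_a)
    ss_b = sum(d * d for d in dev_b)
    cp = sum(da * db for da, db in zip(dev_a, dev_b))
    return [[ss_a, cp], [cp, ss_b]]

def add(m1, m2):
    """Add two matrices of equal dimensions component by component."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

all_a = [x for g in groups for x in g["a"]]
all_b = [x for g in groups for x in g["b"]]
grand_a, grand_b = mean(all_a), mean(all_b)

# Total: score minus grand mean.
T = sscp([x - grand_a for x in all_a], [x - grand_b for x in all_b])
# Model: group mean minus grand mean, once per score in the group.
H = sscp([mean(g["a"]) - grand_a for g in groups for _ in g["a"]],
         [mean(g["b"]) - grand_b for g in groups for _ in g["b"]])
# Residual: score minus its own group mean.
E = sscp([x - mean(g["a"]) for g in groups for x in g["a"]],
         [x - mean(g["b"]) for g in groups for x in g["b"]])

print(T)               # [[60.0, 21.0], [21.0, 30.0]]
print(H)               # [[54.0, 18.0], [18.0, 24.0]]
print(E)               # [[6.0, 3.0], [3.0, 6.0]]
print(add(H, E) == T)  # True
```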
$HE^{-1}$: an analogue of F
The univariate F is the ratio of systematic variance to unsystematic variance.
The conceptual equivalent would therefore be to divide the matrix H by the matrix E.
Matrices cannot be divided directly; the matrix equivalent of division is to multiply by what’s known as the inverse of a matrix.
So an analogue of F would be to multiply H by the inverse of E (denoted $E^{-1}$).
The resulting matrix is called $HE^{-1}$ and is a multivariate analogue of the univariate F.
$HE^{-1}$ represents the ratio of systematic variance in the model to unsystematic variance in the model.
Unlike the univariate F, which is a single value, this matrix contains several values.
$HE^{-1}$ will always contain $p^2$ values, where p is the number of outcome variables.
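For two outcome variables the multiplication by the inverse of E can be written out by hand. The sketch below is limited to the 2 x 2 case and uses made-up values for H and E. The helpers `inverse_2x2` and `matmul` are hypothetical names, and a numerical library would normally be used for larger matrices.

```python
# Hypothetical 2 x 2 SSCP matrices for two outcome variables.
H = [[54.0, 18.0], [18.0, 24.0]]  # model (hypothesis) SSCP
E = [[6.0, 3.0], [3.0, 6.0]]      # residual (error) SSCP

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the determinant formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(m1, m2):
    """Matrix product: rows of m1 times columns of m2."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*m2)]
            for row in m1]

HE_inv = matmul(H, inverse_2x2(E))
for row in HE_inv:
    print([round(v, 3) for v in row])
# [10.0, -2.0]
# [1.333, 3.333]
```

With p = 2 outcomes the result has 2 x 2 = 4 values, which is why a further step is needed before significance can be assessed.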
Discriminant function variables
The problem of having several values with which to assess statistical significance can be simplified by converting the outcome variables into underlying dimensions or factors.
It is possible to calculate underlying linear dimensions of the outcome variables known as variates (or components). In this context, we want to use these linear variates to predict to which group a person belongs.
Because we are using them to discriminate groups of people/cases, these variates are called discriminant functions or discriminant function variates.
To discover these discriminant functions we use a mathematical procedure of maximization, such that the first discriminant function (V1) is the linear combination of outcome variables that maximizes the differences between groups.
It follows from this that the ratio of systematic to unsystematic variance (SSM/SSR) will be maximized for this first variate, but subsequent variates will have smaller values of this ratio.
We obtain the maximum possible value of the F-statistic when we look at the first discriminant function.
This variate can be described in terms of a linear model equation because it is a linear combination of the outcome variables:
$$y_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$
$$V_{1i} = b_0 + b_1\,\text{Outcome 1}_i + b_2\,\text{Outcome 2}_i$$
The b-values in the equation are weights that tell us something about the contribution of each outcome variable to the variate in question.
The values of b for the discriminant functions are obtained from the eigenvectors of $HE^{-1}$.
We can also ignore $b_0$ because it serves only to locate the variate in geometric space, which isn’t necessary when we’re using the variate to discriminate groups.
In a situation where there are only two outcome variables and two groups to predict, there will be only one variate.
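The maximization idea can be illustrated numerically. The sketch below is not the book's procedure: it assumes made-up 2 x 2 matrices H and E and uses the standard condition that the best weights w satisfy (H - λE)w = 0, where λ is the largest root of det(H - λE) = 0. The names `quad_form` and `variance_ratio` are hypothetical helpers.

```python
import math

# Hypothetical 2 x 2 SSCP matrices for two outcome variables.
H = [[54.0, 18.0], [18.0, 24.0]]  # model SSCP
E = [[6.0, 3.0], [3.0, 6.0]]      # residual SSCP

def quad_form(w, m):
    """w' m w for a 2-element weight vector w and a 2 x 2 matrix m."""
    return sum(w[i] * m[i][j] * w[j] for i in range(2) for j in range(2))

def variance_ratio(w):
    """SSM/SSR of the variate V = w[0]*Outcome1 + w[1]*Outcome2."""
    return quad_form(w, H) / quad_form(w, E)

# det(H - lambda * E) = 0 is a quadratic in lambda for 2 x 2 matrices.
qa = E[0][0] * E[1][1] - E[0][1] ** 2
qb = -(H[0][0] * E[1][1] + H[1][1] * E[0][0] - 2 * H[0][1] * E[0][1])
qc = H[0][0] * H[1][1] - H[0][1] ** 2
largest = (-qb + math.sqrt(qb ** 2 - 4 * qa * qc)) / (2 * qa)

# Weights that solve the first row of (H - largest * E) w = 0.
w1 = [H[0][1] - largest * E[0][1], -(H[0][0] - largest * E[0][0])]

print(round(largest, 3), round(variance_ratio(w1), 3))  # 9.573 9.573
print(variance_ratio([1, 0]), variance_ratio([0, 1]))   # 9.0 4.0
```

Using either outcome on its own (weights `[1, 0]` or `[0, 1]`) gives a smaller ratio than the weighted combination `w1`, which is the sense in which the first variate maximizes the differences between groups.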
Eigenvectors are the vectors associated with a given matrix that are unchanged by transformation of that matrix to a diagonal matrix.
In a diagonal matrix the off-diagonal elements are zero, and by changing $HE^{-1}$ into a diagonal matrix we eliminate all of the off-diagonal elements, leaving the eigenvalues on the diagonal.
Therefore, by calculating the eigenvectors and eigenvalues, we still end up with values that represent the ratio of systematic to unsystematic variance (because they are unchanged by the transformation) but there are fewer of them.
The variates extracted are independent dimensions constructed from a linear combination of the outcome variables that were measured.
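The reduction from several matrix values to a few eigenvalues can be sketched for the two-outcome case. The matrix below is a made-up example of $HE^{-1}$, and `eigenvalues_2x2` is a hypothetical helper that solves the characteristic equation directly. It assumes the roots are real.

```python
import math

# Hypothetical HE^-1 matrix for two outcome variables (p = 2, so p^2 = 4 entries).
HE_inv = [[10.0, -2.0], [4.0 / 3.0, 10.0 / 3.0]]

def eigenvalues_2x2(m):
    """Roots of lambda^2 - trace*lambda + det = 0, largest first (real roots assumed)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(trace ** 2 - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]

eigenvalues = eigenvalues_2x2(HE_inv)
print([round(v, 3) for v in eigenvalues])  # [9.573, 3.761]
```

Four matrix entries are reduced to two eigenvalues, one per variate, and their sum equals the sum of the diagonal of the original matrix.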
The eigenvalues are conceptually equivalent to the univariate F-statistic and so the final step is to assess how large these values are compared to what we would expect if there were no effect in the population.
There are four common ways to do this:
The Pillai-Bartlett trace (V)
$$V = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i}$$
λ represents the eigenvalues for each of the discriminant variates.
s represents the number of variates.
Pillai’s trace is the sum of the proportion of explained variance on the discriminant functions.
It is similar to the ratio SSM/SST, which is known as $R^2$.
Hotelling’s $T^2$
The Hotelling-Lawley trace (or Hotelling’s $T^2$) is the sum of the eigenvalues for each variate.
$$T = \sum_{i=1}^{s} \lambda_i$$
This test statistic is the sum of SSM/SSR for each of the variates and so it compares directly to the univariate F-statistic.
Wilks’s lambda (Λ)
Wilks’s lambda is the product of the unexplained variance on each of the variates.
$$\Lambda = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i}$$
The symbol $\prod$ means multiply: it works like the summation symbol, except that the terms are multiplied rather than added.
Large eigenvalues (which in themselves represent large experimental effects) lead to small values of Wilks’s lambda.
Roy’s largest root
The eigenvalue (or root) of the first variate.
$$\Theta = \lambda_{\text{largest}}$$
Roy’s largest root represents the ratio of explained variance to unexplained variance (SSM/SSR) for the first discriminant function.
It represents the maximum possible between-group difference given the data collected.
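All four statistics are simple functions of the eigenvalues, as the following sketch shows. The eigenvalues are invented round numbers chosen so that the arithmetic is easy to follow, and the sketch stops at the statistics themselves (it does not convert them to approximate F-values or p-values).

```python
import math

# Hypothetical eigenvalues of HE^-1, one per discriminant variate.
eigenvalues = [3.0, 1.0]

# Pillai-Bartlett trace: proportion of explained variance, summed over variates.
pillai = sum(lam / (1 + lam) for lam in eigenvalues)
# Hotelling-Lawley trace: SSM/SSR, summed over variates.
hotelling = sum(eigenvalues)
# Wilks's lambda: proportion of unexplained variance, multiplied over variates.
wilks = math.prod(1 / (1 + lam) for lam in eigenvalues)
# Roy's largest root: the eigenvalue of the first variate.
roy = max(eigenvalues)

print(pillai, hotelling, wilks, roy)  # 1.25 4.0 0.125 3.0
```

Larger eigenvalues push Pillai's trace, the Hotelling-Lawley trace and Roy's root up but push Wilks's lambda down, so a small lambda indicates a large effect.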
Assumptions
MANOVA has similar assumptions to all the models in this book but extended to the multivariate case, including multivariate normality and equality (homogeneity) of covariance matrices.
The effect of violating the assumption of equality of covariance matrices is unclear, except that Hotelling’s $T^2$ is robust in the two-group situation when sample sizes are equal.
The assumption can be tested using Box’s test, which should be non-significant if the matrices are similar.
The variance-covariance matrices for samples should be inspected to assess whether the printed probabilities for the multivariate test statistics are likely to be conservative or liberal.
In the event that you cannot trust the printed probabilities, there is little you can do except equalize the samples by randomly deleting cases in the larger groups.
Choosing a test statistic
All four test statistics are relatively robust to violations of multivariate normality.
Follow-up analysis
The traditional approach is to follow a significant MANOVA with separate univariate models (ANOVA) on each of the outcome variables.
If you do use univariate Fs then you ought to apply a Bonferroni correction.
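A Bonferroni correction divides the overall significance level by the number of tests. A minimal sketch follows, assuming two outcome variables and invented p-values for their follow-up univariate tests.

```python
# Hypothetical p-values from the univariate follow-up tests, one per outcome.
p_values = {"outcome_1": 0.012, "outcome_2": 0.040}

alpha = 0.05
# Bonferroni: divide the overall alpha by the number of tests carried out.
corrected_alpha = alpha / len(p_values)

# A test counts as significant only if its p-value is below the corrected alpha.
significant = {name: p < corrected_alpha for name, p in p_values.items()}

print(corrected_alpha)  # 0.025
print(significant)      # {'outcome_1': True, 'outcome_2': False}
```

In this made-up case the second outcome would have passed an uncorrected .05 threshold but does not pass the corrected one.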
The multivariate tests are converted into approximate Fs, and people just report these Fs in the usual way.
The multivariate test statistic should be quoted as well.
This is a summary of the book "Discovering statistics using IBM SPSS statistics" by A. Field. In this summary, everything students in the second year of psychology at the UvA will need is present. The content needed in the first three blocks is already online, and the rest ...