Summary of Discovering statistics using IBM SPSS statistics by Field - 5th edition
Statistics
Chapter 18
Exploratory factor analysis
In factor analysis, we take a lot of information (variables) and a computer effortlessly reduces this into a simple message (fewer variables).
Latent variable: something that cannot be accessed directly.
Factor analysis identifies which observable measures are driven by the same underlying variable.
Factor analysis and principal component analysis (PCA) are techniques for identifying clusters of variables.
Three main uses:
If we measure several variables, or ask someone several questions about themselves, the correlation between each pair of variables can be arranged in a table.
Factor analysis attempts to achieve parsimony by explaining the maximum amount of common variance in a correlation matrix using the smallest number of explanatory constructs.
Explanatory constructs are known as latent variables (or factors) and they represent clusters of variables that correlate highly with each other.
PCA differs in that it tries to explain the maximum amount of total variance in a correlation matrix by transforming the original variables into linear components.
Factor analysis and PCA both aim to reduce the R matrix into a smaller set of dimensions.
Graphical representation
Factors and components can be visualized as the axes of a graph along which we plot variables.
The coordinates of variables along each axis represent the strength of relationship between that variable and each factor.
In an ideal world a variable will have a large coordinate for one of the axes and small coordinates for any others.
Factor loading: the coordinate of a variable along a classification axis.
If we square the factor loading for a variable we get a measure of its substantive importance to a factor.
Mathematical representation
A component in PCA can be described as:
$\text{Component}_i = b_1\,\text{Variable}_{1i} + b_2\,\text{Variable}_{2i} + \dots + b_n\,\text{Variable}_{ni}$
There is no intercept in the equation because the lines intersect at zero.
There is no error because we are simply transforming the variables.
The bs in this equation represent the loadings.
Ideally, variables would have very high b-values for one component and very low b-values for all other components.
The factors in factor analysis are not represented the same way as components.
A factor is defined as:
Variables = VariableMean + (Loadings × Common Factor) + Unique Factor
Common factors: factors that explain the correlations between variables
Unique factors: factors that cannot explain the correlations between variables
In PCA we predict components from the measured variables
In factor analysis we predict the measured variables from the underlying factors.
Both factor analysis and PCA are linear models in which loadings are used as weights.
In both cases, these loadings can be expressed as a matrix in which the columns represent each factor and the rows represent the loadings of each variable on each factor.
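The two directions of prediction can be sketched as follows. This is a minimal illustration, not the book's procedure: the loading matrix, means and noise level are hypothetical values, and the same matrix is reused as PCA weights purely to show the layout (rows are variables, columns are factors).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loading matrix: 4 variables (rows) x 2 factors (columns).
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])

n_people = 5
variable_means = np.array([10.0, 12.0, 8.0, 9.0])  # hypothetical means

# Factor analysis direction: measured variables are predicted from factors.
# variable = mean + (loadings x common factors) + unique factor
common_factors = rng.standard_normal((n_people, 2))
unique_factors = 0.3 * rng.standard_normal((n_people, 4))
variables = variable_means + common_factors @ loadings.T + unique_factors

# PCA direction: components are predicted from the (centred) variables,
# with no error term. The weights are reused here for illustration only.
weights = loadings
components = (variables - variable_means) @ weights

print(variables.shape)   # (5, 4): people x variables
print(components.shape)  # (5, 2): people x components
```

In both cases the loadings act as weights in a linear model; only the direction of prediction differs.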
Factor scores
Having discovered which factor exists, and estimated the equation that describes them, it should be possible to estimate a person’s score on a factor, based on their scores for the constituent variables.
These are known as factor scores (or component scores in PCA).
The scales of measurement will influence the resulting scores, and if different variables use different measurement scales, then factor scores for different factors cannot be compared.
There are several techniques for calculating factor scores that use factor score coefficients as weights rather than the factor loadings.
Factor score coefficients can be calculated in several ways
To obtain the matrix of factor score coefficients (B), we multiply the matrix of factor loadings by the inverse ($R^{-1}$) of the original correlation matrix (the R-matrix).
Using the regression technique, the resulting factor scores have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values.
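The regression technique can be sketched as below. The correlation matrix, loadings and z-scores are hypothetical numbers chosen for illustration, and the names `R`, `A`, `B` and `Z` are assumptions of this sketch (`A` stands for the loading matrix).

```python
import numpy as np

# Hypothetical correlation matrix (R) for three variables.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# Hypothetical loadings (A) of the three variables on one factor.
A = np.array([[0.8], [0.7], [0.6]])

# Factor score coefficients: B = R^{-1} A.
# Solving the linear system gives the same result as multiplying by the
# inverse, without forming the inverse explicitly.
B = np.linalg.solve(R, A)

# Standardized scores (z-scores) for two hypothetical people.
Z = np.array([
    [1.0, 0.5, -0.2],
    [-0.3, 0.1, 0.8],
])

# Each person's factor score is a weighted sum of their z-scores.
factor_scores = Z @ B

print(B.ravel())
print(factor_scores.ravel())
```

Multiplying by the inverse of R adjusts the loadings for the correlations between the variables, which is why the coefficients differ from the raw loadings.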
Factor scores: a composite score for each individual on a particular factor.
There are several uses of factor scores
Choosing a method
There are two things to consider
Assuming we want to explore, we need to consider whether we want to apply our findings to the sample collected (descriptive method) or to generalize our findings to a population (inferential method).
Certain techniques assume that the sample used is the population and results cannot be extrapolated beyond that sample.
A different approach assumes that participants are randomly selected but that the variables measured constitute the population of variables in which they are interested.
By assuming this, it is possible to generalize from the sample to a larger population, but with the caveat that any findings hold true only for the set of variables measured.
Communality
The total variance of a variable in the R-matrix will have two components:
Thus, a variable that has no unique variance would have a communality of 1.
A variable that shares none of its variance with any other variable would have a communality of 0.
Factor analysis tries to find common underlying dimensions within the data and so is primarily concerned with the common variance.
We want to find out how much of the variance in our data is common.
Two solutions
Factor analysis or PCA?
Factor analysis derives a mathematical model from which factors are estimated.
Principal component analysis decomposes the original data into a set of linear variates.
Only factor analysis can estimate the underlying factors and it relies on various assumptions for these estimates to be accurate.
PCA is concerned only with establishing which linear components exist within the data and how a particular variable might contribute to a given component.
Theory behind PCA
Principal component analysis works in a very similar way to MANOVA and discriminant function analysis.
We take the correlation matrix and calculate the variates.
There are no groups of observations, so the number of variates calculated will always equal the number of variables measured (p).
The variates are described by the eigenvectors associated with the correlation matrix.
The elements of the eigenvectors are the weights of each variable on the variate. These values are the loadings.
The eigenvalue associated with each eigenvector provides a single indicator of the substantive importance of each component. The basic idea is that we retain components with relatively large eigenvalues and ignore those with relatively small eigenvalues.
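A minimal sketch of this decomposition, assuming a small hypothetical correlation matrix. It uses NumPy's `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted here.

```python
import numpy as np

# Hypothetical correlation matrix for p = 3 variables.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# eigh returns eigenvalues in ascending order; sort largest first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # column k belongs to eigenvalue k

# There are as many variates as variables (p = 3).
print(eigenvalues)

# The eigenvalues of a correlation matrix sum to the number of variables.
print(eigenvalues.sum())

# Weights of each variable on the first (most important) variate.
print(eigenvectors[:, 0])
```

Note that some software reports loadings as eigenvector elements rescaled by the square root of the eigenvalue; the sketch shows the unscaled weights only.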
Factor analysis works differently, but there are similarities.
Factor extraction: eigenvalues and the scree plot
In both PCA and factor analysis, not all factors are retained.
Extraction: the process of deciding how many factors to keep.
Eigenvalues associated with a variate indicate the substantive importance of that factor.
Retain only factors with large eigenvalues.
Scree plot: plotting each eigenvalue (Y-axis) against the factor with which it is associated (X-axis).
It is possible to obtain as many factors as there are variables and each has an associated eigenvalue.
By graphing the eigenvalues, the relative importance of each factor becomes apparent.
Typically there will be a few factors with quite high eigenvalues, and many factors with relatively low eigenvalues.
This graph has a very characteristic shape: there is a sharp descent in the curve followed by a tailing off.
An alternative to the scree plot is to use the eigenvalues, because these represent the amount of variation explained by a factor.
You set a criterion value that represents a substantial amount of variation and retain factors with eigenvalues above this criterion.
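Retaining factors above a criterion can be sketched as follows. The eigenvalues and the cut-off of 1.0 are illustrative assumptions, and `retain_by_eigenvalue` is a hypothetical helper name, not a function from SPSS or any library.

```python
import numpy as np

def retain_by_eigenvalue(eigenvalues, criterion):
    """Return the indices of factors whose eigenvalue exceeds the criterion."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return np.flatnonzero(eigenvalues > criterion)

# Hypothetical eigenvalues, ordered as they would appear on a scree plot.
eigenvalues = np.array([3.2, 1.4, 0.9, 0.3, 0.2])
criterion = 1.0  # illustrative cut-off chosen for this sketch

kept = retain_by_eigenvalue(eigenvalues, criterion)
print(kept)  # [0 1]
```

Changing the criterion changes how many factors survive, which is why different criteria can give different answers.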
Two common criteria
The three criteria (the scree plot and the two eigenvalue criteria) often provide different answers.
In these situations consider the communalities of the factors.
In both PCA and factor analysis we determine how many factors/components to extract and then re-estimate the communalities. The factors we retain will not explain all the variance in the data and so the communalities after extraction will always be less than 1.
The factors retained do not map perfectly onto the original variables, they merely reflect the common variance in the data.
The closer the communalities are to 1, the better our factors are at explaining the original data.
The communalities are good indices of whether too few factors have been retained.
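One common way to obtain communalities after extraction, assuming uncorrelated (orthogonal) factors, is to sum each variable's squared loadings across the retained factors. The loadings below are hypothetical.

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) x 2 retained factors (columns).
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.3],
])

# Communality of each variable = sum of its squared loadings.
communalities = (loadings ** 2).sum(axis=1)

print(communalities.round(2))  # [0.65 0.53 0.82 0.13]
```

In this sketch the last variable has a communality far from 1, the kind of value that might suggest too few factors have been retained.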
Factor rotation
Once factors have been extracted, it is possible to calculate the degree to which variables load onto these factors.
Factor rotation: a technique used to discriminate factors.
If we visualize our factors as an axis along which variables can be plotted, then factor rotation effectively rotates these axes such that variables are loaded maximally to only one factor.
Factor rotation amounts to rotating the axes to try to ensure that both clusters of variables are intersected by the factor to which they relate most.
After rotation, the loadings of the variables are maximized on one factor and minimized on the remaining factor(s).
If an axis passes through a cluster of variables, then these variables will have a loading close to zero on the opposite axis.
There are two flavours of rotation
SPSS implements three methods of orthogonal rotation
SPSS has two methods of oblique rotation
The choice of orthogonal or oblique rotation depends on:
Factor transformation matrix: used to convert the unrotated factor loadings into the rotated ones. Values in this matrix represent the angle through which the axes have been rotated, or the degree to which factors have been rotated.
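For two factors, an orthogonal rotation can be sketched with a plain 2 × 2 rotation matrix. The loadings and the 40-degree angle are hypothetical, and the sign convention for the angle is an assumption of this sketch; real rotation methods choose the angle by optimizing a criterion rather than taking it as given.

```python
import numpy as np

def rotation_matrix(theta_degrees):
    """Build a 2 x 2 orthogonal transformation matrix for a given angle."""
    theta = np.deg2rad(theta_degrees)
    return np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta),  np.cos(theta)],
    ])

# Hypothetical unrotated loadings: 4 variables x 2 factors.
unrotated = np.array([
    [0.7,  0.5],
    [0.6,  0.5],
    [0.6, -0.5],
    [0.7, -0.4],
])

T = rotation_matrix(40.0)   # hypothetical rotation angle
rotated = unrotated @ T     # rotated loadings = unrotated loadings x T

print(rotated.round(2))

# Orthogonal rotation redistributes loadings across factors but leaves
# each variable's communality (row sum of squared loadings) unchanged.
print((unrotated ** 2).sum(axis=1).round(3))
print((rotated ** 2).sum(axis=1).round(3))
```

The entries of `T` are the cosines and sines of the rotation angle, which is the sense in which the values in the matrix represent the angle of rotation.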
Interpreting the factor structure
Once a factor structure has been found, it needs to be interpreted.
Loadings are a gauge of the substance of a given variable to a given factor.
We use these values to place variables with factors. Every variable will have a loading on every factor, so we’re looking for variables that load highly on a given factor. Once we’ve identified these variables, we look for a theme within them.
It is possible to assess the statistical significance of a loading, but the p-value depends on sample size.
So, instead we can gauge importance by squaring the loadings to give an estimate of the amount of variance in a factor accounted for by a variable.
When reporting factor analysis, provide readers with enough information to make an informed opinion about what you’ve done.
Be clear about your criteria for extracting factors and the method of rotation used.
Provide a table of the rotated factor loadings of all items and flag values above a criterion level.
Report the percentage of variance that each factor explains and possibly the eigenvalue too.
Measures of reliability
If you’re using factor analysis to validate a questionnaire, it is useful to check the reliability of your scale.
Reliability: that a measure should consistently reflect the construct that it is measuring.
Cronbach’s alpha, α: a measure that is loosely equivalent to creating two sets of items in every way possible and computing the correlation coefficient for each split. Alpha is the average of these values.
$\alpha = \dfrac{N^2\,\overline{\text{cov}}}{\sum s^2_{\text{item}} + \sum \text{cov}_{\text{item}}}$
For each item in our scale we can calculate two things.
We can construct a variance-covariance matrix of all items.
The top half of the equation is the number of items (N) squared multiplied by the average covariance between items.
The bottom half is the sum of all the item variances and item covariances.
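The formula can be sketched directly from the variance-covariance matrix. The response data are hypothetical, and `cronbach_alpha` is an illustrative helper written for this sketch rather than a library function.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a people x items array of scores."""
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]

    # Variance-covariance matrix of all items (columns are items).
    cov = np.cov(items, rowvar=False)

    item_variances = np.diag(cov)                     # diagonal: variances
    off_diagonal = cov[~np.eye(n_items, dtype=bool)]  # all item covariances

    # Top half: N squared times the average covariance between items.
    top = n_items ** 2 * off_diagonal.mean()
    # Bottom half: sum of all item variances and item covariances.
    bottom = item_variances.sum() + off_diagonal.sum()
    return top / bottom

# Hypothetical responses: 5 people (rows) x 3 items (columns).
responses = np.array([
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
])

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The bottom half equals the sum of every element of the variance-covariance matrix, which is also the variance of the total scale score.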
Interpreting Cronbach’s α: some cautionary tales
The value of α depends on the number of items on the scale.
Alpha should not be used as a measure for unidimensionality (the extent to which the scale measures one underlying factor or construct).
Reverse items will affect α.
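A small sketch of why reverse-phrased items matter: left as they are, they covary negatively with the other items, which lowers the average covariance in the top half of the α equation. The 1 to 5 scale, the scores and the helper `reverse_score` are assumptions made for illustration.

```python
import numpy as np

def reverse_score(scores, scale_min, scale_max):
    """Flip scores on a bounded scale, e.g. 1 <-> 5 and 2 <-> 4 on a 1-5 scale."""
    return (scale_min + scale_max) - np.asarray(scores)

# Hypothetical scores on a 1-5 scale for five people.
item_a = np.array([5, 4, 2, 1, 4])
item_b = np.array([1, 2, 4, 5, 2])  # reverse-phrased item

# Covariance before and after reverse scoring the second item.
before = np.cov(item_a, item_b)[0, 1]
after = np.cov(item_a, reverse_score(item_b, 1, 5))[0, 1]

print(before)  # negative: the item runs in the opposite direction
print(after)   # positive: same size, opposite sign
```

Reverse scoring only flips the sign of the item's covariances with the other items; its variance is unchanged.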
How to report reliability analysis
Report the reliabilities in the text using the symbol α.
This is a summary of the book "Discovering statistics using IBM SPSS statistics" by A. Field. This summary contains everything second-year psychology students at the UvA will need. The content needed for the first three blocks is already online, and the rest