Summary of Discovering statistics using IBM SPSS statistics by Field - 5th edition
Chapter 9
The linear model (regression)
The linear model with one predictor
outcome_i = (b0 + b1X_i) + error_i
This model uses an unstandardized measure of the relationship (b1) and consequently we include a parameter (b0) that tells us the value of the outcome when the predictor is zero.
Any straight line can be defined by two things: the slope of the line (b1) and the point at which the line crosses the vertical axis of the graph, the intercept (b0).
These parameters are regression coefficients.
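As a concrete illustration, here is a minimal sketch in Python with numpy (not the book's SPSS), using made-up data; the names x and y are placeholders:

```python
import numpy as np

# Hypothetical data: one predictor (x) and one outcome (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates of the regression coefficients:
# b1 (the slope) and b0 (the intercept: the outcome when the predictor is 0).
b1, b0 = np.polyfit(x, y, deg=1)

predicted = b0 + b1 * x     # the model: outcome_i = b0 + b1*X_i
residuals = y - predicted   # error_i for each case
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```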
The linear model with several predictors
The linear model expands to include as many predictor variables as you like.
An additional predictor can be placed in the model given a b to estimate its relationship to the outcome:
Y_i = (b0 + b1X_1i + b2X_2i + … + bnX_ni) + ε_i
bn is the coefficient of the nth predictor (X_ni).
Regression analysis is a term for fitting a linear model to data and using it to predict values of an outcome variable from one or more predictor variables.
Simple regression: with one predictor variable
Multiple regression: with several predictors
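The same idea in code, a sketch assuming the statsmodels library and simulated data (all variable names are placeholders):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: two predictors and an outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=50)

# Add a column of 1s so the model estimates the constant b0.
X_design = sm.add_constant(X)

# Multiple regression: Y_i = b0 + b1*X_1i + b2*X_2i + error_i
model = sm.OLS(y, X_design).fit()
print(model.params)   # estimates of b0, b1, b2
```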
Estimating the model
No matter how many predictors there are, the model can be described entirely by a constant (b0) and by parameters associated with each predictor (bs).
To estimate these parameters we use the method of least squares.
We could assess the fit of a model by looking at the deviations between the model and the data collected.
Residuals: the differences between what the model predicts and the observed values.
To calculate the total error in a model we square the differences between the observed values of the outcome and the predicted values that come from the model:
total error = Σ(observed_i − model_i)², summing over all n cases (i = 1 to n)
Because we call these errors residuals, this is called the residual sum of squares (SSR).
It is a gauge of how well a linear model fits the data.
The model with the smallest SSR fits the data best.
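Continuing the earlier sketch (same made-up data), the residual sum of squares can be computed directly:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, deg=1)

# Residual sum of squares: squared differences between observed
# outcomes and the values the model predicts.
ss_r = np.sum((y - (b0 + b1 * x)) ** 2)
print(f"SSR = {ss_r:.3f}")
```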
Assessing the goodness of fit: sums of squares, R and R²
Goodness of fit: how well the model fits the observed data
Total sum of squares (SST): how good the mean is as a model of the observed outcome scores.
We can use the values of SST and SSR to calculate how much better the linear model is than the baseline model of ‘no relationship’.
The improvement in prediction resulting from using the linear model rather than the mean is calculated as the difference between SST and SSR.
This improvement is the model sum of squares (SSM):
R² = SSM / SST
R² is the proportion of variance in the outcome accounted for by the model.
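A sketch of these quantities for the running example (same hypothetical data as above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, deg=1)

ss_r = np.sum((y - (b0 + b1 * x)) ** 2)  # residual sum of squares
ss_t = np.sum((y - y.mean()) ** 2)       # total SS: the mean as baseline model
ss_m = ss_t - ss_r                       # improvement due to the linear model
print(f"R² = {ss_m / ss_t:.3f}")
```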
Another use of the sums of squares is the F-test, which is based on the ratio of the improvement due to the model (SSM) to the error remaining in the model (SSR).
Mean squares (MS): the sum of squares divided by the associated degrees of freedom.
MSM = SSM / k
MSR = SSR / (N − k − 1)
F = MSM / MSR
(k is the number of predictors and N is the number of cases.)
F has an associated probability distribution from which a p-value can be derived, telling us the probability of getting an F at least as big as the one we have if the null hypothesis were true.
The F-statistic can also be used to test the significance of R²:
F = ((N − k − 1)R²) / (k(1 − R²))
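Both routes to F give the same value, as this sketch shows (same toy data; scipy is assumed for the p-value):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1, b0 = np.polyfit(x, y, deg=1)

n, k = len(y), 1                        # n cases, k predictors
ss_r = np.sum((y - (b0 + b1 * x)) ** 2)
ss_t = np.sum((y - y.mean()) ** 2)
ss_m = ss_t - ss_r
r2 = ss_m / ss_t

f_from_ms = (ss_m / k) / (ss_r / (n - k - 1))    # F = MSM / MSR
f_from_r2 = ((n - k - 1) * r2) / (k * (1 - r2))  # F from R²: identical value
p = stats.f.sf(f_from_ms, k, n - k - 1)          # p-value from F(k, N−k−1)
print(f_from_ms, f_from_r2, p)
```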
Assessing individual predictors
Any predictor in a linear model has a coefficient (bi). The value of b represents the change in the outcome resulting from a unit change in a predictor.
The t-statistic is based on the ratio of explained variance to unexplained variance (error):
t = (b_observed − b_expected) / SE_b
Because the null hypothesis is that b is zero, b_expected is 0 and t reduces to the observed b divided by its standard error.
The statistic t has a probability distribution that differs according to the degrees of freedom for the test.
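A sketch with statsmodels on simulated data (names and values are placeholders):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(30, 1)))  # constant plus one predictor
y = X @ np.array([1.0, 0.8]) + rng.normal(size=30)

fit = sm.OLS(y, X).fit()
# t = b / SE_b for each coefficient (b_expected = 0 under the null).
print(fit.params / fit.bse)  # same as fit.tvalues
print(fit.pvalues)           # two-tailed p-values from t(N − k − 1)
```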
Outliers
An outlier: a case that differs substantially from the main trend in the data.
Outliers can affect the estimates of the regression coefficients.
Standardized residuals: the residuals converted to z-scores, so they are expressed in standard deviation units.
Regardless of the variables of the model, standardized residuals are distributed around a mean of 0 with a standard deviation of 1.
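A sketch of obtaining them with statsmodels (which labels them internally studentized residuals), again on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(40, 1)))
y = X @ np.array([0.5, 1.5]) + rng.normal(size=40)

influence = sm.OLS(y, X).fit().get_influence()
z_resid = influence.resid_studentized_internal  # standardized residuals

# Rule of thumb: cases with |z| > 3 deserve a closer look as outliers.
print(np.where(np.abs(z_resid) > 3)[0])
```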
Influential cases
There are several statistics used to assess the influence of a case.
DFBeta: the difference between a parameter estimated using all cases and the same parameter estimated when one case is excluded.
DFFit: the difference between the predicted values for a case when the model is estimated including or excluding that case.
Covariance ratio (CVR): quantifies the degree to which a case influences the variance of the regression parameters.
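These case-wise influence statistics are available through statsmodels (a sketch on simulated data; note that statsmodels reports a standardized DFFit, usually called DFFITS):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(40, 1)))
y = X @ np.array([0.5, 1.5]) + rng.normal(size=40)

influence = sm.OLS(y, X).fit().get_influence()
dfbeta = influence.dfbeta     # change in each parameter when a case is dropped
dffits = influence.dffits[0]  # standardized change in a case's predicted value
cvr = influence.cov_ratio     # covariance ratio (CVR)
print(dfbeta.shape, dffits.shape, cvr.shape)
```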
Assumptions of the linear model
Cross-validation of the model
Even if we can’t be confident that the model derived from our sample accurately represents the population, we can assess how well our model might predict the outcome in a different sample.
Cross-validation: assessing the accuracy of a model across different samples.
If a model can be generalized, then it must be capable of accurately predicting the same outcome variable from the same set of predictors in a different group of people.
Once we have estimated the model there are two main methods of cross-validation: adjusted R² and data splitting (illustrated below).
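A minimal sketch of data splitting on simulated data: half the sample estimates the model, the other half checks its predictions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=100)

half = len(y) // 2
fit = sm.OLS(y[:half], X[:half]).fit()  # estimate on the first half
pred = fit.predict(X[half:])            # predict the second half

# Squared correlation between predicted and observed outcomes in the
# unseen half gives a cross-validated R².
r = np.corrcoef(pred, y[half:])[0, 1]
print(f"cross-validated R² = {r**2:.3f}")
```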
The sample size required depends on the size of the effect that we're trying to detect and how much power we want to detect these effects.
The bigger the sample size the better.
Summary
A great deal of care should be taken in selecting predictors for a model, because the estimates of the regression coefficients depend upon the variables in the model.
Methods of entering predictors into the model
Having chosen predictors, you must decide on the order in which to enter them into the model.
Comparing models
Hierarchical methods involve adding predictors to the model in stages, and it is useful to assess the improvement to the model at each stage.
A simple way to quantify the improvement is to compare R2 for the new model to that for the old model.
F_change = ((N − k_new − 1)R²_change) / (k_change(1 − R²_new))
(k_new is the number of predictors in the new model, k_change is the number of predictors added, and R²_change is the change in R².)
We can compare models using this F-statistic.
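A sketch of a hierarchical comparison on simulated data, an 'old' one-predictor model against a 'new' three-predictor model; statsmodels can run the same nested-model F-test directly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 3))
y = 1 + X @ np.array([1.5, 0.0, 0.8]) + rng.normal(size=60)

old = sm.OLS(y, sm.add_constant(X[:, :1])).fit()  # 1 predictor
new = sm.OLS(y, sm.add_constant(X)).fit()         # all 3 predictors

n, k_new, k_change = len(y), 3, 2                 # 2 predictors added
r2_change = new.rsquared - old.rsquared
f_change = ((n - k_new - 1) * r2_change) / (k_change * (1 - new.rsquared))

print(f_change)
print(new.compare_f_test(old))  # (F, p-value, df difference): same F value
```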
Multicollinearity
Multicollinearity exists when there is a strong correlation between two or more predictors.
Perfect collinearity: when at least one predictor is a perfect linear combination of the others.
As collinearity increases, three problems arise: (1) the estimates of the bs become untrustworthy because their standard errors increase, (2) it limits the size of R, and (3) it becomes difficult to assess the individual importance of each predictor.
Variance inflation factor (VIF): indicates whether a predictor has a strong linear relationship with the other predictor(s). The tolerance statistic is its reciprocal (1/VIF).
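A sketch of computing VIF and tolerance with statsmodels, using two deliberately near-collinear simulated predictors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
x1 = rng.normal(size=80)
x2 = x1 + rng.normal(scale=0.1, size=80)  # nearly collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF per predictor (column 0 is the constant); tolerance = 1/VIF.
for j in (1, 2):
    vif = variance_inflation_factor(X, j)
    print(f"predictor {j}: VIF = {vif:.1f}, tolerance = {1/vif:.3f}")
```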