Summary of Discovering statistics using IBM SPSS statistics by Andy Field - 5th edition
Any straight line can be defined by its slope (1) and the point at which it crosses the vertical axis of the graph, the intercept (2). The general formula for the linear model is the following:

Y_i = b0 + b1*X_i + e_i

where b0 is the intercept, b1 the slope, and e_i the error (residual) for case i.
Regression analysis refers to fitting a linear model to data and using it to predict values of an outcome variable (dependent variable) from one or more predictor variables (independent variables). The residuals are the differences between what the model predicts and the actual outcomes. The residual sum of squares is used to assess the goodness-of-fit of the model to the data: the smaller the residual sum of squares, the better the fit.
Ordinary least squares regression refers to estimating the regression model for which the sum of squared errors is the minimum it can be given the data. The total sum of squares (SS_T) is the sum of squared differences between the observed scores and the mean; it represents how good the mean is as a model of the observed outcome scores. The model sum of squares (SS_M) represents how well the model can predict the data: the larger SS_M, the bigger the improvement over the mean. The residual sum of squares (SS_R) uses the differences between the observed data and the model's predictions and shows how much of the data the model cannot explain.
The proportion of improvement due to the model, compared to using the mean as a predictor, can be calculated using the following formula:

R² = SS_M / SS_T
This value represents the amount of variance in the outcome explained by the model (SS_M) relative to how much variation there was to explain (SS_T). The F-statistic can be calculated using the following formulas:

MS_M = SS_M / k,  MS_R = SS_R / (N - k - 1),  F = MS_M / MS_R
‘k’ represents the model degrees of freedom and denotes the number of predictors; N is the sample size, so the residual degrees of freedom are N - k - 1.
The F-statistic can also be used to test the significance of R², with the null hypothesis being that R² is zero. It uses the following formula:

F = ((N - k - 1) × R²) / (k × (1 - R²))
Individual predictors can be tested using the t-statistic, t = b / SE_b, which tests the null hypothesis that the predictor's b is zero.
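To make these quantities concrete, here is a minimal Python sketch (the book itself uses SPSS; the data and variable names are made up for illustration) that fits a one-predictor model by ordinary least squares and computes the sums of squares, R², F, and the t-statistic for the slope:

```python
import numpy as np

# Hypothetical data: one predictor, 30 cases (illustration only)
rng = np.random.default_rng(42)
x = rng.normal(size=30)
y = 2.0 + 0.8 * x + rng.normal(size=30)

# Fit y = b0 + b1*x by ordinary least squares
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b[0] = intercept, b[1] = slope
y_hat = X @ b

n, k = len(y), 1                            # sample size, number of predictors

ss_t = np.sum((y - y.mean()) ** 2)          # total SS: the mean as a model
ss_r = np.sum((y - y_hat) ** 2)             # residual SS: what the model misses
ss_m = ss_t - ss_r                          # model SS: improvement over the mean

r2 = ss_m / ss_t                            # proportion of variance explained
f = (ss_m / k) / (ss_r / (n - k - 1))       # F = MS_M / MS_R

# t-test for the slope: t = b / SE(b), null hypothesis b = 0
mse = ss_r / (n - k - 1)
se_b1 = np.sqrt(mse * np.linalg.inv(X.T @ X)[1, 1])
print(f"R^2 = {r2:.3f}, F = {f:.2f}, t = {b[1] / se_b1:.2f}")
```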
BIAS IN LINEAR MODELS
An outlier is a case that differs substantially from the main trend in the data. Standardized residuals (residuals converted to z-scores) can be used to check which residuals are unusually large and may be outliers. Rules of thumb:

1. Standardized residuals with an absolute value greater than 3.29 are considered outliers.
2. If more than 1% of cases have standardized residuals with an absolute value greater than 2.58, the level of error in the model may be unacceptable.
3. If more than 5% of cases have standardized residuals with an absolute value greater than 1.96, the model may be a poor representation of the data.
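These rules of thumb can be checked with a few lines of Python (a simple z-score version of standardized residuals; function and variable names are illustrative):

```python
import numpy as np

def residual_checks(y, y_hat):
    """Apply the three standardized-residual rules of thumb above."""
    resid = y - y_hat
    z = resid / resid.std(ddof=1)           # residuals converted to z-scores
    print("outliers (|z| > 3.29):", np.where(np.abs(z) > 3.29)[0])
    print("% of cases with |z| > 2.58:", 100 * np.mean(np.abs(z) > 2.58))
    print("% of cases with |z| > 1.96:", 100 * np.mean(np.abs(z) > 1.96))
```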
The studentized residual is the unstandardized residual divided by an estimate of its standard deviation. These residuals have the same properties as standardized residuals but provide a more precise estimate of the error variance of a specific case.
Influential cases are cases that exert undue influence over the parameters of the model. To test for influential cases, the analysis can be rerun with a case excluded, to see how different the regression coefficients become.
The adjusted predicted value for a case is the predicted value of the outcome for that case from a model in which the case is excluded. The deleted residual is the difference between the adjusted predicted value and the original observed value. This can be divided by the standard error to give the studentized deleted residual. This residual can be compared across different regression analyses. Cook’s distance is a measure of the overall influence of a case on the model. The leverage assesses the influence of the observed value of the outcome variable over the predicted values.
The average leverage can be calculated as (k + 1)/N, where k is the number of predictors and N is the sample size. Leverage values range from 0 (the case has no influence on the prediction) to a maximum of 1 (the case has complete influence over the prediction).
If no cases exert undue influence over the model, all leverage values should be close to the average. Values greater than two or three times the average leverage should be investigated.
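A minimal sketch of computing leverage and Cook's distance directly from the design matrix (assuming X already contains a column of ones for the intercept; a common rule of thumb is that Cook's distances above 1 deserve attention):

```python
import numpy as np

def influence_measures(X, y):
    """Leverage (hat values) and Cook's distance for each case."""
    n, p = X.shape                          # p = k + 1 parameters
    H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
    h = np.diag(H)                          # leverage of each case
    resid = y - H @ y
    mse = resid @ resid / (n - p)
    cooks = (resid ** 2 / (p * mse)) * h / (1 - h) ** 2
    avg_leverage = p / n                    # (k + 1) / N
    flagged = np.where(h > 2 * avg_leverage)[0]  # twice-the-average rule
    return h, cooks, flagged
```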
The Mahalanobis distance measures the distance of a case from the means of the predictor variables. These values follow a chi-square distribution (with degrees of freedom equal to the number of predictors), so cases whose distance exceeds the critical value at a chosen alpha can be flagged as potentially influential.
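A sketch of this check in Python, using SciPy for the chi-square critical value (X_pred holds the predictor columns only; alpha = .001 is a conventional choice and the names are illustrative):

```python
import numpy as np
from scipy import stats

def mahalanobis_flags(X_pred, alpha=0.001):
    """Distance of each case from the predictor means, with chi-square cutoff."""
    mu = X_pred.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X_pred, rowvar=False))
    diff = X_pred - mu
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)   # squared distances
    cutoff = stats.chi2.ppf(1 - alpha, df=X_pred.shape[1])
    return d2, np.where(d2 > cutoff)[0]                  # flagged case indices
```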
There are several assumptions of the general linear model: additivity and linearity, independent errors, homoscedasticity, normally distributed errors, predictors uncorrelated with external variables, correct variable types, no perfect multicollinearity, and non-zero variance in the predictors.
Violating most assumptions has consequences only for significance tests and confidence intervals, which in turn limits the generalizability of the findings.
Assessing the accuracy of a model across different samples is known as cross-validation. There are two methods of cross-validation. The adjusted R² is the amount of variance that would be accounted for if the model had been derived from the population from which the sample was taken; it indicates the loss of predictive power (shrinkage). Stein's formula can be used:

adjusted R² = 1 - [((N - 1)/(N - k - 1)) × ((N - 2)/(N - k - 2)) × ((N + 1)/N)] × (1 - R²)
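A small sketch of this calculation (assuming Stein's formula as given above; the input numbers are made up):

```python
def stein_adjusted_r2(r2, n, k):
    """Stein's formula: estimated R^2 if the model came from the population."""
    factor = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1 - factor * (1 - r2)

# e.g. R^2 = .50 from N = 100 cases and k = 3 predictors
print(round(stein_adjusted_r2(0.50, 100, 3), 3))  # below .50: shrinkage
```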
Another method is data splitting. This involves randomly splitting the sample data, estimating the model in both halves and comparing the resulting models.
SAMPLE SIZE AND THE LINEAR MODEL
The estimate of R depends on the number of predictors and the sample size, which influences the power of the model. The desired effect size and the required precision determine the sample size needed.
MULTIPLE REGRESSION
The estimates of the regression coefficients depend on the variables in the model and the order in which they are entered. Predictors should be chosen based on whether they are sensible; predictors that have not been studied before should be chosen based on theoretical importance. Adding predictors that are not relevant will add noise to the model.
The order of predictors does not matter if the predictors are completely uncorrelated. Hierarchical regression is a regression analysis in which predictors are selected based on past work and entered in order of importance. Forced entry means forcing all predictors into the model simultaneously.
Stepwise regression bases decisions about the order of predictors purely on a mathematical criterion. In the forward method, the computer first selects the best predictor (the one with the highest simple correlation with the outcome) and then adds the predictor with the largest semi-partial correlation with the outcome, and so on. In the backward method, the model initially contains all predictors and the contribution of each is evaluated with the p-value of its t-test; non-contributing predictors are removed. One danger of stepwise regression is overfitting: with a sufficiently large sample, even trivial predictors become significant.
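A toy sketch of the forward method (greedy selection by R² improvement, which selects the same predictor as the largest semi-partial correlation; this is an illustration, not the book's SPSS procedure):

```python
import numpy as np

def r2_of(X, y):
    """R^2 of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_stepwise(X, y, n_steps):
    """Greedily add the predictor that most improves R^2 at each step."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(n_steps):
        best_r2, best_j = max((r2_of(X[:, chosen + [j]], y), j) for j in remaining)
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen
```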
A suppressor effect occurs when a predictor has a significant effect only when another variable is held constant. The risk of overlooking such effects can be minimized by using the backward method.
The improvement of the model at each stage can be assessed using R². The significance of the change in R² (the new model versus the old model) can be calculated using the following formula:

F_change = ((N - k_new - 1) × R²_change) / (k_change × (1 - R²_new))

where k_new is the number of predictors in the new model and k_change is the number of predictors added.
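As a sketch, this test is a one-liner once the two nested models have been fitted (parameter names are illustrative):

```python
def f_change(r2_old, r2_new, k_old, k_new, n):
    """F-test for the change in R^2 between two nested models."""
    df_change = k_new - k_old               # number of predictors added
    df_resid = n - k_new - 1                # residual df of the new model
    return (df_resid * (r2_new - r2_old)) / (df_change * (1 - r2_new))
```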
Perfect collinearity exists when at least one predictor is a perfect linear combination of the others (e.g. predictors one and two are perfectly correlated). There are three problems as collinearity increases: (1) the estimates of the b coefficients become untrustworthy because their standard errors increase, (2) the size of R is limited, and (3) it becomes difficult to assess the individual importance of predictors.
The variance inflation factor (VIF) indicates whether a predictor has a strong linear relationship with the other predictors; the tolerance statistic (1/VIF) conveys the same information. There are some guidelines: a VIF greater than 10 is cause for concern, an average VIF substantially greater than 1 suggests the regression may be biased, a tolerance below 0.2 indicates a potential problem, and a tolerance below 0.1 indicates a serious problem.
The standardized beta values are relevant for assessing the importance of each predictor. The bigger the absolute value, the more important the predictor is.
It is useful to calculate the average VIF value:

average VIF = (Σ VIF_i) / k
‘k’ denotes the number of predictors.
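A sketch of computing VIF (and hence tolerance) by regressing each predictor on the others (X_pred holds the predictor columns only; names are illustrative):

```python
import numpy as np

def vif_values(X_pred):
    """VIF_j = 1 / (1 - R^2) from regressing predictor j on the rest."""
    vifs = []
    for j in range(X_pred.shape[1]):
        target = X_pred[:, j]
        others = np.delete(X_pred, j, axis=1)
        X1 = np.column_stack([np.ones(len(target)), others])
        b, *_ = np.linalg.lstsq(X1, target, rcond=None)
        resid = target - X1 @ b
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)                   # tolerance = 1 / VIF

# average VIF = sum of the VIFs divided by k, e.g. vif_values(X_pred).mean()
```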