Discovering statistics using IBM SPSS statistics by Andy Field, fifth edition – Summary chapter 9

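Any straight line can be defined by (1) its slope and (2) the point at which it crosses the vertical axis of the graph (the intercept). The general formula for the linear model is the following:

$$Y_i = (b_0 + b_1 X_i) + \varepsilon_i$$

Here $b_0$ is the intercept, $b_1$ is the slope and $\varepsilon_i$ is the error for case $i$.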

Regression analysis refers to fitting a linear model to data and using it to predict values of an outcome variable (dependent variable) from one or more predictor variables (independent variables). The residuals are the differences between what the model predicts and the actual outcome. The residual sum of squares is used to assess the ‘goodness-of-fit’ of the model on the data. The smaller the residual sum of squares, the better the fit.

Ordinary least squares (OLS) regression estimates the model for which the sum of squared errors is as small as it can be given the data. The total sum of squares (SS_T) is the sum of squared differences between the observed outcome scores and their mean; it represents how good the mean is as a model of the observed outcome scores. The model sum of squares (SS_M) is the sum of squared differences between the values predicted by the model and the mean; it represents how much better the model predicts the data than the mean does. The larger the model sum of squares, the better the model can predict the data. The residual sum of squares (SS_R) is the sum of squared differences between the observed data and the model, and shows how much of the variation in the data the model cannot predict.
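The three sums of squares can be illustrated with a small sketch. The data below are hypothetical, the model has a single predictor, and the helper names `fit_line` and `sums_of_squares` are made up for this example.

```python
# Hypothetical data: one predictor (x) and one outcome (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(x, y):
    """Return (intercept b0, slope b1) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sums_of_squares(x, y):
    """Return (SS_T, SS_M, SS_R) for the least-squares line."""
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    predicted = [b0 + b1 * xi for xi in x]
    ss_t = sum((yi - mean_y) ** 2 for yi in y)                  # observed vs mean
    ss_m = sum((pi - mean_y) ** 2 for pi in predicted)          # model vs mean
    ss_r = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # observed vs model
    return ss_t, ss_m, ss_r

ss_t, ss_m, ss_r = sums_of_squares(x, y)
print(ss_t, ss_m, ss_r)
```

For a least-squares fit with an intercept, SS_T equals SS_M plus SS_R (up to rounding), which is why a larger SS_M implies a smaller SS_R.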

The proportion of improvement due to the model compared to using the mean as a predictor can be calculated using the following formula:

$$R^2 = \frac{SS_M}{SS_T}$$

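This value (R²) represents the amount of variance in the outcome explained by the model relative to how much variation there was to explain. The F-statistic can be calculated using the following formulas:

$$F = \frac{MS_M}{MS_R}, \qquad MS_M = \frac{SS_M}{k}, \qquad MS_R = \frac{SS_R}{N - k - 1}$$

Here $k$ is the number of predictors, which is also the degrees of freedom for the model sum of squares, and $N$ is the number of observations.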

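The F-statistic can also be used to test the significance of R², with the null hypothesis being that R² is zero. It uses the following formula, where $N$ is the number of observations and $k$ the number of predictors:

$$F = \frac{(N - k - 1) R^2}{k (1 - R^2)}$$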
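A short sketch shows that the two routes to F agree. It assumes the usual definitions F = MS_M / MS_R (with MS_M = SS_M / k and MS_R = SS_R / (N - k - 1)) and F = (N - k - 1)R² / (k(1 - R²)). The numbers and function names are hypothetical.

```python
def r_squared(ss_m, ss_t):
    """Proportion of variance in the outcome explained by the model."""
    return ss_m / ss_t

def f_from_sums_of_squares(ss_m, ss_r, n, k):
    """F as the ratio of the model mean square to the residual mean square."""
    ms_m = ss_m / k
    ms_r = ss_r / (n - k - 1)
    return ms_m / ms_r

def f_from_r_squared(r2, n, k):
    """F computed directly from R-squared."""
    return (n - k - 1) * r2 / (k * (1 - r2))

# Hypothetical values: 5 cases, 1 predictor.
ss_t, ss_m, ss_r = 6.0, 3.6, 2.4
n, k = 5, 1

r2 = r_squared(ss_m, ss_t)
f_a = f_from_sums_of_squares(ss_m, ss_r, n, k)
f_b = f_from_r_squared(r2, n, k)
print(r2, f_a, f_b)
```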

Individual predictors can be tested using the t-statistic.

BIAS IN LINEAR MODELS
An outlier is a case that differs substantially from the main trend in the data. Standardized residuals, which are residuals converted to z-scores, can be used to check which residuals are unusually large and may belong to outliers. Three rules of thumb apply: (1) a case whose standardized residual has an absolute value greater than 3.29 is considered an outlier; (2) if more than 1% of the sample cases have standardized residuals with an absolute value greater than 2.58, the level of error in the model may be unacceptable; and (3) if more than 5% of the cases have standardized residuals with an absolute value greater than 1.96, the model may be a poor representation of the data.
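The three rules of thumb can be checked mechanically. The sketch below assumes the standardized residuals have already been computed. The values and the function name `check_standardized_residuals` are hypothetical.

```python
def check_standardized_residuals(z):
    """Apply the 3.29 / 2.58 / 1.96 rules of thumb to standardized residuals."""
    n = len(z)
    share_above_2_58 = sum(abs(zi) > 2.58 for zi in z) / n
    share_above_1_96 = sum(abs(zi) > 1.96 for zi in z) / n
    return {
        # Indices of cases treated as outliers (|z| > 3.29).
        "outliers": [i for i, zi in enumerate(z) if abs(zi) > 3.29],
        # More than 1% of cases beyond 2.58: error level may be unacceptable.
        "error_level_concern": share_above_2_58 > 0.01,
        # More than 5% of cases beyond 1.96: model may represent the data poorly.
        "poor_representation": share_above_1_96 > 0.05,
    }

# Hypothetical standardized residuals for ten cases.
z_scores = [0.1, -0.5, 2.0, -3.5, 1.0, 0.3, -0.2, 0.8, -1.1, 0.4]
report = check_standardized_residuals(z_scores)
print(report)
```

With only ten cases a single large residual already exceeds the percentage thresholds, so the percentage rules are more informative in larger samples.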

The studentized residual is the unstandardized residual divided by an estimate of its standard deviation. These residuals have the same properties as the standardized residuals but provide a more precise estimate of the error variance of a specific case.

Influential cases are cases which exert undue influence over the parameters of the model. To test for influential cases, the analysis can be rerun with a case excluded to see how much the regression coefficients change.

The adjusted predicted value for a case is the predicted value of the outcome for that case from a model in which the case is excluded. The deleted residual is the difference between the adjusted predicted value and the original observed value. This can be divided by the standard error to give the studentized deleted residual. This residual can be compared across different regression analyses. Cook’s distance is a measure of the overall influence of a case on the model. The leverage assesses the influence of the observed value of the outcome variable over the predicted values.
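The adjusted predicted value and the deleted residual can be sketched by refitting the model once per case with that case left out. The example assumes a single predictor and hypothetical data; the sign convention (observed minus adjusted predicted) and the helper names are assumptions made for this illustration.

```python
def fit_line(x, y):
    """Return (intercept b0, slope b1) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def deleted_residuals(x, y):
    """For each case, refit without it and compare its outcome to the new prediction."""
    result = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        b0, b1 = fit_line(x_rest, y_rest)
        adjusted_predicted = b0 + b1 * x[i]  # prediction from the model without case i
        result.append(y[i] - adjusted_predicted)
    return result

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

deleted = deleted_residuals(x, y)
print(deleted)
```

A case whose deleted residual is much larger than its ordinary residual is one that pulls the fitted line towards itself.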

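The average leverage can be calculated in the following way, where $k$ is the number of predictors and $n$ is the number of cases:

$$\text{average leverage} = \frac{k + 1}{n}$$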

The maximum leverage can be calculated using the following formula, where $N$ is the number of cases:

$$\text{maximum leverage} = \frac{N - 1}{N}$$

If no cases exert undue influence over the model, then all leverage values should be close to the average. Values greater than twice or three times the average should be investigated.
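The comparison against the average leverage can be sketched for a model with one predictor. The example assumes the hat-value definition of leverage, h_i = 1/n + (x_i - mean)² / Σ(x_j - mean)², and an average of (k + 1)/n. Statistical packages may report a rescaled version of leverage, so the numbers need not match software output. Data and names are hypothetical.

```python
def leverages(x):
    """Hat values for a simple regression with an intercept and one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

def flag_high_leverage(h, k, multiple=2):
    """Indices of cases whose leverage exceeds `multiple` times the average."""
    average = (k + 1) / len(h)
    return [i for i, hi in enumerate(h) if hi > multiple * average]

# Hypothetical predictor values; the last case lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
k = 1  # number of predictors

h = leverages(x)
average_leverage = (k + 1) / len(x)
flagged_twice = flag_high_leverage(h, k, multiple=2)
flagged_thrice = flag_high_leverage(h, k, multiple=3)
print(average_leverage, flagged_twice, flagged_thrice)
```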

Mahalanobis distances measure the distance of cases from the mean(s) of the predictor variable(s). These values follow a chi-square distribution, so a critical value at the chosen alpha level can be used to identify potentially influential cases.

There are several assumptions of the general linear model:

Additivity and linearity
The outcome variable should be linearly related to any predictors and if there are several predictors, the effects should be added together.
Independent errors
The residual terms should be uncorrelated for any two observations. This can be tested using the Durbin-Watson test. The statistic ranges from 0 to 4 and a statistic of 2 means the observations are uncorrelated.
Homoscedasticity
At each level of the independent variable, the variance of the residual terms should be constant. A violation can be overcome by using weighted least squares regression.
Normally distributed errors
The residuals in the model are random, normally distributed variables with a mean of 0.
Predictors are uncorrelated with external variables
Independent variables should not be correlated with a third variable as this weakens the conclusions you can draw.
Variable types
All predictor variables must be quantitative or categorical. All outcome variables must be quantitative, continuous and unbounded (take the whole range of values instead of a restricted range).
No perfect multicollinearity
There should be no perfect relationship between two or more of the independent variables.
Non-zero variance
The independent variable should have some variation in value.
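The Durbin-Watson statistic mentioned under independent errors can be sketched as follows, assuming the usual definition: the sum of squared differences between successive residuals divided by the sum of squared residuals. The residuals are hypothetical and assumed to be in the order in which the observations were collected.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals given in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual series.
alternating = [1.0, -1.0, 1.0, -1.0]  # adjacent residuals negatively related
constant = [1.0, 1.0, 1.0, 1.0]       # adjacent residuals positively related

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```

Values below 2 point towards positive correlation between adjacent residuals and values above 2 towards negative correlation.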

Violation of most assumptions only has consequences for significance tests or confidence intervals. This has consequences for the generalizability of the findings.

Assessing the accuracy of a model across different samples is known as cross-validation. There are two methods of cross-validation. The first is the adjusted R², which estimates the amount of variance that would be accounted for if the model had been derived from the population from which the sample was taken. Comparing it with R² indicates the loss of predictive power (shrinkage). It uses the following formula:
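Two versions of the adjusted R² are commonly cited: Wherry's formula, which is the version SPSS reports, and Stein's formula, which is aimed at how well the model would cross-validate. Which of the two is meant here is an assumption, so the sketch below shows both. The input values and function names are hypothetical.

```python
def adjusted_r2_wherry(r2, n, k):
    """Wherry's adjusted R-squared for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def adjusted_r2_stein(r2, n, k):
    """Stein's adjusted R-squared, a stricter estimate aimed at cross-validation."""
    factor = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1 - factor * (1 - r2)

# Hypothetical model: R-squared of 0.6 from 50 cases and 3 predictors.
r2, n, k = 0.6, 50, 3
wherry = adjusted_r2_wherry(r2, n, k)
stein = adjusted_r2_stein(r2, n, k)
print(wherry, stein)
```

Both values fall below the unadjusted R², and the gap grows as the number of predictors rises relative to the sample size.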

Another method is data splitting. This involves randomly splitting the sample data, estimating the model in both halves and comparing the resulting models.
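Data splitting can be sketched with a seeded random split and one fit per half. The data are hypothetical (an outcome of roughly 2 + 0.5x with a small deterministic wobble), the model has one predictor, and the helper names are made up for this example.

```python
import random

def fit_line(x, y):
    """Return (intercept b0, slope b1) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def split_halves(cases, seed=0):
    """Randomly split the cases into two halves (seeded for repeatability)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]

# Hypothetical cases as (x, y) pairs.
cases = [(float(i), 2.0 + 0.5 * i + 0.1 * ((i * 7) % 5 - 2)) for i in range(1, 21)]

half_a, half_b = split_halves(cases)
b0_a, b1_a = fit_line([c[0] for c in half_a], [c[1] for c in half_a])
b0_b, b1_b = fit_line([c[0] for c in half_b], [c[1] for c in half_b])
print(b1_a, b1_b)
```

Similar coefficients in both halves suggest that the model generalizes; large differences suggest that it depends on the particular cases in the sample.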

SAMPLE SIZE AND THE LINEAR MODEL
The estimate of R depends on the number of independent variables and the sample size. The sample size also influences the power of the model. The sample size that is needed depends in turn on the size of the effect to be detected and the desired precision.

MULTIPLE REGRESSION
The estimates of the regression coefficients depend upon the variables in the model and the order in which they are entered. Predictors should be selected on sound grounds, such as past research. Predictors that have not been studied before should be chosen based on their theoretical importance. Adding predictors that are not relevant will add noise to the model.

The order of predictors does not matter if the predictors are completely uncorrelated. Hierarchical regression is a regression analysis in which predictors are selected based on past work and entered by the researcher in order of importance. Forced entry means forcing all predictors into the model simultaneously.

Stepwise regression bases decisions about the order of the predictors purely on a mathematical criterion. In the forward method, the computer first searches for the best predictor, which is the predictor that has the highest simple correlation with the outcome. It then looks for the next predictor, which is the one that has the largest semi-partial correlation with the outcome, and continues adding predictors in this way. In the backward method, the model initially contains all the predictors, the contribution of each is evaluated with the p-value of its t-test, and predictors that do not contribute significantly are removed. One danger of stepwise regression is over-fitting: when the sample is sufficiently large, even predictors with trivial effects will be significant and retained.

A suppressor effect occurs when a predictor has a significant effect only when another variable is held constant. The risk of missing such effects can be minimized by using the backward method.

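The improvements to the model at each stage can be assessed using the change in R-squared. The significance of the change in R-squared (the new model versus the old model) can be calculated using the following formula:

$$F_{change} = \frac{(N - k_{new} - 1) R^2_{change}}{k_{change} (1 - R^2_{new})}$$

Here $N$ is the number of cases, $k_{new}$ is the number of predictors in the new model, $k_{change}$ is the number of predictors added, $R^2_{change}$ is the increase in R-squared and $R^2_{new}$ is the R-squared of the new model.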
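The test of the change in R-squared can be sketched as a small function, assuming the usual form F_change = (N - k_new - 1) × R²_change / (k_change × (1 - R²_new)). The numbers and the function name are hypothetical.

```python
def f_change(r2_old, r2_new, n, k_old, k_new):
    """F-statistic for the improvement in R-squared between two nested models."""
    k_change = k_new - k_old    # number of predictors added
    r2_change = r2_new - r2_old # improvement in explained variance
    return (n - k_new - 1) * r2_change / (k_change * (1 - r2_new))

# Hypothetical example: adding one predictor raises R-squared from 0.2 to 0.5
# in a sample of 23 cases.
f_step = f_change(r2_old=0.2, r2_new=0.5, n=23, k_old=1, k_new=2)
print(f_step)
```

When the old model contains no predictors (R² of zero), this expression reduces to the overall F-test of the new model.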

Perfect collinearity exists when at least one predictor is a perfect linear combination of the others (e.g. predictors one and two are perfectly correlated). As collinearity increases, three problems arise:

Untrustworthy bs
The standard errors of the b coefficients increase as collinearity increases. This means more variability across samples, and therefore (1) a greater chance of unstable predictor equations across samples and (2) coefficients that are unrepresentative of the population.
It limits the size of R
The predictors account for the same variance, so R will not increase. Predictors should account for unique variance.
Importance of predictors
It is difficult to assess the importance of a predictor when there is multicollinearity.

The variance inflation factor (VIF) indicates whether a predictor has a strong linear relationship with the other predictors. The tolerance statistic (1/VIF) does the same. There are some guidelines:

If the largest VIF is > 10, then there is a strong relationship.
If the average VIF is substantially greater than 1, then the regression may be biased.
Tolerance below 0.2 indicates a potential problem.
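The VIF and tolerance can be sketched for the simplest case of two predictors. The example assumes the usual definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. With only two predictors that R² is the squared correlation between them, so both predictors share the same VIF. Data and names are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictor scores.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

vif = vif_two_predictors(x1, x2)
tolerance = 1 / vif
average_vif = (vif + vif) / 2  # both predictors have the same VIF here
print(vif, tolerance, average_vif)
```

In this example the largest VIF is below 10 and the tolerance is above 0.2, but the average VIF is well above 1, so the guidelines give a mixed picture.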

The standardized beta values are relevant for assessing the importance of each predictor. The bigger the absolute value, the more important the predictor is.
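A standardized beta can be obtained from an unstandardized coefficient, assuming the usual conversion beta = b × s_x / s_y, where s_x and s_y are the standard deviations of the predictor and the outcome. The data, the slope and the helper names below are hypothetical.

```python
import math

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def standardized_beta(b, x, y):
    """Convert an unstandardized slope b into a standardized beta."""
    return b * sample_sd(x) / sample_sd(y)

# Hypothetical data; 0.6 is the least-squares slope of y on x for these values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
beta = standardized_beta(0.6, x, y)
print(beta)
```

Because a standardized beta is expressed in standard deviation units, values for predictors measured on different scales can be compared directly.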

It is useful to calculate the average of the VIF values:

$$\overline{VIF} = \frac{\sum_{i=1}^{k} VIF_i}{k}$$

Here $k$ denotes the number of predictors.

Follow the author: JesperN
