Seminar Assumptions and Bootstraps

Summary and study notes 

Which topics are covered in the lecture?

Assumptions. Five assumptions are discussed in the lecture: outliers, multicollinearity, homoscedasticity, linearity and normally distributed residuals. See the lecture notes further on in this document.

Bootstrapping. You use the bootstrap when the data do not agree with the distributional assumptions of the analysis.

Mediation = the relationship between an independent variable and a dependent variable that runs via a third hypothetical variable, the mediator variable. When you do mediation you always have to use the bootstrap, because there is an indirect effect.

Which topics are discussed that are not covered in the literature?

No topics are discussed in this lecture that are not covered in the literature.

Which recent developments in the field are discussed?

No recent developments are discussed.

What remarks does the lecturer make during the lecture regarding the exam?

No questions about effect size will be asked on the exam.

Which questions are discussed that could be asked on the exam?

No exam questions are discussed.

Lecture notes

Assumptions and violations

  1. Outliers. An outlier may influence your results. There are three ways to deal with one:
  • Remove it, especially if it is a theoretically illogical value.
  • Run the analysis with and without the outlier and inspect whether the conclusion is the same (but what if it is not?).
  • Leave it in, but correct for it, e.g. use a robust estimator (look at the median instead of the mean).
  2. Multicollinearity. Tolerance = 1/VIF. Tolerance < .2 is a possible problem, tolerance < .1 is a problem. Equivalently, VIF > 5 is a possible problem, VIF > 10 is a problem. When predictors correlate strongly (> 0.8), it becomes very hard to compute unique estimates of the regression coefficients: the estimates of the b-coefficients are unreliable and the importance of individual predictors is difficult to determine. When there is multicollinearity, you have to decide which predictors to keep:
  • In case of an interaction, center the variables.
  • Remove one of the predictors (leave at least one out, but not all).
  • Use a ‘latent’ effect instead of the original variables; use factor analysis or a sum score.
  3. Homoscedasticity = for each x-value there is the same spread. When this assumption is violated (heteroscedasticity), hypothesis tests are no longer valid. In that case you can use the bootstrap. You can also check linearity with the plots for homoscedasticity: when there is no symmetry around the middle line, there is no linear relationship.
  4. Linearity. You can check linearity in a plot by adding a line for the quadratic effect (R²). There is a linear relationship when the spread around the line is symmetric. When there is no linearity, you have to add a quadratic effect (so, transform the predictor into a new variable which is squared).
  5. Normally distributed residuals. The residuals have to be normally distributed. You can check this assumption in a plot. You can also test it, but the outcome depends on the sample size: with a small sample the test is rarely significant, whereas with a large sample even a small deviation from normality gives a significant value. Dealing with non-normality:
  • Ignore the problem and claim that ML estimation is robust. Defensible if the distribution is not extreme and N is large.
  • Use a normalizing transformation (of the dependent variable): square root, logarithm, inverse, normalized scores. It often does not make sense to interpret the transformed variables.
  • Use robust estimators (MLR). Works well on many occasions. N > 200, larger with large models.
  • Bootstrapping.
  • Bayes estimation.
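
The rules of thumb for multicollinearity in the list above can be illustrated with a small Python sketch. It covers only the two-predictor case, where the R² of one predictor regressed on the other equals the squared correlation, so tolerance = 1 - r² and VIF = 1/tolerance. The data and the function names (pearson_r, tolerance_and_vif, flag) are hypothetical and chosen for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance_and_vif(x1, x2):
    """Two-predictor case only: R^2 of x1 on x2 equals r^2."""
    r = pearson_r(x1, x2)
    tolerance = 1.0 - r ** 2
    vif = 1.0 / tolerance
    return tolerance, vif

def flag(tolerance):
    """Apply the rules of thumb from the notes (< .1 problem, < .2 possible problem)."""
    if tolerance < 0.1:
        return "problem"
    if tolerance < 0.2:
        return "possible problem"
    return "ok"

# Hypothetical data: x2 is almost a copy of x1, x3 is only weakly related to x1.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0]
x3 = [5, 1, 7, 2, 8, 3, 6, 4]

tol_12, vif_12 = tolerance_and_vif(x1, x2)
tol_13, vif_13 = tolerance_and_vif(x1, x3)
print(f"x1 vs x2: tolerance={tol_12:.3f}, VIF={vif_12:.1f}, {flag(tol_12)}")
print(f"x1 vs x3: tolerance={tol_13:.3f}, VIF={vif_13:.1f}, {flag(tol_13)}")
```

With more than two predictors, the tolerance of a predictor is 1 minus the R² from regressing that predictor on all the other predictors, which requires a multiple regression per predictor.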

Bootstrapping

It is called bootstrapping because you have to help yourself out with the means that you have. You use the bootstrap when the data do not agree with the distributional assumptions, for example with non-normal errors, heteroscedasticity, small samples, moderation, mediation or count variables. When you cannot assume that the statistic follows a t-distribution, you approximate the sampling distribution by resampling (with replacement). In the end you get p-values and other results that are valid. If you only have a sample of 10 persons, you resample from that sample many times (e.g. 1000 times), which gives you an empirical sampling distribution of the statistic. This distribution is then used in your calculations. Sampling with replacement means that you can pick the same individual more than once within one resample. The resampling is done at random.
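
As a minimal sketch of the resampling idea described above, the Python snippet below takes a made-up sample of 10 persons, draws 1000 resamples with replacement and uses the spread of the resampled means as a bootstrap standard error. The scores, the seed and the function name bootstrap_means are assumptions for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 10 persons (scores are made up for illustration).
sample = [12, 15, 9, 20, 14, 11, 30, 13, 10, 16]

def bootstrap_means(data, n_boot=1000):
    """Draw n_boot resamples of len(data) cases with replacement; return their means."""
    n = len(data)
    means = []
    for _ in range(n_boot):
        # With replacement: the same person can appear more than once.
        resample = random.choices(data, k=n)
        means.append(statistics.mean(resample))
    return means

boot = bootstrap_means(sample)
boot_se = statistics.stdev(boot)  # bootstrap standard error of the mean
print(f"sample mean = {statistics.mean(sample):.2f}, bootstrap SE = {boot_se:.2f}")
```

The list boot is the empirical sampling distribution of the mean; a histogram of it shows how much the estimate would vary from sample to sample.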

Advantages of bootstrapping

It is simple: you don’t need to assume a distribution, so there are no distributional assumptions to check. We don’t need normally distributed data or a large sample size. We can obtain the SE and CI for complex parameters, such as correlation coefficients. You can check the stability of the results. It may give you more accurate results.

Why do we use bootstraps?

We don’t know the real distribution (the population), only the data. We can use the data as a proxy for the population. We draw multiple samples from this proxy (resampling), as if we were sampling from the population. We compute the statistic of interest on each of the resampled datasets. Finally, we calculate the mean and the confidence interval from the distribution of these statistics.

The only assumption is that your sample needs to be representative of the population! When we use the bootstrap, we don’t need the other assumptions.
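
The procedure in this section (use the data as a proxy, resample, compute the statistic each time, summarize the resulting distribution) can be sketched for a correlation coefficient as follows. This is an illustrative percentile bootstrap, not necessarily the method of any specific software package; the data, the seed and the names pearson_r and bootstrap_ci are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(x, y, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap: return (mean, lower, upper) of the resampled statistic."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample whole cases (x, y pairs) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    k = round(n_boot * (1 - level) / 2)  # number of estimates cut off in each tail
    return sum(estimates) / n_boot, estimates[k], estimates[n_boot - k - 1]

# Hypothetical data: hours studied (x) and grade (y) for 12 students.
x = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
y = [3.1, 4.0, 3.6, 5.2, 5.9, 5.5, 6.8, 6.4, 7.9, 7.2, 8.5, 8.1]

r_hat = pearson_r(x, y)
boot_mean, lo, hi = bootstrap_ci(x, y, pearson_r)
print(f"r = {r_hat:.3f}, bootstrap mean = {boot_mean:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Passing the statistic as a function (stat) means the same resampling loop can be reused for other statistics computed on pairs of variables.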

Mediation

Be aware of the difference between mediation and moderation.

Moderation = the effect of X1 on Y depends on the value of X2 (the moderator).

Mediation = the relationship between an independent variable and a dependent variable that runs via a third hypothetical variable, the mediator variable.

Complete mediation = there is no direct effect of X on Y. Partial mediation = a combination of a direct effect of X on Y and an indirect effect of X on Y through M.

When you do mediation you always have to use the bootstrap, because there is an indirect effect. The indirect effect ab is a product of two regression coefficients, so its sampling distribution is not normal.
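
The mediation model can be written as two regression equations. This is a sketch in conventional notation: the product ab comes from the notes, while the symbols c', c, the intercepts i and the error terms e are labels assumed here for illustration.

```latex
\begin{align}
M &= i_M + aX + e_M \\
Y &= i_Y + c'X + bM + e_Y \\
\text{indirect effect} &= ab \\
\text{total effect: } c &= c' + ab
\end{align}
```

In this notation, complete mediation corresponds to c' = 0 with ab different from zero, and partial mediation to both c' and ab being different from zero.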

Steps bootstrap indirect effect

  1. Repeatedly sample from the dataset with replacement. Draw 5000 (default) new samples of N cases from the original sample, with replacement. Note: a case can be drawn more than once or not at all.
  2. Estimate the indirect effect ab in every bootstrap sample: X → M → Y.
  3. Make a histogram of all values of ab → bootstrap sampling distribution of ab (positively skewed). Or: order all these estimates of the indirect effect.
  4. Look at the middle 95% (read off the 2.5th and 97.5th percentiles of the distribution) → 95% CI.
  5. Optional: determine the bias-corrected and accelerated (BCa) interval.
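
The steps above can be sketched in Python. This is a minimal illustration on simulated data, using closed-form least-squares formulas for the two regressions and a percentile interval (the optional BCa step is not shown). The data-generating values, seeds and function names are assumptions made for the example.

```python
import random

def cov(u, v):
    """Sum of cross-products of deviations (unscaled covariance)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """Return a*b: a is the slope of M on X, b the slope of Y on M controlling for X."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_indirect(x, m, y, n_boot=5000, seed=42):
    """Steps 1 to 4: resample cases, estimate ab each time, read off the percentiles."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # N cases drawn with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()  # ordered estimates of the indirect effect
    k = round(0.025 * n_boot)
    return estimates[k], estimates[n_boot - k - 1]  # 2.5th and 97.5th percentile

# Simulated (hypothetical) data with a true indirect effect of 0.5 * 0.6 = 0.3.
gen = random.Random(7)
x = [gen.gauss(0, 1) for _ in range(150)]
m = [0.5 * xi + gen.gauss(0, 0.5) for xi in x]
y = [0.6 * mi + 0.2 * xi + gen.gauss(0, 0.5) for xi, mi in zip(x, m)]

ab = indirect_effect(x, m, y)
lower, upper = bootstrap_indirect(x, m, y)
zero_in_ci = lower <= 0 <= upper
print(f"ab = {ab:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}], zero in CI: {zero_in_ci}")
```

Whole cases are resampled (the same index is used for X, M and Y), so the relationships between the variables within a person are preserved in every bootstrap sample.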

Significance

When zero is included in the confidence interval, you do not reject H0: there is no evidence of an indirect effect. When zero is outside the confidence interval, you reject H0: there is an indirect effect. In SPSS you look at BootLLCI and BootULCI, the lower and upper limits of the bootstrap confidence interval. To decide whether the mediation is complete or partial, you check whether there is a direct effect and whether it is significant.

Author: Britt van Dongen