Seminar 1: Bootstrapping (ARMS, Utrecht University)

You use bootstrapping to deal with troublesome situations, for example distributions that violate the model assumptions, which can cause:

  • Inaccurate standard error of parameter estimate
  • Inaccurate confidence intervals
  • Inaccurate p-values in H0 significance testing

One such assumption is that the residuals are normally distributed. There are five ways to address a violation of this assumption:

1. Ignore the problem and claim that ML (maximum likelihood) estimation is robust. This is defensible if the distribution is not extremely non-normal and N is large.

2. Use a normalizing transformation (e.g. a log transform)

3. Use a robust estimator (e.g. MLR: maximum likelihood with robust standard errors)

4. Bootstrapping

5. Bayesian estimation

What does bootstrapping do? It approximates the sampling distribution by re-sampling (with replacement) from the original sample, each resample of size n. This results in alternative SE estimates, 95% CIs, and p-values.

We do this because most of what we know about the 'true' probability distribution (the population) comes from the data. So we use the data as a proxy for the population. We draw multiple samples from this proxy, as if we were sampling from the population. Then we compute the statistic of interest on each resampled dataset and summarize the results.

The resample size always equals the original sample size. Otherwise the standard error would be inaccurate, since the SE also depends on the sample size.
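The recipe above can be sketched in a few lines of Python (stdlib only; the data values are made up for illustration):

```python
import random
import statistics

def bootstrap(data, stat, n_boot=2000, alpha=0.05, seed=1):
    """Approximate the sampling distribution of `stat` by drawing
    n_boot resamples with replacement, each of size n = len(data)."""
    random.seed(seed)
    n = len(data)
    boots = sorted(stat(random.choices(data, k=n)) for _ in range(n_boot))
    se = statistics.stdev(boots)                     # bootstrap SE
    lo = boots[int(n_boot * alpha / 2)]              # 2.5th percentile
    hi = boots[int(n_boot * (1 - alpha / 2)) - 1]    # 97.5th percentile
    return se, (lo, hi)

sample = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 7.0, 4.4, 5.8, 5.1]
se, ci = bootstrap(sample, statistics.mean)
```

The key line is `random.choices(data, k=n)`: sampling with replacement, always with `k` equal to the original sample size.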

The one assumption for bootstrapping is that your sample needs to be representative of the population. This is a visualization of taking bootstrap samples:

[Visualization of taking bootstrap samples omitted.]

The mean of the bootstrap distribution does not have to equal the mean of the sampling distribution (and this is not a problem). The spread/variance does match that of the sampling distribution, and if the sample is representative, so does the shape/skewness.
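A quick simulation (illustrative, stdlib only; the population is assumed standard normal here) shows this point: the bootstrap distribution is centered on the sample mean, not on the population mean, but its spread approximates the spread of the true sampling distribution:

```python
import random
import statistics

random.seed(3)
n = 50

# "True" sampling distribution: SD of the mean over many fresh samples
# drawn directly from the population (SE of the mean ~ 1/sqrt(50)).
means = [statistics.mean([random.gauss(0, 1) for _ in range(n)])
         for _ in range(2000)]
true_se = statistics.stdev(means)

# One observed sample, used as a proxy for the population.
sample = [random.gauss(0, 1) for _ in range(n)]

# Bootstrap distribution of the mean, resampling from that one sample.
boot_means = [statistics.mean(random.choices(sample, k=n))
              for _ in range(2000)]
boot_se = statistics.stdev(boot_means)
```

`boot_se` and `true_se` come out close to each other, while the center of `boot_means` tracks the sample mean rather than the population mean of 0.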

Does bootstrapping help in the following cases?

a. Regression, when the assumptions are met --> no additional value to do bootstrapping.

b. Regression, with heteroscedasticity --> yes, for a better/unbiased estimate of SE.

c. Regression, with very small sample size --> no, because a very small sample is often not representative of the population. If it is representative: yes.

d. Regression, with non linearity --> no, does not fix or even indicate this.

e. Regression, with multicollinearity --> no, does not fix or even indicate this.
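For case (b), a small sketch (toy data, stdlib only) of what "bootstrapping the SE" means in regression: resample the (x, y) cases with replacement and recompute the slope each time. The classical SE formula assumes constant error variance; the bootstrap SE does not rely on that assumption.

```python
import math
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

random.seed(4)
# Toy data with heteroscedastic errors: the error SD grows with x.
x = [i / 3 for i in range(1, 61)]
y = [2 * xi + random.gauss(0, 0.5 * xi) for xi in x]
n = len(x)
slope = ols_slope(x, y)

# Classical SE of the slope -- this formula assumes homoscedastic residuals.
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
resid = [yi - (my + slope * (xi - mx)) for xi, yi in zip(x, y)]
se_classical = math.sqrt(sum(r ** 2 for r in resid) / (n - 2) / sxx)

# Bootstrap SE: resample (x, y) cases with replacement, recompute the slope.
slopes = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
mean_s = sum(slopes) / len(slopes)
se_boot = math.sqrt(sum((s - mean_s) ** 2 for s in slopes) / (len(slopes) - 1))
```

With errors like these, `se_boot` typically differs from `se_classical`, which is exactly why the bootstrap helps under heteroscedasticity.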

 

Bootstrapping the indirect effect: in every bootstrap sample you estimate the indirect effect, which gives an SE and a CI. This goes as follows:

1. Repeatedly sample from dataset with replacement

2. Estimate the indirect effect ab in every resample: X --> M --> Y

3. Make a histogram of all values of ab

4. Look at the middle 95% --> 95% CI

5. Optional: determine the bias corrected and accelerated interval

Checking for significance: if 0 is included in the confidence interval, do not reject H0. If 0 is not included, reject H0 (there is an indirect effect).
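The five steps above can be sketched as follows (simulated toy data, stdlib only; the true values a = 0.6 and b = 0.5, so ab = 0.3, are assumptions chosen for illustration, not from the seminar):

```python
import random

def cov(u, v):
    """Population covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / n

def indirect(x, m, y):
    """Indirect effect ab: a from the regression M ~ X, and b as the
    coefficient of M in the two-predictor regression Y ~ X + M."""
    a = cov(x, m) / cov(x, x)
    b = (cov(m, y) * cov(x, x) - cov(x, y) * cov(x, m)) / (
        cov(m, m) * cov(x, x) - cov(x, m) ** 2)
    return a * b

# Simulated toy data (true a = 0.6, true b = 0.5, true ab = 0.3).
random.seed(2)
n = 60
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.6 * xi + random.gauss(0, 1) for xi in x]
y = [0.5 * mi + random.gauss(0, 1) for mi in m]

# Steps 1-3: resample with replacement and estimate ab in each resample.
ab_boot = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    ab_boot.append(indirect([x[i] for i in idx],
                            [m[i] for i in idx],
                            [y[i] for i in idx]))

# Step 4: the middle 95% of the sorted values is the percentile 95% CI.
ab_boot.sort()
ci = (ab_boot[50], ab_boot[1949])   # drop 2.5% (50 values) from each end

# Significance check: reject H0 of no indirect effect iff 0 is outside the CI.
significant = not (ci[0] <= 0 <= ci[1])
```

Step 5 (the bias-corrected and accelerated interval) is left out here; in practice you would let your software (e.g. PROCESS or lavaan) compute it.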

Question: is there mediation in this set-up?:

[Diagram of the study design omitted.]
Answer: no, because the post-test takes place before the follow-up test, so the time sequence does not match mediation!

To report bootstrapping results, report as usual, but mention that you bootstrapped and state the number of bootstrap samples.

Questions? Let me know in the contribution section!

Follow me for more summaries on statistics!

Follow the author: JuliaV