Workgroup notes with Experimental & Correlational Statistics at the Leiden University - 2018/2019

Week 1

 

Prep exercises

  1. Which combination of measurement levels is required for the use of the Pearson, Spearman, and point-biserial correlation respectively?
  2. Which formula is suitable for calculating the Pearson, Spearman, and point-biserial correlations?
  3. Which formula describes the relationship between rpb and t for independent samples?
  4. Which combination of measurement levels is required for the use of the phi coefficient?
  5. What is the specific formula for calculating the phi coefficient?
  6. Which formula describes the relationship between φ and χ²?
  7. What is the formula for testing the difference between two independent correlation coefficients?
  8. What is the rule of thumb for effect size r² and r?

 

Workgroup tips 1

Correlation is NOT causation. It is an association between variables.

Positive correlation = both variables increase together or decrease together

Negative correlation = one variable increases while the other decreases

 

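Pearson’s r: both variables are at interval level. Formula: r = Σ(Zx·Zy) / (n − 1)

Spearman’s rho: two ordinal variables (also used to limit the influence of outliers on Pearson’s r). Important: RANK THE SCORES FIRST, then take the z-scores of the ranks and apply the same formula. rs is Pearson’s r on the ranks.

Point-biserial: one dichotomous and one continuous variable. rpb is Pearson’s r with the dichotomous variable coded as two numbers (for instance 0 and 1).

Phi: two nominal variables that each have only two levels, i.e. both are dichotomous. φ is Pearson’s r on the two coded variables, and it is linked to chi-squared by φ = √(χ² / N).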

 

Dichotomous means that the value can only be one of two things, for instance yes/no, male/female or left/right. It is a nominal variable, but where an ordinary nominal variable can have answers such as red/blue/green/yellow, a dichotomous variable could in this case only be red/blue.

 

Basically, all of these correlations share the same basic formula, which is Σ(Zx·Zy) / (n − 1)
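To make the shared formula concrete, here is a minimal Python sketch (not part of the workgroup material). The data are made up, and the helper names z_scores, pearson_r, ranks and spearman_rho are only illustrative. It shows that Spearman’s rho is the same z-score formula applied to the ranks.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise values using the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    """r = sum(Zx * Zy) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def ranks(values):
    """Rank from 1 (smallest) upward; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_rho(x, y):
    """Rank both variables first, then apply the Pearson formula to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: hours studied and exam grade; the last student is an outlier on hours
hours = [1, 2, 3, 4, 5, 20]
grade = [4.0, 5.0, 5.5, 6.5, 7.0, 7.5]

r = pearson_r(hours, grade)
rho = spearman_rho(hours, grade)
print(round(r, 3), round(rho, 3))
```

Because the grades rise steadily with hours, the ranks agree perfectly and rho is 1, while the outlier pulls Pearson’s r well below 1.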

 

r is about the sample, ρ is about the population:

Parameter            Population   Sample
Mean                 µ            x̄
Probability          P            p
Standard deviation   σ            s
Correlation          ρ            r

 

To test whether the r you calculated in the sample differs significantly from zero in the population, use the formula

t = r·√(N − 2) / √(1 − r²), with df = N − 2
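A small worked example of this t-test in Python, as a sketch with made-up numbers (r = 0.5 in a sample of N = 27). The function name t_for_correlation is illustrative only.

```python
import math

def t_for_correlation(r, n):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2), tested with df = N - 2."""
    t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_value, n - 2

t, df = t_for_correlation(0.5, 27)
print(round(t, 3), df)  # prints: 2.887 25
```

The resulting t is then compared with the critical value from the t-table at df = 25.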

 

Correlation type   Type of variables         Correlation symbol   Formula
Pearson’s r        Interval/Interval         r                    Σ(Zx·Zy) / (n − 1)
Spearman’s rho     Ordinal/Ordinal           rs                   Σ(Zx·Zy) / (n − 1)
Point-biserial     Dichotomous/Interval      rpb                  Σ(Zx·Zy) / (n − 1)
Phi                Dichotomous/Dichotomous   φ                    Σ(Zx·Zy) / (n − 1)
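As an illustration of the last row, the sketch below (made-up 0/1 data, illustrative function names) computes phi with the same z-score formula and checks it against the standard relationship φ = √(χ² / N) for a 2x2 table without continuity correction.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """r = sum(Zx * Zy) / (n - 1), here applied to 0/1 coded variables."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (len(x) - 1)

def chi_square_2x2(x, y):
    """Pearson chi-square (no continuity correction) for two 0/1 variables."""
    n = len(x)
    chi2 = 0.0
    for i in (0, 1):
        for j in (0, 1):
            observed = sum(1 for a, b in zip(x, y) if a == i and b == j)
            expected = x.count(i) * y.count(j) / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Made-up data: 1 = yes, 0 = no
passed   = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
attended = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

phi = pearson_r(passed, attended)
chi2 = chi_square_2x2(passed, attended)
print(round(phi, 3), round(chi2, 3), round(math.sqrt(chi2 / len(passed)), 3))
```

Note that √(χ² / N) only gives the size of phi; the sign comes from how the two categories are coded.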

 

 

Week 2

 

  1. What is the purpose of regression analysis?
  2. Write down the simple linear regression equation.
  3. What is the difference between a predictor and a response variable?
  4. What is the meaning of the slope and how is it calculated?
  5. What is the meaning of the intercept and how is it calculated?
  6. What is the formula for testing the slope? (+df)
  7. What is the formula for calculating the standard error of the slope?
  8. In which ways can the variance accounted for be calculated in regression analysis?

 

 

Workgroup tips 2

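Difference between correlation and regression: correlation describes how strongly two variables are associated, whereas regression predicts the value of one variable from another. Neither proves causation by itself.

b0, also called the intercept, is the predicted mean of y when x is 0.

b1, also called the regression coefficient, is the slope: the change in predicted y when x increases by one unit.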

Simple regression has only one predictor, multiple regression has multiple predictors.

When the slope increases, the regression line gets steeper.

When the intercept increases, the entire line gets lifted a bit.

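Why do we test the regression coefficient? To find out whether the slope differs from 0 in the population, i.e. whether x really helps to predict y.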

If a question describes ‘weight’ as a function of ‘height’, then height is X (the predictor) and weight is Y (the response).

The difference between the observed and the predicted value (y − ŷ) is called the residual.
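The tips above can be tried out with a short Python sketch. The heights and weights are made up and the function name simple_regression is illustrative. The slope and intercept use the formulas b1 = r · (sy / sx) and b0 = ȳ − b1·x̄ from the Week 2 answers.

```python
from statistics import mean, stdev

def simple_regression(x, y):
    """Return (b0, b1) with b1 = r * (sy / sx) and b0 = mean(y) - b1 * mean(x)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    n = len(x)
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx
    b0 = my - b1 * mx
    return b0, b1

# Made-up data: weight (Y) as a function of height (X)
height = [160, 165, 170, 175, 180]
weight = [55, 60, 63, 70, 72]

b0, b1 = simple_regression(height, weight)
predicted = [b0 + b1 * h for h in height]
residuals = [w - p for w, p in zip(weight, predicted)]  # observed minus predicted
print(round(b0, 2), round(b1, 2), [round(e, 2) for e in residuals])
```

The residuals of a least-squares line always add up to zero, which is a handy check on hand calculations.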

 

Week 3

 

  1. Write down the multiple linear regression equation.
  2. In which way can the variance accounted for be calculated in multiple linear regression analysis?
  3. What is the meaning of the multiple correlation coefficient R?
  4. How are the SS, df, and MS of the model determined?
  5. How are the SS, df, and MS of the error determined?
  6. How are the SS, df, and MS of the total determined?
  7. What is the null hypothesis when testing the entire model?
  8. What is the formula for testing the entire model? (+df)
  9. What is the formula for testing individual regression coefficients? (+df)
  10. How can the unique contribution of a predictor be determined?

 

Workgroup tips 3

 

For the bigger picture, look at it like this example:

A teacher wants to know how the grades of his students are composed. He supposes that intelligence is a big part of a student’s grade and so calculates the correlation between the grade and the student’s intelligence (week 1, correlations). But, after calculating the correlation, he supposes that genetics and effort also have an impact on the grade. These parts (genes, intelligence and effort) are called predictors (week 2) and each influences the dependent variable, the grade. The teacher later supposes that while genetics and intelligence both have an influence on the grade, genetics also have an influence on intelligence (week 3). To separate out these overlapping influences, you calculate unique contributions.
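A rough Python sketch of the teacher example with two predictors. The data are made up and the function names are illustrative. It assumes the standard two-predictor formula for R² based on the three correlations, and takes the unique contribution of a predictor as R² of the full model minus r² of the other predictor alone (the squared part correlation).

```python
import math
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_two_predictors(y, x1, x2):
    """R^2 of y on x1 and x2, computed from the three pairwise correlations."""
    ry1, ry2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)

def unique_contribution(y, x, other):
    """Squared part correlation of x: R^2 (full model) minus r^2 of the other predictor."""
    return r_squared_two_predictors(y, x, other) - pearson_r(y, other) ** 2

# Made-up data for six students
intelligence = [100, 110, 120, 105, 130, 115]
effort = [3, 5, 4, 6, 7, 5]
grade = [5.5, 6.5, 7.0, 6.5, 8.5, 7.0]

r2_full = r_squared_two_predictors(grade, intelligence, effort)
unique_intelligence = unique_contribution(grade, intelligence, effort)
unique_effort = unique_contribution(grade, effort, intelligence)
print(round(r2_full, 3), round(unique_intelligence, 3), round(unique_effort, 3))
```

When the predictors overlap, the two unique contributions add up to less than R²; the remainder is variance they explain jointly.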

 

 

Week 4

Prep exercises

  1. What are the assumptions for regression analysis (+ explanation) and how can they be checked?
  2. What is multicollinearity?
  3. Which diagnostics are used to investigate the presence of outliers? (+ explanation)
  4. What is the formula for determining the confidence interval for the mean?
  5. What is the formula for determining the prediction interval for an individual observation?
  6. How are the standard errors for both intervals calculated?

 

 

Workgroup tips 4

When working in SPSS and asked to calculate any of the correlations from Week 1, simply go to Analyze -> Correlate -> Bivariate and calculate Pearson’s r for the variables. This works for all of them because they share the same formula (for Spearman’s rho, tick Spearman or rank the variables first).

When SPSS’ scatterplot shows a clearly curved pattern, Pearson’s r is not the right correlation to use, because the relationship is not linear. In that case, use a nonparametric correlation such as Spearman’s rho.

 

Coefficients (a)

Model            B       Std. Error   Beta    t         Sig.   95.0% CI for B (Lower, Upper)   Zero-order   Partial   Part
1  (Constant)    1.163   .007                 177.296   .000   1.150, 1.176
   lskin         -.063   .004         -.849   -15.228   .000   -.071, -.055                    -.849        -.849     -.849

a. Dependent Variable: den

In this table, B = 1.163 for (Constant) is b0 and B = -.063 for lskin is the slope b1. B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient.
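As a sketch of how to use the SPSS output above, the coefficients can be plugged into the regression equation. The values are copied from the table; the function name predict_den and the example value lskin = 2.0 are only illustrative.

```python
B0 = 1.163      # (Constant) row, column B: the intercept
B1 = -0.063     # lskin row, column B: the slope
SE_B1 = 0.004   # lskin row, column Std. Error

def predict_den(lskin):
    """Predicted den = b0 + b1 * lskin."""
    return B0 + B1 * lskin

# t for the slope is B divided by its standard error
t_approx = B1 / SE_B1

print(round(predict_den(2.0), 3))  # prints: 1.037
print(round(t_approx, 2))          # prints: -15.75
```

The hand-computed t differs slightly from the -15.228 in the table because SPSS works with unrounded values of B and Std. Error.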

 

When you want a regression line in a scatter plot, double-click the plot to open the Chart Editor, go to Elements and click ‘Fit Line at Total’. In the window that opens, select ‘Linear’ and choose whether you want confidence intervals for the mean or for individual observations.

 

Week 5

 

  1. Which type of variables (measurement level) are generally used in a regression analysis?
  2. In which way can categorical variables be added to a regression analysis?
  3.  What are dummy variables?
  4. What are the benefits of adding a categorical variable to a regression analysis?

 

Workgroup Tips 5

When you want to run a regression analysis in SPSS, go to Analyze -> Regression -> Linear. There, X is the independent variable and Y is the dependent variable.

The coefficient of a dummy variable in SPSS tells you the difference in intercepts (b0) between the two groups.

Something called an ‘interaction’ is always a multiplication of two variables.

When you want to centre a variable, you do the following steps:

  • Data, Aggregate.
  • Leave the Break Variable(s) window empty.
  • Specify your particular variable for the Summary of Variable(s) window.
  • Save the Aggregated variables to the active dataset (default setting).
  • Press OK
  • This results in a new variable that contains, for every case, the mean of the whole group (for example the mean height).
  • Finally compute the centred variable via Transform, Compute, C VARIABLE = VARIABLE - VARIABLE MEAN.

ŷ = b0 (intercept) + b1·x (slope) + b2·D (difference in intercept) + b3·x·D (difference in slope), where D is the dummy variable (0 or 1)
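The centring steps and the dummy/interaction model can also be sketched outside SPSS. In this Python example the heights and the coefficients b0 to b3 are made up purely for illustration, and the names centre and predict are hypothetical.

```python
from statistics import mean

def centre(values):
    """Centred variable = variable minus the mean of the whole group."""
    m = mean(values)
    return [v - m for v in values]

def predict(b0, b1, b2, b3, x, dummy):
    """y-hat = b0 + b1*x + b2*D + b3*x*D, with D coded 0 or 1."""
    return b0 + b1 * x + b2 * dummy + b3 * x * dummy

height = [160, 170, 180, 175, 165]
c_height = centre(height)
print(c_height)

# Hypothetical coefficients: the group with D = 1 has an intercept that is
# 5 higher (b2) and a slope that is 0.2 steeper (b3) than the group with D = 0.
group0 = predict(60, 0.8, 5, 0.2, x=10, dummy=0)
group1 = predict(60, 0.8, 5, 0.2, x=10, dummy=1)
print(group0, group1)
```

After centring, x = 0 means an average height, so b0 and b2 describe the groups at the mean of x instead of at a height of zero.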

 

Week 6

  1. What is the purpose of analysis of variance (ANOVA)?
  2. Which null hypothesis is being tested in ANOVA?
  3. Write down the oneway-ANOVA model and indicate what each symbol represents.
  4. What is the effect parameter α and how is it estimated?
  5. What are the assumptions for ANOVA (+ explanation) and how can they be checked?
  6. How are the SS, df, and MS of the model determined?
  7. How are the SS, df, and MS of the error determined?
  8. How are the SS, df, and MS of the total determined?
  9. What is the formula for determining the variance accounted for in the sample?
  10. What is the formula for determining the variance accounted for in the population?

 

Workgroup tips 6

ANOVA is basically a t-test on steroids: you use it to compare the means of more than two groups in one test, so that the type 1 error does not pile up as it would with many separate t-tests. Whether you can claim causation rather than correlation depends on the design (an experiment with random assignment), not on ANOVA itself.

The effect parameter tells you how far a group mean is removed from the grand mean. You get the grand mean by averaging all observations; with equal group sizes this is the same as taking the average of all group means.

The closer the points in a scatterplot are to their group mean, the less error you have. Error = Xij – XmeanJ (the observation minus the mean of its group j)

A higher ANOVA result, i.e. a higher F, means a lower p and thus stronger evidence against H0. A significant F means that there is a difference between at least two means.
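A minimal Python sketch of a one-way ANOVA by hand, using three made-up groups of three scores. The function name one_way_anova is illustrative. It computes the effect parameters, the sums of squares and F.

```python
from statistics import mean

def one_way_anova(groups):
    """Return effect parameters, SS for groups and error, their df, and F."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    effects = [mean(g) - grand_mean for g in groups]            # alpha-hat per group
    ss_groups = sum(len(g) * a ** 2 for g, a in zip(groups, effects))
    ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_groups = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_value = (ss_groups / df_groups) / (ss_error / df_error)   # MSG / MSE
    return effects, ss_groups, ss_error, df_groups, df_error, f_value

# Made-up scores for three conditions
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
effects, ss_groups, ss_error, df_groups, df_error, f_value = one_way_anova(groups)
print(effects, ss_groups, ss_error, df_groups, df_error, f_value)
```

Here the group means (5, 7, 9) lie far apart compared with the spread inside each group, which is why F comes out large.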

 

Week 7

  1. How can the familywise error rate αfam be determined?
  2. What are contrasts?
  3. What is the formula for testing contrasts? (+df)
  4. What are post-hoc tests?
  5. Why is the Bonferroni Correction sometimes used when performing contrasts or post hoc tests?
  6. How can the Bonferroni corrected error rate be determined?

 

Workgroup tips 7

Overview of this workgroup: how to test which means are responsible for the difference.

An a priori test (contrast) is planned beforehand -> it is clear what you are testing, BUT you have to use a Bonferroni correction to control your Type 1 error (the chance that you reject H0 even though it is correct)

A post hoc test is performed afterwards, and SPSS makes the correction for Type 1 error for you.

Setting your contrast positive means that you give the positive weights to the side for which you expect the highest mean.

All the weights of a contrast need to add up to 0. This means that you divide the weights so that they add up to +1 on one side and -1 on the other.

The Independence of Contrasts table is there to see if the contrasts are actually independent, so that each contrast tests a separate part of the differences.

When comparing two contrasts, multiply their weights per condition and add up the products. This sum also has to equal 0 for the contrasts to be independent.
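The two rules for contrast weights are easy to check with a few lines of Python. This is a sketch with made-up weights for three conditions; the function names are illustrative, and the simple independence check assumes equal group sizes.

```python
def is_valid_contrast(weights, tol=1e-9):
    """The weights of a contrast must add up to 0."""
    return abs(sum(weights)) < tol

def are_independent(w1, w2, tol=1e-9):
    """Multiply the weights per condition and add them up; the sum must be 0.
    Assumes equal group sizes."""
    return abs(sum(a * b for a, b in zip(w1, w2))) < tol

c1 = [1, -0.5, -0.5]  # condition 1 versus the average of conditions 2 and 3
c2 = [0, 1, -1]       # condition 2 versus condition 3
c3 = [1, -1, 0]       # condition 1 versus condition 2

print(is_valid_contrast(c1), is_valid_contrast(c2), is_valid_contrast(c3))
print(are_independent(c1, c2))  # 1*0 + (-0.5)*1 + (-0.5)*(-1) = 0
print(are_independent(c1, c3))  # 1*1 + (-0.5)*(-1) + (-0.5)*0 = 1.5
```

In c1 the positive side adds up to +1 and the negative side to -1, as described above.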

 

Week 8

  1. What are factors and what is a factorial design?
  2. What are the definitions of main effect and interaction effect?
  3. Write down the factorial ANOVA model and indicate what each symbol represents.
  4. How are the effect parameters α, β, and αβ estimated?
  5. What are the formulas for the proportion variance accounted for of an effect in the sample?
  6. What is the formula for the proportion variance accounted for in the population?

 

Workgroup tips 8

When working with the effect parameters α, β, and αβ, first make a table of cell means that looks like this (in case of a 2x3 factorial design):

               Condition A1       Condition A2       Condition A3
Condition B1   Mean Score B1xA1   Mean Score B1xA2   Mean Score B1xA3   Mean of B1
Condition B2   Mean Score B2xA1   Mean Score B2xA2   Mean Score B2xA3   Mean of B2
               Mean of A1         Mean of A2         Mean of A3         Mean of all means

 

Hypothesis of the table would be

H0 : (αβ)ij = 0 for all i and j (there is no interaction)
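A sketch in Python of how the effect parameters follow from such a table of cell means. The cell means are made up, the function name effect_parameters is illustrative, and equal cell sizes are assumed so that marginal means are plain averages of the cell means.

```python
from statistics import mean

def effect_parameters(cell_means):
    """Rows are the levels of B, columns the levels of A (equal cell sizes assumed)."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = mean([m for row in cell_means for m in row])
    row_means = [mean(row) for row in cell_means]
    col_means = [mean([row[j] for row in cell_means]) for j in range(n_cols)]
    alpha = [c - grand for c in col_means]   # main effect of A
    beta = [r - grand for r in row_means]    # main effect of B
    interaction = [[cell_means[i][j] - (grand + alpha[j] + beta[i])
                    for j in range(n_cols)] for i in range(n_rows)]
    return grand, alpha, beta, interaction

# Made-up cell means for a 2x3 design
cell_means = [[4.0, 6.0, 8.0],    # B1 at A1, A2, A3
              [6.0, 6.0, 12.0]]   # B2 at A1, A2, A3

grand, alpha, beta, interaction = effect_parameters(cell_means)
print(grand, alpha, beta, interaction)
```

If H0 for the interaction were exactly true in the sample, every entry of the interaction table would be 0; here some entries are +1 or -1.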

Keep in mind that with contrasts in SPSS, you should not include the missing variable. When asked for the unique variance of a predictor (the variance it explains alone), take the part correlation and square it.

 

 

Answers

 

Week 1

  1. Pearson’s r: interval/interval. Spearman: Ordinal/ordinal. Point Biserial: Dichotomous/interval
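  2. r = Σ(Zx·Zy) / (n − 1) for all three (for Spearman on the ranks)
  3. rpb = √(t² / (t² + df))
  4. Dichotomous/Dichotomous
  5. φ = Σ(Zx·Zy) / (n − 1)
  6. φ = √(χ² / N), so χ² = N·φ²: the bigger phi is, the bigger chi-squared is
  7. z = (r’1 – r’2) / √(1/(N1 − 3) + 1/(N2 − 3)), where r’ is the Fisher-transformed r
  8. Small – medium – large: r = 0.1 – 0.3 – 0.5             r² = 0.01 – 0.09 – 0.25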
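Answers 3 and 7 can be tried out with the following Python sketch. The numbers are made up and the function names are illustrative. It assumes that r’ stands for Fisher’s transformation r’ = 0.5 · ln((1 + r) / (1 − r)), which is the usual meaning of that symbol in this test.

```python
import math

def fisher_r_prime(r):
    """Fisher's transformation of r (assumed meaning of r' in answer 7)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_two_correlations(r1, n1, r2, n2):
    """z = (r'1 - r'2) / sqrt(1/(N1 - 3) + 1/(N2 - 3)) for independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_r_prime(r1) - fisher_r_prime(r2)) / se

def r_pb_from_t(t, df):
    """rpb = sqrt(t^2 / (t^2 + df)), from an independent-samples t."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

z = z_two_correlations(0.5, 53, 0.3, 53)
r_pb = r_pb_from_t(2.0, 12)
print(round(z, 3), r_pb)  # prints: 1.199 0.5
```

With z = 1.199 the two correlations do not differ significantly at α = .05 (two-sided critical value 1.96).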

 

Week 2

  1. Regression enables you to estimate one interval variable from one or more others.
  2. ŷ = b0 + b1x
  3. The predictor variable (X) and the response/criterion variable (Y). X predicts Y
  4. b1 = slope: size of the difference in ŷ if x increases by 1 unit
  5. b0 = intercept/constant: predicted value of y when x = 0
  6. b0 = ȳ − b1x̄
     b1 = sxy / sx² = r · (sy / sx)
  7. Total variance: sy² = SStotal / dftotal = Σ(y − ȳ)² / (N − 1)
  8. SSy = sy² · dftotal

 

Week 3

  1. ŷ = b0 + b1x1 + b2x2 + … + bpxp
  2. R² = 1 – (SSe / SSy)
  3. Correlation between y and ŷ
  4. MSŷ = SSmodel / dfmodel = SSŷ / dfŷ = Σ(ŷ − ȳ)² / p
  5. MSe = SSerror / dferror = SSe / dfe = Σ(y − ŷ)² / (N − p − 1)
  6. MSy = SStotal / dftotal = SSy / dfy = Σ(y − ȳ)² / (N − 1)
  7. H0: R² = 0 in the population
  8. F(dfŷ, dfe) = MSŷ / MSe, with dfŷ = p and dfe = N − p − 1
  9. t = b / SEb, with df = N − p − 1
  10. By squaring the part correlation of that predictor, which removes the overlap with the other predictors

 

Week 4

  1. Linearity: there is a linear relationship. Homoscedasticity: the variance of the residuals is equal for each predicted value. Normality: the residuals are normally distributed
  2. Two or more predictors are strongly correlated
  3. Distance: are they much higher or lower than the expected Y? Leverage: Are there outliers on the predictors? Influence: how much does the observation influence the results?
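  4. µ̂y ± t* × SEµ̂
  5. ŷ ± t* × SEŷ
  6. SEµ̂ = se · √( 1/N + (x* − x̄)² / ((N − 1)·sx²) )
     SEŷ = se · √( 1 + 1/N + (x* − x̄)² / ((N − 1)·sx²) )
     Here x* is the value of x for which you predict and se is the standard deviation of the residuals.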
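To see the difference between the two standard errors, here is a short Python sketch with made-up input values (se = 2, N = 25, x̄ = 10, sx = 3). The function names are illustrative, and the formulas are the standard ones for simple regression with one predictor.

```python
import math

def se_mean_response(s_e, n, x_star, x_bar, s_x):
    """SE for the confidence interval of the mean of y at x = x_star."""
    return s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / ((n - 1) * s_x ** 2))

def se_individual(s_e, n, x_star, x_bar, s_x):
    """SE for the prediction interval of one new observation at x = x_star."""
    return s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / ((n - 1) * s_x ** 2))

# Made-up values; predicting exactly at the mean of x
se_mu = se_mean_response(2.0, 25, x_star=10.0, x_bar=10.0, s_x=3.0)
se_y = se_individual(2.0, 25, x_star=10.0, x_bar=10.0, s_x=3.0)
print(round(se_mu, 3), round(se_y, 3))  # prints: 0.4 2.04

# Further away from the mean of x both standard errors grow
print(round(se_mean_response(2.0, 25, 16.0, 10.0, 3.0), 3))
```

The extra 1 under the square root is why a prediction interval for an individual is always wider than the confidence interval for the mean.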

 

Week 5

  1. Interval
  2. Categorical variables can be included using dummy variables.
  3. Dichotomous variables with only 0 and 1.
  4. Look at differences between groups (means) / Look for possible interaction

 

Week 6

  1. Determine if all population means are equal or if there is a difference between at least two of them.
  2. H0 : µ1 = µ2 = ….
  3. DATA (total) = FIT (between) + RESIDUAL (within) / yij = µ + αi  + ϵij
  4. α = effect parameter (µi − µ) (Check: ∑ ni αi = 0)
  5. Homogeneity of variances: the variance of the residuals is the same in all populations. Check: use the rule of thumb largest sd / smallest sd < 2, or check that Levene’s test is not significant.
     Normality of the residuals: the residuals are normally distributed within each group (with mean 0 and standard deviation σ). Check: look at a histogram or P-P / Q-Q plot of the residuals.
     Independence: the residuals are independent. No check → should be guaranteed by the design.
  6–8. ANOVA table (6 = Groups row, 7 = Error row, 8 = Total row)

       Source   Sum of Squares                     DF      Mean Square   F
       Groups   Σ ni(ȳi − ȳ)² = Σ ni·α̂i²         I − 1   SSG/DFG       MSG/MSE
       Error    Σ (yij − ȳi)² = Σ (ni − 1)·si²    N − I   SSE/DFE
       Total    Σ (yij − ȳ)² = (N − 1)·sy²        N − 1   SST/DFT

  9. η² = SSG / SST
  10. ω̂² = (SSG − (DFG × MSE)) / (SST + MSE)
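A quick numerical check of answers 9 and 10 in Python. This is a sketch with made-up sums of squares (SSG = 24, SSE = 6, three groups, nine observations); the function name effect_sizes is illustrative.

```python
def effect_sizes(ss_groups, ss_error, n_groups, n_total):
    """Return (eta squared, omega squared) for a one-way ANOVA."""
    ss_total = ss_groups + ss_error
    df_groups = n_groups - 1
    ms_error = ss_error / (n_total - n_groups)
    eta_sq = ss_groups / ss_total
    omega_sq = (ss_groups - df_groups * ms_error) / (ss_total + ms_error)
    return eta_sq, omega_sq

eta_sq, omega_sq = effect_sizes(ss_groups=24, ss_error=6, n_groups=3, n_total=9)
print(round(eta_sq, 3), round(omega_sq, 3))  # prints: 0.8 0.71
```

ω̂² is smaller than η² because it corrects for the fact that the sample overestimates the variance accounted for in the population.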

 

Week 7

  1. αfam = 1 − (1 − α)^c for c independent tests; the overall ANOVA F-test keeps it at α
  2. Contrasts are planned comparisons, specified before the data are collected
  3. t = c / SEc = Σ ai·ȳi / (sp · √(Σ ai² / ni))                  df = DFE = N – I
  4. Additional tests done after a significant F value to determine which means are significantly different
  5. Because doing several tests inflates the familywise type 1 error, and the correction keeps it at or below α
  6. α′ = α / c, where c is the number of comparisons
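A short Python sketch of why the Bonferroni correction is needed, using made-up values (α = .05 and c = 3 comparisons). It assumes the usual formula αfam = 1 − (1 − α)^c for c independent tests; the function names are illustrative.

```python
def familywise_error(alpha, c):
    """Chance of at least one type 1 error in c independent tests."""
    return 1 - (1 - alpha) ** c

def bonferroni_alpha(alpha, c):
    """Bonferroni-corrected alpha per comparison: alpha / c."""
    return alpha / c

uncorrected = familywise_error(0.05, 3)
per_test = bonferroni_alpha(0.05, 3)
corrected = familywise_error(per_test, 3)
print(round(uncorrected, 4), round(per_test, 4), round(corrected, 4))
# prints: 0.1426 0.0167 0.0492
```

Without correction, three tests at .05 give about a 14% chance of at least one false rejection; with the corrected per-test level the familywise rate stays just below .05.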

 

Week 8

  1. Factors are independent variables that consist of two or more levels (categories). A factorial design is a research design with two or more factors.
  2. A main effect is the overall effect of each factor separately. An interaction effect means that the effect of one factor depends on the level of the other factor(s).
  3. DATA = FIT + RESIDUAL -> yijk = µij + ϵijk, with µij = µ + αi + βj + (αβ)ij
  4. αi = ȳi – ȳ                       βj = ȳj – ȳ                 (αβ)ij = ȳij – (ȳ + αi + βj)
  5. η² = SSeffect / SST
  6. ω² = (SSeffect – (DFeffect × MSE)) / (SST + MSE)

 

GOOD LUCK ON THE EXAM!!

Xx Emy
