Workgroup notes for Experimental & Correlational Statistics at Leiden University - 2018/2019

Week 1

 

Prep exercises

  1. Which combination of measurement levels is required for the use of the Pearson, Spearman, and point-biserial correlation respectively?
  2. Which formula is suitable for calculating the Pearson, Spearman, and point-biserial correlations?
  3. Which formula describes the relationship between rpb and tindep?
  4. Which combination of measurement levels is required for the use of the phi coefficient?
  5. What is the specific formula for calculating the phi coefficient?
  6. Which formula describes the relationship between φ and χ2?
  7. What is the formula for testing the difference between two independent correlation coefficients?
  8. What is the rule of thumb for effect size r2 and r?

 

Workgroup tips 1

Correlation is NOT causation. It is an association between variables.

Positive correlation = both increase or decrease

Negative = One increases, the other decreases

 

Pearson’s r: both variables are at an interval level. Formula: ∑ZxZy/(n-1)

Spearman’s rho: two ordinal variables (used to avoid outlier influence in Pearson’s r). rs = r computed on the ranked data. Important: RANK IT FIRST, then take the z scores.

Point biserial: one dichotomous and one continuous variable. rpb = r with the dichotomous variable coded 0/1.

Phi: both variables are nominal with only two levels each, i.e. dichotomous. φ = r, and it is related to chi-squared by φ = √(χ2/N).

 

Dichotomous means that the value can only be one of two things, for instance yes/no, male/female, left/right. It is a nominal variable, but where a plain nominal variable can have answers like red/blue/green/yellow, a dichotomous variable could in this case only be red/blue, for instance.

 

Basically, all of these correlations share the same basic formula: ∑ZxZy/(n-1)

 

r refers to the sample, ρ (rho) to the population.

Parameter            Population   Sample
Mean                 µ            x̄
Probability          P            p
Standard deviation   σ            s
Correlation          ρ            r

 

In order to see whether the r you have calculated holds for the population, use the formula

t = (r √(N-2)) / √(1 - r2), with df = N - 2
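As a plain-Python sketch of this test statistic (the r and N values below are made up for illustration):

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0; degrees of freedom = n - 2."""
    return (r * math.sqrt(n - 2)) / math.sqrt(1 - r ** 2)

# Hypothetical example: r = .50 in a sample of N = 27 (so df = 25)
print(round(t_from_r(0.5, 27), 3))  # 2.887
```

Compare the result with the critical t value at df = N - 2 to decide significance.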

 

Correlation type   Type of variables         Symbol   Formula
Pearson’s r        Interval/Interval         r        ∑ZxZy/(n-1)
Spearman’s rho     Ordinal/Ordinal           rs       ∑ZxZy/(n-1) (on the ranks)
Point biserial     Dichotomous/Interval      rpb      ∑ZxZy/(n-1)
Phi                Dichotomous/Dichotomous   φ        ∑ZxZy/(n-1)
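The correlations above all share the basic formula ∑ZxZy/(n-1), applied to suitably transformed data. A minimal sketch in plain Python (the data are made up; the ranking helper has no tie handling):

```python
import math

def zscores(xs):
    """z scores using the sample standard deviation (n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / sd for x in xs]

def correlation(x, y):
    """The shared formula: sum(Zx * Zy) / (n - 1)."""
    return sum(a * b for a, b in zip(zscores(x), zscores(y))) / (len(x) - 1)

def ranks(xs):
    """1-based ranks (no tie handling in this sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = float(rank)
    return out

# Made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
group = [0.0, 0.0, 0.0, 1.0, 1.0]   # dichotomous variable coded 0/1

print(correlation(x, y))                # Pearson's r
print(correlation(ranks(x), ranks(y)))  # Spearman's rho: rank first, then z scores
print(correlation(group, y))            # point-biserial: r with 0/1 coding
```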

 

 

Week 2

 

  1. What is the purpose of regression analysis?
  2. Write down the simple linear regression equation.
  3. What is the difference between a predictor and a response variable?
  4. What is the meaning of the slope and how is it calculated?
  5. What is the meaning of the intercept and how is it calculated?
  6. What is the formula for testing the slope? (+df)
  7. What is the formula for calculating the standard error of the slope?
  8. In which ways can the variance accounted for be calculated in regression analysis?

 

 

Workgroup tips 2

Difference between correlation and regression: regression predicts the value of one variable from another, while correlation only describes the strength and direction of the association.

b0, also called the intercept, is the predicted value of y when x is 0.

b1, also called the regression coefficient, is the slope.

Simple regression has only one predictor, multiple regression has multiple predictors.

When the slope increases, the regression line gets steeper.

When the intercept increases, the entire line gets lifted a bit.

Why do we test the regression coefficient? To see whether the slope differs significantly from 0, i.e. whether x really predicts y in the population.

If a question says ‘weight’ as a function of ‘height’, then height is X and weight is Y.

The error of observed – predicted is called residual.
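The Week 2 formulas can be sketched in plain Python; the height/weight numbers below are made up for illustration:

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

def simple_regression(x, y):
    """b1 = r * (sy/sx), b0 = ybar - b1 * xbar."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    r = sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (len(x) - 1)
    b1 = r * sy / sx       # slope
    b0 = my - b1 * mx      # intercept
    return b0, b1

# Hypothetical data: weight (Y) as a function of height (X)
height = [160.0, 170.0, 180.0, 190.0]
weight = [55.0, 65.0, 70.0, 80.0]
b0, b1 = simple_regression(height, weight)
residuals = [obs - (b0 + b1 * h) for obs, h in zip(weight, height)]  # observed - predicted
print(b0, b1)  # roughly -72.5 and 0.8
```

The residuals always sum to (approximately) zero, which is a quick sanity check on the fit.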

 

Week 3

 

  1. Write down the multiple linear regression equation.
  2. In which way can the variance accounted for be calculated in multiple linear regression analysis?
  3. What is the meaning of the multiple correlation coefficient R?
  4. How are the SS, df, and MS of the model determined?
  5. How are the SS, df, and MS of the error determined?
  6. How are the SS, df, and MS of the total determined?
  7. What is the null hypothesis when testing the entire model?
  8. What is the formula for testing the entire model? (+df)
  9. what is the formula for testing individual regression coefficients? (+df)
  10. How can the unique contribution of a predictor be determined?

 

Workgroup tips 3

 

For the bigger picture, look at it like this example:

A teacher wants to know how the grades of his students are composed. He supposes that intelligence makes up a big part of a student’s grade, and so calculates the correlation between grade and intelligence (week 1, correlations). After calculating the correlation, he supposes that genetics and effort also have an impact on the grade. These parts (genetics, intelligence, and effort) are called predictors (week 2), and each influences the dependent variable, the grade. Later, the teacher supposes that while genetics and intelligence both influence the grade, genetics also influences intelligence (week 3). To take out these overlapping influences, you calculate unique contributions.
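The “unique contribution” idea can be sketched with the standard two-predictor formulas for standardized regression weights. All three correlations below are made-up numbers for the teacher example:

```python
# Two-predictor regression from correlations (all three r values are made up):
# y = grade, x1 = intelligence, x2 = effort.
r_y1 = 0.60   # grade-intelligence
r_y2 = 0.40   # grade-effort
r_12 = 0.30   # intelligence-effort (the overlap between predictors)

# Standardized regression weights for two predictors
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Variance accounted for by the full model
r2_full = beta1 * r_y1 + beta2 * r_y2

# Unique contribution of intelligence: how much R^2 drops when it is removed,
# i.e. full-model R^2 minus the R^2 of the effort-only model (r_y2 squared).
unique_x1 = r2_full - r_y2 ** 2
print(round(r2_full, 4), round(unique_x1, 4))  # 0.4132 0.2532
```

Note that the unique contribution (0.2532) is smaller than r_y1 squared (0.36): the overlap with the other predictor has been taken out.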

 

 

Week 4

Prep exercises

  1. What are the assumptions for regression analysis (+ explanation) and how can they be checked?
  2. What is multicollinearity?
  3. Which diagnostics are used to investigate the presence of outliers? (+ explanation)
  4. What is the formula for determining the confidence interval for the mean?
  5. What is the formula for determining the prediction interval for an individual observation?
  6. How are the standard errors for both intervals calculated?

 

 

Workgroup tips 4

When working in SPSS and asked to calculate any of the correlations from Week 1, simply go to Analyse -> Correlate -> Bivariate and calculate all variables with Pearson’s r.

When SPSS’ scatterplot shows a clearly curved line, you can assume Pearson’s r is not the correct correlation to use, because the relationship is not linear. This means that you have to use a nonparametric correlation such as Spearman’s rho.

 

Coefficients (a. Dependent Variable: den)

Model        B            Std. Error   Beta    t         Sig.   95% CI for B     Zero-order   Partial   Part
(Constant)   1.163 = b0   .007                 177.296   .000   [1.150, 1.176]
lskin        -.063 = b1   .004         -.849   -15.228   .000   [-.071, -.055]   -.849        -.849     -.849

 

When you want a regression line in a scatter plot, open the chart builder, go to elements and click ‘Fit line at total’. In the window that opens, select ‘Linear’ and choose whether you want a regression line for the mean or for individual cases.

 

Week 5

 

  1. Which type of variables (measurement level) are generally used in a regression analysis?
  2. In which way can categorical variables be added to a regression analysis?
  3.  What are dummy variables?
  4. What are benefits of using a categorical variable to a regression-analysis?

 

Workgroup Tips 5

When you want to run a regression analysis in SPSS, go to Analyse -> Regression -> Linear. There, your x (predictor) goes into Independent(s) and your y into Dependent.

A dummy variable in SPSS tells you the difference in intercepts (b0).

An ‘interaction’ is always a multiplication: the product of two predictors.

When you want to centre a variable, you do the following steps:

  • Data, Aggregate.
  • Leave the Break Variable(s) window empty.
  • Specify your particular variable for the Summary of Variable(s) window.
  • Save the Aggregated variables to the active dataset (default setting).
  • Press OK
  • This results in a new variable containing, for each case, the mean of that variable for the whole group (e.g. the mean height).
  • Finally compute the centred variable via Transform, Compute, C VARIABLE = VARIABLE - VARIABLE MEAN.

Y^ = b0 (intercept) + b1x (slope) + b2D (difference in intercept) + b3xD (difference in slope), where D is the dummy variable.
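A small sketch of how the dummy model Y^ = b0 + b1x + b2D + b3xD works. The coefficients below are made up; D is a hypothetical 0/1 dummy variable:

```python
# Dummy-variable regression sketch: yhat = b0 + b1*x + b2*D + b3*(x*D).
# The coefficients below are made up; D is a 0/1 dummy variable.
b0, b1 = 10.0, 2.0   # intercept and slope for the reference group (D = 0)
b2, b3 = 5.0, -0.5   # difference in intercept and difference in slope (D = 1)

def predict(x, d):
    # The interaction term is literally a multiplication: x * d
    return b0 + b1 * x + b2 * d + b3 * (x * d)

# Reference group: intercept 10, slope 2.
# Dummy group:     intercept 10 + 5 = 15, slope 2 - 0.5 = 1.5.
print(predict(4.0, 0))  # 18.0
print(predict(4.0, 1))  # 21.0
```

Setting D = 1 shifts the intercept by b2 and the slope by b3, which is exactly what “difference in intercept” and “difference in slope” mean.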

 

Week 6

  1. What is the purpose of analysis of variance (ANOVA)?
  2. Which null hypothesis is being tested in ANOVA?
  3. Write down the oneway-ANOVA model and indicate what each symbol represents.
  4. What is the effect parameter α and how is it estimated?
  5. What are the assumptions for ANOVA (+ explanation) and how can they be checked?
  6. How are the SS, df, and MS of the model determined?
  7. How are the SS, df, and MS of the error determined?
  8. How are the SS, df, and MS of the total determined?
  9. What is the formula for determining the variance accounted for in the sample?
  10. What is the formula for determining the variance accounted for in the population?

 

Workgroup tips 6

ANOVA is essentially a t-test extended to more than two groups: you use it to test for differences between group means while keeping the Type 1 error rate under control (running many separate t-tests would inflate it). Note that causal claims come from an experimental design with random assignment, not from the ANOVA itself.

The effect parameter tells you how far a group mean is removed from the grand mean. You get the grand mean by taking the average of all group means (when group sizes are equal) or of all observations.

The closer points in a scatterplot are to their group mean, the less error you have. Error (residual) = Xij – X̄j

A higher ANOVA result, i.e. a higher F, means a lower p and thus a more significant result. A significant result means that there is a difference between at least two means.
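A minimal worked example of the F computation, with made-up scores for three groups of three:

```python
# One-way ANOVA by hand (made-up scores, three groups)
groups = [
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
    [7.0, 8.0, 9.0],
]

scores = [y for g in groups for y in g]
n_total = len(scores)
grand_mean = sum(scores) / n_total

# Between-groups (model) and within-groups (error) sums of squares
ss_groups = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

df_groups = len(groups) - 1        # I - 1
df_error = n_total - len(groups)   # N - I
f = (ss_groups / df_groups) / (ss_error / df_error)
print(ss_groups, ss_error, f)  # 24.0 6.0 12.0
```

A large F here reflects group means (4, 6, 8) that are far apart relative to the spread within each group.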

 

Week 7

  1. How can the familywise error rate αfam be determined?
  2. What are contrasts?
  3. What is the formula for testing contrasts? (+df)
  4. What are post-hoc tests?
  5. Why is the Bonferroni Correction sometimes used when performing contrasts or post hoc tests?
  6. How can the Bonferroni corrected error rate be determined?

 

Workgroup tips 7

Overview of this workgroup: how to test which means are influencing the difference.

An a priori test is specified beforehand -> it is clear what you are testing, BUT you have to use a Bonferroni correction to control your Type 1 error (the chance that you reject H0 even though it is correct).

A post hoc test you perform afterwards, and SPSS makes the correction for Type 1 error for you.

Setting a contrast positive means giving the positive weights to the side you expect to have the highest mean.

All the weights of a contrast need to add up to 0. This means that you divide your weights so that they add up to +1 on the one side and -1 on the other.

Independence of contrasts: check whether the contrasts are actually independent (orthogonal), so that each comparison tests something new.

When comparing two contrasts, multiply their weights per condition and add the products together; this sum also has to equal 0 for the contrasts to be independent.
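A quick sketch of both checks, with made-up contrast weights (the orthogonality check assumes equal group sizes):

```python
# Contrast weight checks (made-up weights over three conditions; the
# orthogonality check assumes equal group sizes)
c1 = [1.0, -0.5, -0.5]   # condition 1 vs the average of conditions 2 and 3
c2 = [0.0, 1.0, -1.0]    # condition 2 vs condition 3

def sums_to_zero(weights):
    """Every contrast's weights must add up to 0."""
    return abs(sum(weights)) < 1e-12

def orthogonal(a, b):
    """Two contrasts are independent if their weight products sum to 0."""
    return abs(sum(x * y for x, y in zip(a, b))) < 1e-12

print(sums_to_zero(c1), sums_to_zero(c2), orthogonal(c1, c2))  # True True True
```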

 

Week 8

  1. What are factors and what is a factorial design?
  2. What are the definitions of main effect and interaction effect?
  3. Write down the factorial ANOVA model and indicate what each symbol represents.
  4. How are the effect parameters α, β, and αβ estimated?
  5. What are the formulas for the proportion variance accounted for of an effect in the sample?
  6. What is the formula for the proportion variance accounted for in the population?

 

Workgroup tips 8

When working with the effect parameters α, β, and αβ, first make a table of cell means that looks like this (in the case of a 2x3 factorial design):

              Condition A1    Condition A2    Condition A3
Condition B1  Mean B1×A1      Mean B1×A2      Mean B1×A3      Mean of B1
Condition B2  Mean B2×A1      Mean B2×A2      Mean B2×A3      Mean of B2
              Mean of A1      Mean of A2      Mean of A3      Mean of all means

 

The hypothesis for the interaction in this table would be

H0: (αβ)ij = 0 for all i and j

Keep in mind that with contrasts in SPSS, you should not include the missing (reference) category. When asked for the unique or ‘alone’ variance of a predictor, use the part (semipartial) correlation and square it.
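The effect-parameter estimates from such a table of cell means can be sketched like this (all cell means are made up; equal cell sizes are assumed so the grand mean is the mean of all means):

```python
# Effect parameters from a 2x3 table of cell means (numbers are made up;
# equal cell sizes are assumed so the grand mean is the mean of all means)
cell_means = [
    [4.0, 6.0, 8.0],    # condition B1 with A1, A2, A3
    [6.0, 8.0, 13.0],   # condition B2 with A1, A2, A3
]
n_b, n_a = len(cell_means), len(cell_means[0])

mean_b = [sum(row) / n_a for row in cell_means]                      # mean of B1, B2
mean_a = [sum(cell_means[i][j] for i in range(n_b)) / n_b
          for j in range(n_a)]                                       # mean of A1..A3
grand = sum(mean_b) / n_b                                            # mean of all means

alpha = [m - grand for m in mean_a]   # effect of factor A
beta = [m - grand for m in mean_b]    # effect of factor B
interaction = [[cell_means[i][j] - (grand + alpha[j] + beta[i])
                for j in range(n_a)] for i in range(n_b)]            # (alpha-beta)ij

print(alpha)        # [-2.5, -0.5, 3.0]
print(beta)         # [-1.5, 1.5]
print(interaction)  # [[0.5, 0.5, -1.0], [-0.5, -0.5, 1.0]]
```

A useful check: the α’s, the β’s, and every row and column of the interaction parameters each sum to 0.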

 

 

Answers

 

Week 1

  1. Pearson’s r: interval/interval. Spearman: ordinal/ordinal. Point biserial: dichotomous/interval
  2. ∑ZxZy/(n-1)
  3. rpb = √(t2 / (t2 + df))
  4. Dichotomous/Dichotomous
  5. ∑ZxZy/(n-1), with both variables coded 0/1
  6. φ = √(χ2/N), so the bigger φ is, the bigger chi-squared is
  7. Z = (r’1 – r’2) / √(1/(N1-3) + 1/(N2-3))
  8. r: 0.1 (small) – 0.3 (medium) – 0.5 (large); r2: 0.01 – 0.09 – 0.25

 

Week 2

  1. Regression enables you to estimate one interval variable from one or more others.
  2. y^ = b0 + b1x
  3. The predictor variable (X) and the response/criterion variable (Y). X predicts Y.
  4. b1 = slope: the predicted change in y^ when x increases by 1 unit. b1 = sxy/sx2 = r × (sy/sx)
  5. b0 = intercept/constant: the predicted value of y when x = 0. b0 = ybar - b1·xbar
  6. t = b1/SEb1, with df = N - 2
  7. SEb1 = se / (sx√(N-1)), where se = √(SSe/(N-2))
  8. Via the squared correlation: r2 = SSmodel/SStotal = 1 - SSe/SSy, where SSy = sy2 × dftotal and sy2 = SStotal/dftotal = Σ(y-ybar)2/(N-1)

 

Week 3

  1. Y^ = b0 + b1x1 + b2x2 + … + bpxp
  2. R2 = 1 - (SSe/SSy) = SSmodel/SStotal
  3. R is the correlation between y and y^
  4. MSy^ = SSmodel/dfmodel = SSy^/dfy^ = sum of (y^ - ybar)2 / p
  5. MSe = SSerror/dferror = SSe/dfe = sum of (y - y^)2 / (N - p - 1)
  6. MSy = SStotal/dftotal = SSy/dfy = sum of (y - ybar)2 / (N - 1)
  7. H0: R2 = 0 (equivalently, all regression coefficients are 0)
  8. F(dfy^, dfe) = MSy^/MSe
  9. t = b/SEb, with df = N - p - 1
  10. By squaring the part (semipartial) correlation, which gives the proportion of variance uniquely explained by that predictor (removing the overlap with the other predictors)

 

Week 4

  1. Linearity: there is a linear relationship between the predictors and the response. Homoscedasticity: the variance of the residuals is equal for each predicted value. Normality: the residuals are normally distributed. Independence: the residuals are independent. Check linearity and homoscedasticity with a scatterplot of residuals against predicted values, and normality with a histogram or P-P plot of the residuals.
  2. Multicollinearity means that two or more predictors are strongly correlated.
  3. Distance: is the observation much higher or lower than the expected Y? Leverage: is the observation an outlier on the predictors? Influence: how much does the observation change the results?
  4. Confidence interval for the mean: y^ ± t* × SEµ^
  5. Prediction interval for an individual observation: y^ ± t* × SEy^
  6. SEµ^ = se √( 1/N + (x - xbar)2 / ((N-1)sx2) )

SEy^ = se √( 1 + 1/N + (x - xbar)2 / ((N-1)sx2) )
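A sketch of both standard errors side by side (all input values below are made up for illustration):

```python
import math

# Standard errors for the two intervals (all input values are made up)
se = 2.0      # standard error of estimate (residual sd)
n = 25        # sample size
x = 12.0      # the x value we predict at
x_bar = 10.0  # sample mean of x
s2_x = 4.0    # sample variance of x

common = 1 / n + (x - x_bar) ** 2 / ((n - 1) * s2_x)
se_mean = se * math.sqrt(common)            # for the confidence interval of the mean
se_individual = se * math.sqrt(1 + common)  # for the prediction interval
print(round(se_mean, 4), round(se_individual, 4))  # 0.5715 2.0801
```

The prediction interval is always wider than the confidence interval, because it includes the extra 1 for the scatter of individual observations around the regression line.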

 

Week 5

  1. Interval
  2. Categorical variables can be included using dummy variables.
  3. Dummy variables are dichotomous variables with only the values 0 and 1, used to code the categories.
  4. You can look at differences between groups (means) and look for possible interactions.

 

Week 6

  1. Determine if all population means are equal or if there is a difference between at least two of them.
  2. H0 : µ1 = µ2 = ….
  3. DATA (total) = FIT (between) + RESIDUAL (within) / yij = µ + αi  + ϵij
  4. α = effect parameter (µi − µ), estimated by ai = ybari − ybar (check: ∑ ni ai = 0)
  5. Assumptions: Homogeneity of variances: the variance of the residuals is the same in all populations. Normality of the residuals: the residuals are normally distributed within each group (with mean 0 and standard deviation σ). Independence: the residuals are independent. Checking them: homogeneity via the rule of thumb largest sd / smallest sd < 2, or via Levene’s test; normality via a histogram or P-P / Q-Q plot of the residuals; independence has no check → it should be guaranteed by the design.
  6. – 8. ANOVA table (SS, df, and MS of model, error, and total):

Source    Sum of Squares                       DF      Mean Square   F
Groups    Σ ni(ybari - ybar)2 = Σ ni α^i2      I - 1   SSG/DFG       MSG/MSE
Error     Σ(yij - ybari)2 = Σ(ni - 1)si2       N - I   SSE/DFE
Total     Σ(yij - ybar)2 = (N - 1)sy2          N - 1   SST/DFT

  9. η2 = SSG/SST
  10. ω^2 = (SSG - (DFG × MSE)) / (SST + MSE)

 

Week 7

  1. αfam = 1 - (1 - α)^c, which is approximately c × α for small α, with c the number of comparisons
  2. Contrasts are planned comparisons, specified before the data is collected
  3. t = c^ / SEc^ = Σ ai ybari / (sp √(Σ ai2/ni)), with df = DFE = N - I
  4. Post-hoc tests are additional tests done after a significant F value to determine which means differ significantly
  5. Because performing multiple contrasts or post-hoc tests inflates the Type 1 error rate; the Bonferroni correction keeps the familywise rate at α
  6. α0 = α/c

 

Week 8

  1. Factors are independent variables that consist of two or more levels (categories). A factorial design is a research design with two or more factors.
  2. A main effect is the overall effect of each factor separately. An interaction effect means that the effect of one factor depends on the level of the other factor(s).
  3. DATA = FIT + RESIDUAL -> yijk = µij + ϵijk, where µij = µ + αi + βj + (αβ)ij
  4. α^i = ybari − ybar        β^j = ybarj − ybar        (αβ)^ij = ybarij − (ybar + α^i + β^j)
  5. η2 = SSeffect / SST
  6. ω^2 = (SSeffect − (DFeffect × MSE)) / (SST + MSE)

 

GOOD LUCK ON THE EXAM!!

Xx Emy
