Workgroup notes for Experimental & Correlational Statistics at Leiden University - 2018/2019
Week 1
Prep exercises
- Which combination of measurement levels is required for the use of the Pearson, Spearman, and point-biserial correlation respectively?
- Which formula is suitable for calculating the Pearson, Spearman, and point-biserial correlations?
- Which formula describes the relationship between rpb and the independent-samples t?
- Which combination of measurement levels is required for the use of the phi coefficient?
- What is the specific formula for calculating the phi coefficient?
- Which formula describes the relationship between φ and χ2?
- What is the formula for testing the difference between two independent correlation coefficients?
- What is the rule of thumb for effect size r2 and r?
Workgroup tips 1
Correlation is NOT causation. It is an association between variables.
Positive correlation = both increase or decrease
Negative = One increases, the other decreases
Pearson’s r: both variables are at an interval level. Formula: ∑ZxZy/(n−1)
Spearman’s rho: two ordinal variables (also used to avoid outlier influence in Pearson’s r). rs is simply Pearson’s r computed on ranks. Important: RANK IT FIRST, then take the z-scores.
Point-biserial: one dichotomous and one continuous (interval) variable. rpb is Pearson’s r with the dichotomous variable coded 0/1.
Phi: two nominal variables that each have only two levels, i.e. both dichotomous. φ is Pearson’s r on the 0/1-coded variables, and it is directly related to χ².
Dichotomous means that the value can only be one of two things, for instance yes/no, male/female, left/right. It is a nominal variable, but whereas a plain nominal variable can have answers like red/blue/green/yellow, a dichotomous variable could in this case only be red/blue.
Basically, all of these correlations share the same basic formula: ∑ZxZy/(n−1)
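To make that shared formula concrete, here is a minimal Python sketch (the function name is my own, not from the course) that computes Pearson's r as the mean cross-product of z-scores:

```python
import statistics

def pearson_r(x, y):
    """Pearson's r as the mean cross-product of z-scores: sum(Zx*Zy)/(n-1).

    Uses sample standard deviations (n-1 in the denominator), matching
    the formula in these notes.
    """
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    zx = [(v - mx) / sx for v in x]
    zy = [(v - my) / sy for v in y]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Perfectly linear data gives r = 1:
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```

Because Spearman, point-biserial, and phi are all Pearson's r on suitably coded data (ranks or 0/1), this one function covers all four once the variables are coded.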
r describes the sample; ρ (rho) describes the population:
| Parameter | Population | Sample |
| --- | --- | --- |
| Mean | µ | x̄ |
| Probability | P | p |
| Standard deviation | σ | s |
| Correlation | ρ | r |
In order to see whether the r you have calculated holds for the population, test it with
t = r√(N−2) / √(1 − r²), with df = N − 2
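That test statistic is easy to sketch in Python (the function name t_from_r is just for illustration):

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0; t = r*sqrt(N-2)/sqrt(1-r^2), df = N-2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r = .50 in a sample of N = 27 (df = 25):
print(round(t_from_r(0.5, 27), 3))  # 2.887
```

Compare the result to a t table with N − 2 degrees of freedom to decide on significance.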
| Correlation type | Type of variables | Symbol | Formula |
| --- | --- | --- | --- |
| Pearson’s r | Interval / interval | r | ∑ZxZy/(n−1) |
| Spearman’s rho | Ordinal / ordinal | rs | ∑ZxZy/(n−1), computed on ranks |
| Point-biserial | Dichotomous / interval | rpb | ∑ZxZy/(n−1) |
| Phi | Dichotomous / dichotomous | φ | ∑ZxZy/(n−1) |
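The "rank first, then correlate" recipe for Spearman's rho can be sketched like this in Python (helper names are my own; ties get average ranks):

```python
import statistics

def ranks(values):
    """Average ranks, 1-based; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson_r(x, y):
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (n - 1)

def spearman_rho(x, y):
    """Rank first, then apply the ordinary Pearson formula to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# A monotonic relationship gives rho = 1 even though it is far from linear:
print(round(spearman_rho([1, 5, 200], [2, 3, 1000]), 6))  # 1.0
```

This also illustrates why rho resists outliers: the value 200 becomes just rank 3.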
Week 2
- What is the purpose of regression analysis?
- Write down the simple linear regression equation.
- What is the difference between a predictor and a response variable?
- What is the meaning of the slope and how is it calculated?
- What is the meaning of the intercept and how is it calculated?
- What is the formula for testing the slope? (+df)
- What is the formula for calculating the standard error of the slope?
- In which ways can the variance accounted for be calculated in regression analysis?
Workgroup tips 2
Difference between correlation and regression: regression predicts the value of one variable from another, while correlation only describes the strength and direction of an association (and still is not causation).
b0, also called the intercept, is the predicted value of y when x is 0.
b1, also called the regression coefficient, is the slope.
Simple regression has only one predictor, multiple regression has multiple predictors.
When the slope increases, the regression line gets steeper.
When the intercept increases, the entire line gets lifted a bit.
Why do we test the regression coefficient? To see whether the slope differs from 0 in the population, i.e. whether x actually predicts y.
If a question says ‘weight’ as a function of ‘height’, then height is X (the predictor) and weight is Y (the response).
The difference between observed and predicted (y − ŷ) is called the residual.
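The slope, intercept, and residual definitions above can be checked with a small Python sketch (function names are my own):

```python
import statistics

def fit_line(x, y):
    """Least-squares fit: b1 = r * (sy/sx), b0 = ybar - b1*xbar."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    n = len(x)
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx
    b0 = my - b1 * mx
    return b0, b1

def residuals(x, y, b0, b1):
    """Residual = observed - predicted: y - (b0 + b1*x)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(round(b0, 6), round(b1, 6))  # 1.0 2.0
```

For perfectly linear data the residuals are all 0; real data scatter around the line, and the residuals carry that error.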
Week 3
- Write down the multiple linear regression equation.
- In which way can the variance accounted for be calculated in multiple linear regression analysis?
- What is the meaning of the multiple correlation coefficient R?
- How are the SS, df, and MS of the model determined?
- How are the SS, df, and MS of the error determined?
- How are the SS, df, and MS of the total determined?
- What is the null hypothesis when testing the entire model?
- What is the formula for testing the entire model? (+df)
- what is the formula for testing individual regression coefficients? (+df)
- How can the unique contribution of a predictor be determined?
Workgroup tips 3
For the bigger picture, look at it like this example:
A teacher wants to know how the grades of his students are composed. He supposes that intelligence is a big part of a student’s grade and so calculates a correlation between grade and intelligence (week 1, correlations). But after calculating the correlation, he supposes that genetics and effort also have an impact on the grade. These parts (genes, intelligence and effort) are called predictors (week 2), and each influences the dependent variable: the grade. Later the teacher supposes that while genetics and intelligence both influence the grade, genetics also influences intelligence (week 3). To take out these overlapping influences, you calculate unique contributions.
Week 4
Prep exercises
- What are the assumptions for regression analysis (+ explanation) and how can they be checked?
- What is multicollinearity?
- Which diagnostics are used to investigate the presence of outliers? (+ explanation)
- What is the formula for determining the confidence interval for the mean?
- What is the formula for determining the prediction interval for an individual observation?
- How are the standard errors for both intervals calculated?
Workgroup tips 4
When working in SPSS and asked to calculate any of the correlations from Week 1, simply go to Analyse -> Correlate -> Bivariate and calculate all variables with Pearson’s r.
When SPSS’ scatterplot shows a clearly curved pattern, you can assume Pearson’s r is not the correct correlation to use, because the relationship is not linear. In that case use a nonparametric correlation such as Spearman’s rho.
Coefficients (a. Dependent variable: den)

| Model | B | Std. Error | Beta | t | Sig. | 95% CI Lower | 95% CI Upper | Zero-order | Partial | Part |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (Constant) | 1.163 (= b0) | .007 | | 177.296 | .000 | 1.150 | 1.176 | | | |
| lskin | −.063 (= b1) | .004 | −.849 | −15.228 | .000 | −.071 | −.055 | −.849 | −.849 | −.849 |
When you want a regression line in a scatterplot, open the Chart Builder, go to Elements and click ‘Fit Line at Total’. In the window that opens, select ‘Linear’ and choose whether you want a regression line for the mean or for individual observations.
Week 5
- Which type of variables (measurement level) are generally used in a regression analysis?
- In which way can categorical variables be added to a regression analysis?
- What are dummy variables?
- What are the benefits of adding a categorical variable to a regression analysis?
Workgroup Tips 5
When you want to run a regression analysis in SPSS, go to Analyse -> Regression -> Linear. There, your x variable goes in Independent(s) and your y in Dependent.
The coefficient of a dummy variable in SPSS tells you the difference in intercepts (b0) relative to the reference group.
An ‘interaction’ term is always a multiplication of two predictors.
When you want to centre a variable, you do the following steps:
- Data, Aggregate.
- Leave the Break Variable(s) window empty.
- Specify your particular variable for the Summary of Variable(s) window.
- Save the Aggregated variables to the active dataset (default setting).
- Press OK
- This results in a new variable that contains, for every case, the mean of the whole group.
- Finally compute the centred variable via Transform, Compute, C VARIABLE = VARIABLE - VARIABLE MEAN.
ŷ = b0 (intercept) + b1x (slope) + b2D (difference in intercept) + b3(x × D) (difference in slope), where D is the dummy variable
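A small Python sketch of the centring step and of the dummy/interaction prediction equation (function names are my own, for illustration only):

```python
import statistics

def center(values):
    """Centre a variable by subtracting its mean (the result of the
    Aggregate + Compute steps in SPSS)."""
    m = statistics.mean(values)
    return [v - m for v in values]

def predict(x, d, b0, b1, b2, b3):
    """yhat = b0 + b1*x + b2*D + b3*(x*D), with dummy D in {0, 1}.

    b2 is the difference in intercept between the two groups,
    b3 the difference in slope.
    """
    return b0 + b1 * x + b2 * d + b3 * (x * d)

# A centred variable always has mean 0:
print(round(sum(center([2, 4, 9])), 6))  # 0.0
```

Note that at a given x, the gap between the two groups is b2 + b3·x, which is why the interaction term b3 is read as a "difference in slope".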
Week 6
- What is the purpose of analysis of variance (ANOVA)?
- Which null hypothesis is being tested in ANOVA?
- Write down the oneway-ANOVA model and indicate what each symbol represents.
- What is the effect parameter α and how is it estimated?
- What are the assumptions for ANOVA (+ explanation) and how can they be checked?
- How are the SS, df, and MS of the model determined?
- How are the SS, df, and MS of the error determined?
- How are the SS, df, and MS of the total determined?
- What is the formula for determining the variance accounted for in the sample?
- What is the formula for determining the variance accounted for in the population?
Workgroup tips 6
ANOVA is technically a t-test on steroids: it compares more than two group means in one test, which avoids the Type 1 error inflation you would get from running many separate t-tests. Combined with an experimental design (random assignment), ANOVA results can support causal claims rather than mere correlation.
The effect parameter tells you how far a group mean is removed from the grand mean. You get the grand mean by taking the average of all group means.
The closer points in a scatterplot are to their group mean, the less error you have. Error (residual) = yij − ȳj, i.e. an observation minus its group mean.
A higher ANOVA result, i.e. a higher F, means a lower p and thus a more significant result. A significant result means that at least two means differ.
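The SS/MS/F logic can be sketched in Python from raw group scores (a minimal illustration with my own function name, not SPSS output):

```python
import statistics

def oneway_anova(groups):
    """SSG, SSE and F for a one-way ANOVA from raw scores per group.

    SSG = sum of n_i * (group mean - grand mean)^2   (df = I - 1)
    SSE = sum of (n_i - 1) * s_i^2                   (df = N - I)
    """
    scores = [v for g in groups for v in g]
    grand = statistics.mean(scores)
    n, k = len(scores), len(groups)
    ssg = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    f = (ssg / (k - 1)) / (sse / (n - k))
    return ssg, sse, f

ssg, sse, f = oneway_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(ssg, sse, round(f, 3))  # 6.0 6.0 3.0
```

F is large when the between-group variation (SSG) is big relative to the within-group variation (SSE), exactly the intuition in the tips above.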
Week 7
- How can the familywise error rate αfam be determined?
- What are contrasts?
- What is the formula for testing contrasts? (+df)
- What are post-hoc tests?
- Why is the Bonferroni Correction sometimes used when performing contrasts or post hoc tests?
- How can the Bonferroni corrected error rate be determined?
Workgroup tips 7
Overview of this workgroup: how to test which means are driving the overall difference.
An a priori test is specified beforehand -> it is clear what you are testing, BUT you have to apply a Bonferroni correction to control your Type 1 error (the chance that you reject H0 even though it is true).
A post-hoc test you perform afterwards, and SPSS makes the Type 1 error correction for you.
Setting a contrast weight positive means that you give the positive weights to the side you expect the higher mean for.
All the weights (proportions) need to add up to 0. In practice you divide the weights so that they add up to +1 on the one side and −1 on the other.
Independence (orthogonality) of contrasts: check whether the contrasts are actually independent, so that each contrast tests something new.
To check whether two contrasts are independent, multiply their weights per condition and add the products; this sum also has to equal 0.
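The two checks above (weights sum to 0; products of weights across two contrasts sum to 0) can be written as a tiny Python sketch (function names are my own):

```python
def is_valid_contrast(weights):
    """A contrast is valid when its weights sum to 0."""
    return abs(sum(weights)) < 1e-12

def are_independent(c1, c2):
    """Two contrasts are independent (orthogonal) when the sum of the
    products of their weights per condition is 0."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < 1e-12

# Group 1 vs group 2, and (groups 1+2) vs group 3:
print(is_valid_contrast([1, -1, 0]))                 # True
print(are_independent([1, -1, 0], [0.5, 0.5, -1]))   # True
```

With I groups you can form at most I − 1 mutually independent contrasts, which is why the classic pair above works for three groups.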
Week 8
- What are factors and what is a factorial design?
- What are the definitions of main effect and interaction effect?
- Write down the factorial ANOVA model and indicate what each symbol represents.
- How are the effect parameters α, β, and αβ estimated?
- What are the formulas for the proportion variance accounted for of an effect in the sample?
- What is the formula for the proportion variance accounted for in the population?
Workgroup tips 8
When working with the effect parameters α, β, and αβ, first make a contingency table that looks like this (in case of a 2x3 factorial design)
| | Condition A1 | Condition A2 | Condition A3 | Row mean |
| --- | --- | --- | --- | --- |
| Condition B1 | Mean score B1×A1 | Mean score B1×A2 | Mean score B1×A3 | Mean of B1 |
| Condition B2 | Mean score B2×A1 | Mean score B2×A2 | Mean score B2×A3 | Mean of B2 |
| Column mean | Mean of A1 | Mean of A2 | Mean of A3 | Mean of all means |
The hypothesis for the interaction in this table would be
H0: (αβ)ij = 0 for all i, j (no interaction effect)
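A minimal Python sketch of estimating α, β and αβ from a table of cell means like the one above (equal cell sizes assumed; the function name is my own):

```python
import statistics

def factorial_effects(cell_means):
    """Effect parameters from a table of cell means; cell_means[i][j] is
    the mean of row level B_i and column level A_j (equal cell n assumed).

    alpha_j = column mean - grand mean
    beta_i  = row mean    - grand mean
    ab_ij   = cell mean   - (grand mean + alpha_j + beta_i)
    """
    rows, cols = len(cell_means), len(cell_means[0])
    grand = statistics.mean(v for row in cell_means for v in row)
    alpha = [statistics.mean(cell_means[i][j] for i in range(rows)) - grand
             for j in range(cols)]
    beta = [statistics.mean(row) - grand for row in cell_means]
    ab = [[cell_means[i][j] - (grand + alpha[j] + beta[i])
           for j in range(cols)] for i in range(rows)]
    return alpha, beta, ab

# A purely additive table: every interaction parameter comes out 0.
alpha, beta, ab = factorial_effects([[1, 2, 3], [3, 4, 5]])
print(alpha, beta)  # [-1.0, 0.0, 1.0] [-1.0, 1.0]
```

Non-zero ab values are exactly the interaction: the cell mean deviates from what the two main effects alone would predict.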
Keep in mind that with contrasts in SPSS, you should not include the missing variable. When asked for the unique (‘alone’) variance of a predictor, take the part (semipartial) correlation and square it.
Answers
Week 1
- Pearson’s r: interval/interval. Spearman: ordinal/ordinal. Point-biserial: dichotomous/interval
- ∑ZxZy/(n−1)
- rpb = √(t² / (t² + df))
- Dichotomous/dichotomous
- ∑ZxZy/(n−1)
- φ = √(χ²/N), so the bigger φ is, the bigger χ² is
- Z = (r′1 − r′2) / √(1/(N1−3) + 1/(N2−3))
- r: 0.1 (small), 0.3 (medium), 0.5 (large); r²: 0.01, 0.09, 0.25
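The test for two independent correlations uses Fisher-transformed values r′; a small Python sketch (function names are my own; note that the two 1/(N−3) terms are added under the square root):

```python
import math

def fisher_z(r):
    """Fisher transformation: r' = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_two_correlations(r1, n1, r2, n2):
    """z = (r1' - r2') / sqrt(1/(N1 - 3) + 1/(N2 - 3))."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

# Equal correlations give z = 0, whatever the sample sizes:
print(z_two_correlations(0.5, 50, 0.5, 80))  # 0.0
```

Compare the resulting z to the standard normal distribution (e.g. |z| > 1.96 for a two-sided test at α = .05).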
Week 2
- Regression enables you to estimate one interval variable from one or more others.
- ŷ = b0 + b1x
- The predictor variable (X) and the response/criterion variable (Y). X predicts Y
- b1 = slope: the change in ŷ when x increases by 1 unit
- b0 = intercept/constant: predicted value of y when x = 0
- t = b1/SEb1, df = N − 2
- SEb1 = se / √(Σ(x − x̄)²) = se / (sx√(N − 1))
- b0 = ȳ − b1x̄
b1 = sxy/s²x = r × (sy/sx)
- Variance accounted for: r² = SSmodel/SStotal = 1 − SSe/SSy. Total variance: s²y = SStotal/dftotal = Σ(y − ȳ)²/(N − 1), so SSy = s²y × dftotal
Week 3
- Y^ = b0 + b1x1 + b2x2 + … + bpxp
- R² = 1 − (SSe/SSy) = SSmodel/SStotal
- R is the correlation between the observed y and the predicted ŷ
- MSmodel = SSmodel/dfmodel = Σ(ŷ − ȳ)²/p, with dfmodel = p (the number of predictors)
- MSe = SSerror/dferror = Σ(y − ŷ)²/(N − p − 1)
- MSy = SStotal/dftotal = Σ(y − ȳ)²/(N − 1)
- H0: R² = 0 (equivalently: all regression coefficients are 0)
- F(dfmodel, dferror) = MSmodel/MSe
- t = b/SEb, df = N − p − 1
- By squaring the part (semipartial) correlation of that predictor
Week 4
- Linearity: there is a linear relationship (check: scatterplot of residuals against predicted values shows no curve). Homoscedasticity: the variance of the residuals is equal for each predicted value (same plot: no funnel shape). Normality: the residuals are normally distributed (check: histogram or P-P/Q-Q plot of the residuals)
- Two or more predictors are strongly correlated
- Distance: are they much higher or lower than the expected Y? Leverage: Are there outliers on the predictors? Influence: how much does the observation influence the results?
- µ̂ ± t* × SEµ̂
- ŷ ± t* × SEŷ
- SEµ̂ = se√(1/N + (x − x̄)²/((N − 1)s²x))
SEŷ = se√(1 + 1/N + (x − x̄)²/((N − 1)s²x))
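The two standard errors differ only by the extra 1 under the square root; a Python sketch (function names are my own; arguments are the residual standard error se, the sample size, the x of interest, and the mean and sd of x):

```python
import math

def se_mean(se_resid, n, x, xbar, sx):
    """SE for the confidence interval of the mean response at x."""
    return se_resid * math.sqrt(1 / n + (x - xbar) ** 2 / ((n - 1) * sx ** 2))

def se_individual(se_resid, n, x, xbar, sx):
    """SE for the prediction interval of one new observation at x:
    same as se_mean, plus an extra 1 under the square root."""
    return se_resid * math.sqrt(1 + 1 / n + (x - xbar) ** 2 / ((n - 1) * sx ** 2))

# At x = xbar the CI standard error reduces to se * sqrt(1/N):
print(se_mean(2.0, 4, 5.0, 5.0, 1.5))  # 1.0
```

The prediction interval is always wider than the confidence interval: predicting one individual carries the extra residual scatter on top of the uncertainty about the mean.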
Week 5
- Interval
- Categorical variables can be included using dummy variables.
- Dummy variables are dichotomous variables coded 0/1, one for each category except the reference category.
- Look at differences between groups (means) / Look for possible interaction
Week 6
- Determine if all population means are equal or if there is a difference between at least two of them.
- H0 : µ1 = µ2 = ….
- DATA (total) = FIT (between) + RESIDUAL (within) / yij = µ + αi + ϵij
- αi is the effect parameter: αi = µi − µ, estimated by α̂i = ȳi − ȳ (check: ∑ni α̂i = 0)
- Homogeneity of variances: the variance of the residuals is the same in all populations (check: rule of thumb largest sd / smallest sd < 2, or Levene’s test not significant). Normality of the residuals: residuals are normally distributed within each group, with mean 0 and standard deviation σ (check: histogram or P-P/Q-Q plot of the residuals). Independence: residuals are independent (no check; should be guaranteed by the design)
- ANOVA table:

| Source | Sum of Squares | df | Mean Square | F |
| --- | --- | --- | --- | --- |
| Groups | Σni(ȳi − ȳ)² = Σni α̂²i | I − 1 | SSG/DFG | MSG/MSE |
| Error | Σ(yij − ȳi)² = Σ(ni − 1)s²i | N − I | SSE/DFE | |
| Total | Σ(yij − ȳ)² = (N − 1)s²y | N − 1 | SST/DFT | |
- η2 = SSG/SST
- ω̂² = (SSG − DFG × MSE) / (SST + MSE)
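The η² and ω² formulas as a tiny Python sketch (function names are my own):

```python
def eta_squared(ssg, sst):
    """Proportion of variance accounted for in the sample."""
    return ssg / sst

def omega_squared(ssg, dfg, mse, sst):
    """Estimate for the population: (SSG - DFG*MSE) / (SST + MSE)."""
    return (ssg - dfg * mse) / (sst + mse)

print(eta_squared(6.0, 12.0))                       # 0.5
print(round(omega_squared(6.0, 2, 1.0, 12.0), 4))   # 0.3077
```

ω² is always somewhat smaller than η², because it corrects for the fact that the sample η² overestimates the population effect.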
Week 7
- αfam = 1 − (1 − α)^c for c tests at level α (approximately c × α for small α)
- Contrasts are planned comparisons, specified before the data are collected
- t = c/SEc = Σai x̄i / (sp√(Σa²i/ni)), df = DFE = N − I
- Additional tests done after significant F value to determine which means are significantly different
- Because performing multiple contrasts or post-hoc tests inflates the Type 1 error; the Bonferroni correction keeps the familywise error rate at α
- α0 = α/c
Week 8
- Factors are independent variables that consist of two or more levels (categories). A factorial design is a research design with two or more factors.
- A main effect is the overall effect of each factor separately. An interaction effect means that the effect of one factor depends on the level of the other factor(s).
- DATA = FIT + RESIDUAL -> yijk = µij + ϵijk, where µij = µ + αi + βj + (αβ)ij
- α̂i = ȳi − ȳ; β̂j = ȳj − ȳ; (αβ)ij = ȳij − (ȳ + α̂i + β̂j)
- η2 = SSeffect / SST
- ω2 = (SSeffect – (DFeffect* MSE)) / SST+MSE
GOOD LUCK ON THE EXAM!!
Xx Emy