Statistical methods for the social sciences - Agresti - 5th edition, 2018 - Summary (EN)
In social science, two groups are often compared: for quantitative variables the means are compared, for categorical variables the proportions. The variable that defines the two groups is a binary variable: a variable with two categories (also called dichotomous). Sex, for instance, has the categories men and women. Analyzing two variables together like this is an example of bivariate statistics.
Two groups can be dependent or independent. They are dependent when the respondents naturally match with each other. An example is longitudinal research, where the same group is measured at two moments in time. In an independent sample the groups don't match, for instance in cross-sectional research, where people are randomly selected from the population.
Imagine comparing two independent groups, say men and women, on the time they spend sleeping. Men and women are two different groups, with two population means, two sample estimates and two standard errors. The standard error indicates how much a sample mean varies from sample to sample. Because we want to investigate the difference, this difference also has a standard error. The population difference µ₂ – µ₁ is estimated by the sample difference ȳ2 – ȳ1, which has its own sampling distribution. The standard error of ȳ2 – ȳ1 indicates how much the difference varies between samples. The formula is:

estimated standard error = √(se1² + se2²)

In this case se1 is the standard error of group 1 (men) and se2 the standard error of group 2 (women).
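As a minimal sketch, this standard error can be computed in Python; the group sizes, means and standard deviations below are made-up numbers, purely for illustration:

```python
import math

# Hypothetical sleep-time summaries (hours per night); made-up numbers.
n1, ybar1, s1 = 80, 7.1, 1.2   # group 1 (men): size, sample mean, sample sd
n2, ybar2, s2 = 90, 7.4, 1.1   # group 2 (women)

se1 = s1 / math.sqrt(n1)       # standard error of each sample mean
se2 = s2 / math.sqrt(n2)

# standard error of the difference between two independent sample means
se_diff = math.sqrt(se1**2 + se2**2)
print(ybar2 - ybar1, se_diff)  # estimated difference and its standard error
```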
Instead of the difference, the ratio can also be reported. This is especially useful in the case of very small proportions.
The difference between the proportions of two populations (π2 – π1) is estimated by the difference between the sample proportions. When the samples are very large, the standard error of this difference is small, so the estimate is precise.
The confidence interval is the point estimate of the difference ± the t-score multiplied by the standard error. The formula for the difference between the group means is:

(ȳ2 – ȳ1) ± t(se) in which se = √(se1² + se2²)
When the confidence interval contains only positive values, µ₂ – µ₁ is positive and µ₂ is bigger than µ₁. If it contains only negative values, µ₂ is smaller than µ₁. A narrow confidence interval means the groups don't differ much.
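Continuing the sketch above, a 95% confidence interval for the difference could look as follows. The degrees of freedom for the t-score are normally supplied by software; the simple df = n1 + n2 – 2 used here (discussed further below) is itself an approximation:

```python
import math
from scipy import stats

n1, ybar1, s1 = 80, 7.1, 1.2   # same made-up summaries as before
n2, ybar2, s2 = 90, 7.4, 1.1

se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)
df = n1 + n2 - 2               # simplified degrees of freedom
t = stats.t.ppf(0.975, df)     # t-score for a 95% confidence level

diff = ybar2 - ybar1
print(diff - t * se_diff, diff + t * se_diff)
```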
For a significance test to compare the proportions of two groups, H0 : π2 = π1. This would mean that the proportion is exactly equal in each group. An equivalent H0 is π2 – π1 = 0, which also says that there is no difference. Calculating the z-score and the P-value works in roughly the same way as for one group, but the difference is that π̂ indicates an estimate of the proportion for both sample groups combined. This is called a pooled estimate. With this the standard error can be calculated. For se0, the standard error in case the null hypothesis is true, another formula is used:

se0 = √(π̂(1 – π̂)(1/n1 + 1/n2))

The test statistic is z = (π̂2 – π̂1) / se0.
This can be calculated with software. A clear way to present results is a contingency table. In a contingency table the categories of the explanatory variable are placed in the rows and the categories of the response variable in the columns. The cells contain the counts for each combination of categories.
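A sketch of this z-test with a pooled estimate, with hypothetical counts:

```python
import math
from scipy.stats import norm

x1, n1 = 60, 200               # hypothetical "successes" out of n, group 1
x2, n2 = 90, 220               # group 2
p1, p2 = x1 / n1, x2 / n2      # sample proportions

# pooled estimate under H0 (pi1 = pi2): all successes over all observations
p_pool = (x1 + x2) / (n1 + n2)
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p2 - p1) / se0
p_value = 2 * norm.sf(abs(z))  # two-sided P-value
print(z, p_value)
```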
For the difference between two population means (µ₂ – µ₁) a confidence interval can be calculated using the sampling distribution of ȳ2 – ȳ1. The formula is:

(ȳ2 – ȳ1) ± t(se) in which se = √(s1²/n1 + s2²/n2)
The t-score is the one that fits the chosen confidence level. The degrees of freedom df are usually calculated by software. When the standard deviations and sample sizes are equal for each group, a simplified formula for the degrees of freedom is df = n1 + n2 – 2. The sign of the estimated difference indicates which of the two groups has the higher mean.
For a significance test comparing two means, H0 : µ1 = µ2, which implies the same as H0 : µ₂ – µ₁ = 0. The formula is:

t = (ȳ2 – ȳ1) / se in which se = √(s1²/n1 + s2²/n2)
The standard error and the degrees of freedom are the same as for a confidence interval for two means. Researchers are usually interested in the difference between two groups; significance tests for just one group are used less often.
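In practice this test is usually run with software. scipy's ttest_ind with equal_var=False gives the version that does not assume equal population standard deviations (the Welch test); the sleep-time data below are made up:

```python
from scipy import stats

# Hypothetical sleep times (hours per night) for two independent groups.
men   = [6.5, 7.0, 7.2, 6.8, 7.5, 6.9, 7.1]
women = [7.4, 7.8, 7.0, 7.6, 7.3, 7.9, 7.2]

# equal_var=False: do not assume identical population standard deviations
t, p = stats.ttest_ind(women, men, equal_var=False)
print(t, p)
```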
Dependent samples compare matched pairs data. In longitudinal research, where the same subjects are measured at different moments in time, repeated measures are taken. An example is a crossover study, in which a subject gets one treatment and later another treatment.
When matched pairs are compared, for each pair a difference variable (named yd) is created: difference = observation in sample 2 – observation in sample 1. The sample mean of these differences is ȳd. A rule for matched pairs is that the difference between the means equals the mean of the difference scores: ȳ2 – ȳ1 = ȳd.
The confidence interval for µd (the population mean of the difference scores) is:

ȳd ± t(se) in which se = sd / √n

The significance test uses: t = ȳd / se in which se = sd / √n; here sd is the standard deviation of the difference scores and n is the number of pairs.
When a significance test is performed over different observations for dependent pairs, it is called the paired difference t-test.
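The paired difference t-test is available in scipy as ttest_rel; the before/after scores below are hypothetical:

```python
from scipy import stats

# Hypothetical matched-pairs data: the same subjects measured twice.
before = [12, 15, 11, 14, 13, 16, 12, 15]
after  = [14, 15, 13, 15, 14, 18, 13, 16]

# equivalent to a one-sample t-test on the difference scores after - before
t, p = stats.ttest_rel(after, before)
print(t, p)
```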
The advantages of dependent samples are:
- Other variables influence both the first and the second sample, because the same subjects are used, so they cannot distort the comparison.
- The variability and the standard error are smaller.
Besides the t-test, there are other methods for comparing means: assuming identical standard deviations, a randomized block design, the effect size, and modeling.
For independent samples, one approach assumes that the null hypothesis entails that the distributions of the response variable are identical in the two groups. Then the standard deviations and the means are also identical. The pooled estimate of the common standard deviation is:

s = √(((n1 – 1)s1² + (n2 – 1)s2²) / (n1 + n2 – 2))
The confidence interval is:

(ȳ2 – ȳ1) ± t(se) in which se = s√(1/n1 + 1/n2)
The degrees of freedom are the combined number of observations minus the number of estimated parameters (µ1 and µ2), so df = n1 + n2 – 2.
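A sketch of this pooled-variance confidence interval in Python, with made-up summary statistics:

```python
import math
from scipy import stats

n1, ybar1, s1 = 12, 20.0, 4.0  # hypothetical summaries for group 1
n2, ybar2, s2 = 15, 24.0, 4.5  # and group 2

df = n1 + n2 - 2
# pooled estimate of the common standard deviation
s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

se = s * math.sqrt(1 / n1 + 1 / n2)
t = stats.t.ppf(0.975, df)     # t-score for a 95% confidence level
diff = ybar2 - ybar1
print(diff - t * se, diff + t * se)
```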
Another method is the randomized block design. Subjects that are alike are regarded as a pair, and only one of the two gets the treatment.
Software makes inferences assuming the variability is equal for the two groups, but also for the case where equal variance is not assumed. So it can be assumed that the population standard deviations are the same (σ1 = σ2), but this isn't necessary. When the sample sizes are (nearly) equal, the test statistics give practically the same results with and without the equal-variance assumption. However, when hugely different standard deviations are suspected, the equal-variance method isn't appropriate. It's better not to use the F test that software offers for testing whether the standard deviations are equal, because it isn't robust against non-normal distributions.
Another method is using the effect size, with the formula (ȳ1 – ȳ2) / s. The outcome is regarded as big when it is about 1 or bigger. This measure is specifically useful when the difference would otherwise depend on the units of measurement (like kilometers versus miles).
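A quick illustration of the effect size, reusing the hypothetical summary statistics from the sketch above:

```python
import math

n1, ybar1, s1 = 12, 20.0, 4.0  # made-up summaries, as before
n2, ybar2, s2 = 15, 24.0, 4.5

# pooled standard deviation, as in the previous sketch
s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# effect size (ybar1 - ybar2) / s is unit-free, so kilometers vs. miles
# no longer matters; only the magnitude is interpreted
effect = (ybar1 - ybar2) / s
print(effect)
```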
Another way to compare means is using a model: a simple approximation of the real association between two (or more) variables in the population. A normal distribution with mean µ and standard deviation σ is written as N(µ, σ). y1 is an observation from group 1 and y2 an observation from group 2. A model can be:
H0 : y1 has a distribution N(µ, σ1) and y2 has a distribution N(µ, σ2)
Ha : y1 has a distribution N(µ1, σ1) and y2 has a distribution N(µ2, σ2) and µ1 ≠ µ2
This investigates whether the means differ. The standard deviations aren't assumed equal, because that would simplify reality too much and could lead to big mistakes.
Methods also exist to compare proportions for dependent or very small samples. For dependent samples a z-test comparing the proportions, McNemar's test, or a confidence interval can be used. For small samples Fisher's exact test applies.
The z-score measures the number of standard errors between the estimate and the value of the null hypothesis. The formula in this case is: (sample proportion – null hypothesis proportion) / standard error.
For paired proportions, McNemar's test applies. The test statistic is:

z = (n21 – n12) / √(n12 + n21)

in which n12 and n21 count the pairs that score differently on the two occasions (the discordant pairs).
Besides a significance test, a confidence interval can also be used to examine the difference between dependent proportions. The formula is:

(π̂2 – π̂1) ± z(se) in which se = √(n12 + n21 – (n12 – n21)²/n) / n

Here n is the total number of pairs.
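A sketch computing both McNemar's z and this confidence interval; the 2×2 table of paired binary responses below is hypothetical:

```python
import math
from scipy.stats import norm

# Hypothetical paired responses: n12 and n21 are the discordant pairs.
n11, n12, n21, n22 = 40, 10, 25, 50
n = n11 + n12 + n21 + n22      # total number of pairs

# McNemar z test: only the discordant pairs carry information about H0
z = (n21 - n12) / math.sqrt(n12 + n21)
p_value = 2 * norm.sf(abs(z))

# confidence interval for the difference between the dependent proportions
diff = (n21 - n12) / n
se = math.sqrt(n12 + n21 - (n12 - n21) ** 2 / n) / n
zc = norm.ppf(0.975)           # z-score for a 95% confidence level
print(z, p_value, (diff - zc * se, diff + zc * se))
```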
Fisher's exact test is a complex test, but it can be performed with software to compare very small samples.
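Fisher's exact test is available in scipy; a sketch with a hypothetical table of very small counts:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 contingency table with very small counts.
table = [[3, 1],
         [1, 4]]

odds_ratio, p_value = fisher_exact(table)  # two-sided by default
print(odds_ratio, p_value)
```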
Parametric methods assume a certain distribution shape, like the normal distribution. Nonparametric methods don't make assumptions about distribution shape.
Nonparametric methods for comparing groups are mostly used for very small samples or very skewed distributions. Examples are the Wilcoxon test, the Mann-Whitney test and a nonparametric measure of effect size.
Some nonparametric tests assume that the shapes of the population distributions are identical. The model for this is:
H0 : y1 and y2 have the same distribution.
Ha : The distributions of y1 and y2 have the same shape, but the distribution of y1 is shifted up (toward higher values) relative to that of y2.
The Wilcoxon test uses an ordinal scale: it assigns ranks to the observations and compares the ranks of the two groups.
The Mann-Whitney test compares each observation from one group with each observation from the other group, for instance two sets of weather forecasts by different forecasting companies.
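For two independent samples the Wilcoxon rank-sum test and the Mann-Whitney test are equivalent; scipy offers this as mannwhitneyu. The observations below are made up:

```python
from scipy import stats

# Hypothetical observations for two independent groups.
group1 = [3.1, 2.8, 3.5, 2.9, 3.0]
group2 = [3.6, 3.9, 3.4, 4.1, 3.8]

# works on the ranks of the observations, not their raw values
u, p = stats.mannwhitneyu(group1, group2, alternative='two-sided')
print(u, p)
```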
The effect size idea can also be applied nonparametrically: estimate the proportion of pairs for which the observation from one group is higher than the observation from the other group.
Another option is treating an ordinal variable as quantitative by assigning a score to each category. This can be an easier method than treating the rankings as purely ordinal.