Statistical methods for the social sciences - Agresti - 5th edition, 2018 - Summary (EN)
In social science, two groups are often compared: for quantitative variables the means are compared, for categorical variables the proportions. The variable that defines the two groups is a binary variable: a variable with two categories (also called dichotomous). Sex, for instance, has the categories men and women. Analyzing two variables together like this is an example of bivariate statistics.
Two groups can be dependent or independent. They are dependent when the respondents naturally match with each other. An example is longitudinal research, where the same group is measured at two moments in time. In an independent sample the groups don't match, for instance in cross-sectional research, where people are randomly selected from the population.
Imagine comparing two independent groups, say men and women, on the time they spend sleeping. Men and women are two different groups, with two population means, two sample estimates and two standard errors. The standard error indicates how much a sample mean varies from sample to sample. Because we want to investigate the difference, this difference also has a standard error. The population difference µ₂ – µ₁ is estimated by the sample difference ȳ2 – ȳ1, which has its own sampling distribution. The standard error of ȳ2 – ȳ1 indicates how much the difference varies between samples. The formula is:

estimated standard error = √(se1² + se2²)

In this case se1 is the standard error of group 1 (men) and se2 the standard error of group 2 (women).
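As a minimal sketch, this standard error can be computed in Python; the group sizes, means and standard deviations below are made-up numbers, purely for illustration:

```python
import math

# Hypothetical sleep-time summaries (hours per night); made-up numbers.
n1, ybar1, s1 = 80, 7.1, 1.2   # group 1 (men): size, sample mean, sample sd
n2, ybar2, s2 = 90, 7.4, 1.1   # group 2 (women)

se1 = s1 / math.sqrt(n1)       # standard error of each sample mean
se2 = s2 / math.sqrt(n2)

# standard error of the difference between two independent sample means
se_diff = math.sqrt(se1**2 + se2**2)
print(ybar2 - ybar1, se_diff)  # estimated difference and its standard error
```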
Instead of the difference, the ratio can also be reported. This is especially useful in the case of very small proportions.
The difference between the proportions of two populations (π2 – π1) is estimated by the difference between the sample proportions. When the samples are very large, the standard error of this difference is small, so the estimate is precise.
The confidence interval is the point estimate of the difference ± the t-score multiplied by the standard error. The formula for the difference between the group means is:

(ȳ2 – ȳ1) ± t(se) in which se = √(se1² + se2²)
When the confidence interval contains only positive values, µ₂ – µ₁ is positive and µ₂ is bigger than µ₁. If it contains only negative values, µ₂ is smaller than µ₁. A narrow confidence interval means the groups don't differ much.
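Continuing the sketch above, a 95% confidence interval for the difference could look as follows. The degrees of freedom for the t-score are normally supplied by software; the simple df = n1 + n2 – 2 used here (discussed further below) is itself an approximation:

```python
import math
from scipy import stats

n1, ybar1, s1 = 80, 7.1, 1.2   # same made-up summaries as before
n2, ybar2, s2 = 90, 7.4, 1.1

se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)
df = n1 + n2 - 2               # simplified degrees of freedom
t = stats.t.ppf(0.975, df)     # t-score for a 95% confidence level

diff = ybar2 - ybar1
print(diff - t * se_diff, diff + t * se_diff)
```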
For a significance test to compare the proportions of two groups, H0 : π2 = π1. This would mean that the proportion is exactly equal in each group. An equivalent H0 is π2 – π1 = 0, which also says that there is no difference. Calculating the z-score and the P-value works in roughly the same way as for one group, but the difference is that π̂ indicates an estimate of the proportion for both sample groups combined. This is called a pooled estimate. With this the standard error can be calculated. For se0, the standard error in case the null hypothesis is true, another formula is used:

se0 = √(π̂(1 – π̂)(1/n1 + 1/n2))

The test statistic is z = (π̂2 – π̂1) / se0.
This can be calculated with software. A clear way to present results is a contingency table. In a contingency table the categories of the explanatory variable are placed in the rows and the categories of the response variable in the columns. The cells contain the counts for each combination of categories.
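A sketch of this z-test with a pooled estimate, with hypothetical counts:

```python
import math
from scipy.stats import norm

x1, n1 = 60, 200               # hypothetical "successes" out of n, group 1
x2, n2 = 90, 220               # group 2
p1, p2 = x1 / n1, x2 / n2      # sample proportions

# pooled estimate under H0 (pi1 = pi2): all successes over all observations
p_pool = (x1 + x2) / (n1 + n2)
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p2 - p1) / se0
p_value = 2 * norm.sf(abs(z))  # two-sided P-value
print(z, p_value)
```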
For the difference between two population means (µ₂ – µ₁) a confidence interval can be calculated using the sampling distribution of ȳ2 – ȳ1. The formula is:

(ȳ2 – ȳ1) ± t(se) in which se = √(s1²/n1 + s2²/n2)
The t-score is the one that fits the chosen confidence level. The degrees of freedom df are usually calculated by software. When the standard deviations and sample sizes are equal for each group, a simplified formula for the degrees of freedom is df = n1 + n2 – 2. The sign of the estimated difference indicates which of the two groups has the higher mean.
For a significance test comparing two means, H0 : µ1 = µ2, which implies the same as H0 : µ₂ – µ₁ = 0. The formula is:

t = (ȳ2 – ȳ1) / se in which se = √(s1²/n1 + s2²/n2)
The standard error and the degrees of freedom are the same as for a confidence interval for two means. Researchers are usually interested in the difference between two groups; significance tests for just one group are used less often.
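In practice this test is usually run with software. scipy's ttest_ind with equal_var=False gives the version that does not assume equal population standard deviations (the Welch test); the sleep-time data below are made up:

```python
from scipy import stats

# Hypothetical sleep times (hours per night) for two independent groups.
men   = [6.5, 7.0, 7.2, 6.8, 7.5, 6.9, 7.1]
women = [7.4, 7.8, 7.0, 7.6, 7.3, 7.9, 7.2]

# equal_var=False: do not assume identical population standard deviations
t, p = stats.ttest_ind(women, men, equal_var=False)
print(t, p)
```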
Dependent samples compare matched pairs data. In longitudinal research, where the same subjects are measured at different moments in time, repeated measures are taken. An example is a crossover study, in which a subject gets one treatment and later another treatment.
When matched pairs are compared, for each pair a difference variable (named yd) is created: difference = observation in sample 2 – observation in sample 1. The sample mean of these differences is ȳd. A rule for matched pairs is that the difference between the means equals the mean of the difference scores: ȳ2 – ȳ1 = ȳd.
The confidence interval for µd (the population mean of the difference scores) is:

ȳd ± t(se) in which se = sd / √n

The significance test uses: t = ȳd / se in which se = sd / √n; here sd is the standard deviation of the difference scores and n is the number of pairs.
When a significance test is performed over different observations for dependent pairs, it is called the paired difference t-test.
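The paired difference t-test is available in scipy as ttest_rel; the before/after scores below are hypothetical:

```python
from scipy import stats

# Hypothetical matched-pairs data: the same subjects measured twice.
before = [12, 15, 11, 14, 13, 16, 12, 15]
after  = [14, 15, 13, 15, 14, 18, 13, 16]

# equivalent to a one-sample t-test on the difference scores after - before
t, p = stats.ttest_rel(after, before)
print(t, p)
```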
The advantages of dependent samples are:
- Other variables influence both the first and the second sample, because the same subjects are used, so they cannot distort the comparison.
- The variability and the standard error are smaller.
Besides the t-test, there are other methods for comparing means: assuming identical standard deviations, a randomized block design, the effect size, and modeling.
For independent samples, one approach assumes that the null hypothesis entails that the distributions of the response variable are identical in the two groups. Then the standard deviations and the means are also identical. The pooled estimate of the common standard deviation is:

s = √(((n1 – 1)s1² + (n2 – 1)s2²) / (n1 + n2 – 2))
The confidence interval is:

(ȳ2 – ȳ1) ± t(se) in which se = s√(1/n1 + 1/n2)
The degrees of freedom are the combined number of observations minus the number of estimated parameters (µ1 and µ2), so df = n1 + n2 – 2.
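A sketch of this pooled-variance confidence interval in Python, with made-up summary statistics:

```python
import math
from scipy import stats

n1, ybar1, s1 = 12, 20.0, 4.0  # hypothetical summaries for group 1
n2, ybar2, s2 = 15, 24.0, 4.5  # and group 2

df = n1 + n2 - 2
# pooled estimate of the common standard deviation
s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

se = s * math.sqrt(1 / n1 + 1 / n2)
t = stats.t.ppf(0.975, df)     # t-score for a 95% confidence level
diff = ybar2 - ybar1
print(diff - t * se, diff + t * se)
```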
Another method is the randomized block design. Subjects that are alike are regarded as a pair, and only one of the two gets the treatment.
Software makes inferences assuming the variability is equal for the two groups, but also for the case where equal variance is not assumed. So it can be assumed that the population standard deviations are the same (σ1 = σ2), but this isn't necessary. When the sample sizes are (nearly) equal, the test statistics give practically the same results with and without the equal-variance assumption. However, when hugely different standard deviations are suspected, the equal-variance method isn't appropriate. It's better not to use the F test that software offers for testing whether the standard deviations are equal, because it isn't robust against non-normal distributions.
Another method is using the effect size, with the formula (ȳ1 – ȳ2) / s. The outcome is regarded as big when it is about 1 or bigger. This measure is specifically useful when the difference would otherwise depend on the units of measurement (like kilometers versus miles).
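A quick illustration of the effect size, reusing the hypothetical summary statistics from the sketch above:

```python
import math

n1, ybar1, s1 = 12, 20.0, 4.0  # made-up summaries, as before
n2, ybar2, s2 = 15, 24.0, 4.5

# pooled standard deviation, as in the previous sketch
s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# effect size (ybar1 - ybar2) / s is unit-free, so kilometers vs. miles
# no longer matters; only the magnitude is interpreted
effect = (ybar1 - ybar2) / s
print(effect)
```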
Another way to compare means is using a model: a simple approximation of the real association between two (or more) variables in the population. A normal distribution with mean µ and standard deviation σ is written as N(µ, σ). y1 is an observation from group 1 and y2 an observation from group 2. A model can be:
H0 : y1 has a distribution N(µ, σ1) and y2 has a distribution N(µ, σ2)
Ha : y1 has a distribution N(µ1, σ1) and y2 has a distribution N(µ2, σ2) and µ1 ≠ µ2
This investigates whether the means differ. The standard deviations aren't assumed equal, because that would simplify reality too much and could lead to big mistakes.
Methods also exist to compare proportions for dependent or very small samples. For dependent samples a z-test comparing the proportions, McNemar's test, or a confidence interval can be used. For small samples Fisher's exact test applies.
The z-score measures the number of standard errors between the estimate and the value of the null hypothesis. The formula in this case is: (sample proportion – null hypothesis proportion) / standard error.
For paired proportions, McNemar's test applies. The test statistic is:

z = (n21 – n12) / √(n12 + n21)

in which n12 and n21 count the pairs that score differently on the two occasions (the discordant pairs).
Besides a significance test, a confidence interval can also be used to examine the difference between dependent proportions. The formula is:

(π̂2 – π̂1) ± z(se) in which se = √(n12 + n21 – (n12 – n21)²/n) / n

Here n is the total number of pairs.
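A sketch computing both McNemar's z and this confidence interval; the 2×2 table of paired binary responses below is hypothetical:

```python
import math
from scipy.stats import norm

# Hypothetical paired responses: n12 and n21 are the discordant pairs.
n11, n12, n21, n22 = 40, 10, 25, 50
n = n11 + n12 + n21 + n22      # total number of pairs

# McNemar z test: only the discordant pairs carry information about H0
z = (n21 - n12) / math.sqrt(n12 + n21)
p_value = 2 * norm.sf(abs(z))

# confidence interval for the difference between the dependent proportions
diff = (n21 - n12) / n
se = math.sqrt(n12 + n21 - (n12 - n21) ** 2 / n) / n
zc = norm.ppf(0.975)           # z-score for a 95% confidence level
print(z, p_value, (diff - zc * se, diff + zc * se))
```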
Fisher's exact test is a complex test, but it can be performed with software to compare very small samples.
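Fisher's exact test is available in scipy; a sketch with a hypothetical table of very small counts:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 contingency table with very small counts.
table = [[3, 1],
         [1, 4]]

odds_ratio, p_value = fisher_exact(table)  # two-sided by default
print(odds_ratio, p_value)
```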
Parametric methods assume a certain distribution shape, like the normal distribution. Nonparametric methods don't make assumptions about distribution shape.
Nonparametric methods for comparing groups are mostly used for very small samples or very skewed distributions. Examples are the Wilcoxon test, the Mann-Whitney test and a nonparametric measure of effect size.
Some nonparametric tests assume that the shapes of the population distributions are identical. The model for this is:
H0 : y1 and y2 have the same distribution.
Ha : The distributions of y1 and y2 have the same shape, but the distribution of y1 is shifted up (toward higher values) relative to that of y2.
The Wilcoxon test uses an ordinal scale: it assigns ranks to the observations and compares the ranks of the two groups.
The Mann-Whitney test compares each observation from one group with each observation from the other group, for instance two sets of weather forecasts by different forecasting companies.
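For two independent samples the Wilcoxon rank-sum test and the Mann-Whitney test are equivalent; scipy offers this as mannwhitneyu. The observations below are made up:

```python
from scipy import stats

# Hypothetical observations for two independent groups.
group1 = [3.1, 2.8, 3.5, 2.9, 3.0]
group2 = [3.6, 3.9, 3.4, 4.1, 3.8]

# works on the ranks of the observations, not their raw values
u, p = stats.mannwhitneyu(group1, group2, alternative='two-sided')
print(u, p)
```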
The effect size idea can also be applied nonparametrically: estimate the proportion of pairs for which the observation from one group is higher than the observation from the other group.
Another option is treating an ordinal variable as quantitative by assigning a score to each category. This can be an easier method than treating the rankings as purely ordinal.