What role do probability distributions play in statistical inference? – Chapter 4
- 4.1 What are the basic rules of probability?
- 4.2 What is the difference in probability distributions for discrete and continuous variables?
- 4.3 How does the normal distribution work exactly?
- 4.4 What is the difference between sample distributions and sampling distributions?
- 4.5 How do you create the sampling distribution for a sample mean?
- 4.6 What is the connection between the population, the sample data and the sampling distribution?
4.1 What are the basic rules of probability?
Randomization is important for collecting data: the set of possible observations is known, but which one will occur is not. What happens depends on probability. The probability of an observation is the proportion of times it occurs in a long sequence of similar observations. The length of the sequence matters: the longer it is, the more accurate the probability, and the closer the sample proportion gets to the population proportion. Probabilities can also be expressed as percentages (such as 70%) instead of proportions (such as 0.7). A specific branch of statistics, called Bayesian statistics, deals with subjective probabilities; however, most of statistics is about regular (frequentist) probabilities.
A probability is written as P(A), where P stands for probability and A is an outcome. If only two outcomes are possible and they exclude each other, then the probability that B happens is P(B) = 1 − P(A).
Imagine research about people's favorite colors, say red and blue. Again the assumption is that the possibilities exclude each other, without overlap. The probability that someone's favorite color is red (A) or blue (B) is then P(A or B) = P(A) + P(B).
Next, imagine research that encompasses multiple questions, for instance how many married people have kids. Then you multiply the probability that someone is married (A) by the probability that someone has kids (B) given that they are married. The formula is: P(A and B) = P(A) × P(B given A). Because B depends on A, P(B given A) is called a conditional probability.
Now, imagine multiple possibilities that are not connected. The probability that one random person likes to wear sweaters (A) and that another random person likes to wear sweaters (B) is P(A and B) = P(A) × P(B). These are independent probabilities.
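The rules above can be sketched in a few lines of code. All numbers here are made up for illustration; only the rules themselves come from the text.

```python
# Complement rule: for two mutually exclusive, exhaustive outcomes,
# P(B) = 1 - P(A)
p_a = 0.7
p_b = 1 - p_a

# Addition rule for mutually exclusive outcomes: P(A or B) = P(A) + P(B)
p_red, p_blue = 0.4, 0.25
p_red_or_blue = p_red + p_blue

# Multiplication rule with dependence: P(A and B) = P(A) * P(B given A)
p_married = 0.5
p_kids_given_married = 0.6
p_married_with_kids = p_married * p_kids_given_married

# Independent events: P(A and B) = P(A) * P(B)
p_sweater = 0.2
p_both_like_sweaters = p_sweater * p_sweater
```

Note how the only difference between the last two rules is whether the second factor is a conditional probability or a plain one.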
4.2 What is the difference in probability distributions for discrete and continuous variables?
A random variable is a variable whose outcome differs per observation, but mostly it is just referred to as a variable. A discrete variable can only take a set of separate possible values, while a continuous variable can take any value in an interval. Because a probability distribution shows the probability of each value a variable can take, it differs for discrete and continuous variables.
For a discrete variable, a probability distribution gives the probability of each possible value. Every probability is a number between 0 and 1, and the sum of all probabilities is 1. The probabilities are written as P(y), the probability that the variable takes the value y. In formula form: 0 ≤ P(y) ≤ 1, and ∑all y P(y) = 1.
Because a continuous variable has an unlimited number of possible values, a probability distribution can't list them all. Instead, a probability distribution for a continuous variable assigns probabilities to intervals of possible values. The probability that a value falls within a certain interval is between 0 and 1. When an interval in a graph contains 20% of the data, the probability that a value falls within that interval is 0.20.

Just like a population distribution, a probability distribution has parameters that describe the data. The mean describes the center and the standard deviation the variability. The formula for the mean of the probability distribution of a discrete variable is: µ = ∑ y P(y). This parameter is called the expected value of y and is written as E(y).
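The two conditions on a discrete distribution and the expected-value formula µ = ∑ y P(y) can be checked directly. The distribution below is made up for illustration.

```python
# A hypothetical discrete probability distribution: value -> P(y)
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# Valid distribution: each P(y) lies in [0, 1] ...
assert all(0 <= p <= 1 for p in dist.values())
# ... and the probabilities sum to 1
assert abs(sum(dist.values()) - 1) < 1e-9

# Expected value: E(y) = sum of y * P(y)
mu = sum(y * p for y, p in dist.items())
```

For this distribution the expected value works out to 0·0.1 + 1·0.3 + 2·0.4 + 3·0.2 = 1.7.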
4.3 How does the normal distribution work exactly?
The normal distribution is useful because many variables are distributed approximately this way and because it helps to make statistical predictions. The normal distribution is symmetric, shaped like a bell, and characterized by its mean (µ) and standard deviation (σ). The empirical rule applies to the normal distribution: about 68% of values fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
The number of standard deviations from the mean is indicated as z. Software such as R, SPSS, Stata and SAS can find probabilities for a normal distribution. Because the curve is symmetric, the tail probability beyond a given distance z from the mean is the same on the left and on the right. The formula for z is: z = (y − µ) / σ.
The z-score is the number of standard deviations that a variable y is distanced from the mean. A positive z-score means that y falls above the mean, a negative score means that it falls below.
Conversely, when a probability P is known, y can be found. Software helps to find the z-score that corresponds to a given probability in the distribution. The formula for y is then: y = µ + zσ.
A special kind of normal distribution is the standard normal distribution, which consists of z-scores. A variable y can be converted to z by subtracting the mean and then dividing it by the standard deviation. Then a distribution is created where µ = 0 and σ = 1.
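The conversions between y, z and probabilities can be sketched with the standard library alone, since the standard normal cumulative probability can be written in terms of the error function. The example values (µ = 100, σ = 15) are hypothetical.

```python
import math

def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean."""
    return (y - mu) / sigma

def normal_cdf(z):
    """P(Z <= z) for the standard normal, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical example: scores with mu = 100, sigma = 15
z = z_score(130, 100, 15)      # 2 standard deviations above the mean
p_below = normal_cdf(z)        # probability of falling below y = 130

# Converting back from z to y: y = mu + z * sigma
y = 100 + z * 15

# Empirical rule check: about 95% of values fall within 2 standard deviations
within_2sd = normal_cdf(2) - normal_cdf(-2)
```

A positive z gives a cumulative probability above 0.5, a negative z one below 0.5, matching the symmetry of the curve.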
A bivariate normal distribution is used for bivariate probabilities. In case of two variables (y and x), there are two means (µy and µx) and two standard deviations (σy and σx). The covariance is the way that y and x vary together:
Covariance (x, y) = E[(x – µx)(y – µy)]
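The covariance definition can be applied directly to paired data, treating a small made-up data set as an entire population so that the expectation becomes a plain average.

```python
# Hypothetical paired observations, treated as a whole (tiny) population
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

mu_x = sum(xs) / len(xs)
mu_y = sum(ys) / len(ys)

# Cov(x, y) = E[(x - mu_x)(y - mu_y)]: the average product of deviations
cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)
```

Because here y rises whenever x rises, the deviation products are all positive and the covariance is positive.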
4.4 What is the difference between sample distributions and sampling distributions?
A simulation can tell whether an outcome of a test such as a poll is a good representation of the population. Software can generate the random numbers needed for such simulations.
When the characteristics of the population are unknown, samples are used. Statistics from samples give information about the expected parameters for the population. A sampling distribution shows the probabilities for sample measures (this is not the same as a sample distribution that shows the outcome of the data). For every statistic there is a sampling distribution, such as for the sample median, sample mean etc. This kind of distribution shows the probabilities that certain outcomes of that statistic may happen.
A sampling distribution serves to estimate how close a statistic lies to its parameter. A sampling distribution for a statistic based on n observations is the relative frequency distribution of that statistic, that in turn is the result of repeated samples of n. A sampling distribution can be formed using repeated samples but generally its form is known already. The sampling distribution allows to find probabilities for the values of a statistic of a sample with n observations.
4.5 How do you create the sampling distribution for a sample mean?
When the sample mean is known, its proximity to the population mean may still be a mystery: it remains unknown whether ȳ = µ. However, the sampling distribution gives indications, for instance a high probability that ȳ falls within a certain distance of µ. When many samples are drawn, the mean of the sampling distribution equals the mean of the population.
The variability of the sampling distribution of ȳ is described by the standard deviation of ȳ, called the standard error of ȳ and written as σȳ. The formula for the standard error is:

σȳ = σ / √n
The standard error indicates how much the sample mean varies from sample to sample, which says something about how reliable the samples are.
For a random sample of size n, the standard error of ȳ depends on the population standard deviation (σ). As n gets bigger, the standard error becomes smaller, which means that a bigger sample represents the population better. The difference between the sample mean and the population mean is called the sampling error.
The standard error and the sampling error are two different things. The sampling error indicates that the sample and the population are different in terms of the mean. The standard error measures how much samples differ from each other in terms of the mean.
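The effect of sample size on the standard error follows straight from the formula σȳ = σ / √n. A short sketch with a hypothetical population standard deviation:

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation
sigma = 10.0

# The standard error shrinks as the sample size grows
se_25 = standard_error(sigma, 25)    # sigma / 5
se_100 = standard_error(sigma, 100)  # sigma / 10
```

Quadrupling the sample size (from 25 to 100) only halves the standard error, because n enters through a square root.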
No matter how the population distribution is shaped, the sampling distribution of ȳ is approximately a normal distribution, provided the sample is large enough. This is called the Central Limit Theorem. Even if the population distribution is discrete, the sampling distribution is approximately normal. However, when the population distribution is highly skewed, the sample needs to be big for the sampling distribution to take the normal shape; for small samples the Central Limit Theorem can't necessarily be used.
Just like the standard error, the Central Limit Theorem is useful for finding information about the sampling distribution and the sample mean ȳ. Because it has a normal distribution, the Empirical Rule can be applied.
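The Central Limit Theorem can be demonstrated by simulation, as suggested earlier in the chapter. The sketch below draws repeated samples from a skewed population (an exponential distribution, chosen here purely as an example of a non-normal shape) and inspects the resulting sampling distribution of ȳ.

```python
import random
import statistics

random.seed(42)

# Skewed example population: exponential with rate 1, so its population
# mean is 1.0 and its population standard deviation is also 1.0
n = 50             # observations per sample
num_samples = 2000 # number of repeated samples

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# Per the CLT, the mean of the sampling distribution is close to the
# population mean, and its standard deviation is close to the standard
# error sigma / sqrt(n) = 1 / sqrt(50), roughly 0.14
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
```

A histogram of `sample_means` would look bell-shaped even though the population itself is strongly skewed.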
4.6 What is the connection between the population, the sample data and the sampling distribution?
To understand sampling, distinguishing between three distributions is important:
The population distribution describes the entirety from which the sample is drawn. The parameters µ and σ denote the population mean and the population standard deviation.
The sample data distribution portrays the variability of the observations made in the sample. The sample mean ȳ and the sample standard deviation s describe the curve.
The sampling distribution shows the probabilities that a statistic from the sample, such as the sample mean, has certain values. It tells how much samples can differ.
The Central Limit Theorem says that the sampling distribution is shaped like a normal distribution, and information can be deduced from this shape alone. This possibility to retrieve information from the shape is the reason that the normal distribution is so important to statistics.
Statistical methods for the social sciences - Agresti - 5th edition, 2018 - Summary (EN)