Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 9 summary




Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Book summary

Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 1 summary

USING DATA TO ANSWER STATISTICAL QUESTIONS
The information we gather with experiments and surveys is collectively called data. Statistics is the art and science of learning from data. Statistical problem solving consists of four things:

Formulate a statistical question
Collect data
Analyse data
Interpret results

The three main components of statistics for answering a statistical question are:

Design
Stating the goal and/or statistical question of interest and planning how to obtain data that will address them. (e.g: how do you conduct an experiment to determine the effects of ‘X’)
Description
Summarizing and analysing the data that are obtained (e.g: summarizing people’s tv-habits in ‘hours of tv watched per day’)
Inference
Making decisions and predictions based on the data for answering the statistical question. (predicting the outcome of an election, based on the description of the data)

Probability is a framework for quantifying how likely various possible outcomes are.

SAMPLE VERSUS POPULATION
The entities that are measured in a study are called the subjects. This usually means people, but it can also be schools, countries or days. The population is the set of all the subjects of interest. In practice, we usually have data for only some of the subjects who belong to that population. These subjects are called a sample.

Descriptive statistics refers to methods for summarizing the collected data. The summaries usually consist of graphs and numbers such as averages and percentages. Inferential statistics are used when data are available from a sample only, but we want to make a decision or prediction about the entire population. Inferential statistics refers to methods of making decisions or predictions about a population, based on data obtained from a sample of that population.

A parameter is a numerical summary of the population. A statistic is a numerical summary of a sample taken from the population. The true parameter values are almost always unknown, thus we use sample statistics to estimate the parameter values.

A sample is random when everyone in the population has the same chance of being included in the sample. Random sampling allows us to make powerful inferences about populations. The margin of error is a measure of the expected variability from one random sample to the next random sample.

The formula for calculating the approximate margin of error is:

margin of error ≈ (1/√n) × 100%

In this case, 'n' is the number of subjects in the sample.
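
The margin of error can be illustrated with a short Python sketch. It assumes the approximation 1/√n × 100% for a random sample of n subjects; the function name and the sample sizes are chosen for illustration only.

```python
import math


def approximate_margin_of_error(n: int) -> float:
    """Approximate margin of error in percentage points for n subjects.

    Assumes the rule of thumb (1 / sqrt(n)) * 100%.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of subjects")
    return 100.0 / math.sqrt(n)


# Hypothetical sample sizes: a larger sample gives a smaller margin of error.
for n in (100, 400, 2500):
    print(f"n = {n}: about {approximate_margin_of_error(n):.1f} percentage points")
```

Quadrupling the sample size halves the approximate margin of error, because n appears under a square root.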


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 2 summary

DIFFERENT TYPES OF DATA
A variable is any characteristic observed in a study. The data values that we observe for a variable are called observations. A variable can be categorical or quantitative.

Categorical variables are variables whose observations belong to one of a distinct set of categories. A categorical variable can have numerical labels, as long as these do not represent quantities (e.g: religion, favourite sport, bank account number, area code).
Quantitative variables are variables that have numerical values and represent different magnitudes (e.g: weight, height, hours spent watching TV every day).

The key features for describing quantitative variables are the centre and the variability (spread) of the data (e.g: average number of hours spent watching TV every day). The key feature for describing categorical variables is the relative number of observations in the various categories (e.g: the percentage of days in a year that it was sunny).

Quantitative variables can be discrete or continuous. A quantitative variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3 (e.g: the number of pets in a household). A quantitative variable is continuous if its possible values form an interval, so that values such as 0.13, 0.16 and 2.32 are all possible (e.g: weight: 68.3 kg).

The distribution of a variable describes how the observations fall (are distributed) across the range of possible values. The modal category is the category with the largest frequency.

A frequency table is a listing of possible values for a variable, together with the number of observations for each value.

Category	A	B	C
Frequency	17	23	9
Proportion	0.347	0.469	0.184
Percentage	34.7%	46.9%	18.4%
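
The rows of the frequency table can be reproduced with a few lines of Python. This is a small sketch that uses the frequencies from the table above; the variable names are illustrative.

```python
# Frequencies taken from the table above.
frequencies = {"A": 17, "B": 23, "C": 9}

total = sum(frequencies.values())

# A proportion is the frequency divided by the total number of observations.
proportions = {category: count / total for category, count in frequencies.items()}

# The modal category is the category with the largest frequency.
modal_category = max(frequencies, key=frequencies.get)

for category, count in frequencies.items():
    proportion = proportions[category]
    print(f"{category}: frequency={count}, proportion={proportion:.3f}, "
          f"percentage={100 * proportion:.1f}%")
print("Modal category:", modal_category)
```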


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 3 summary

THE ASSOCIATION BETWEEN TWO CATEGORICAL VARIABLES
When analysing data the first step is to distinguish between the response variable and the explanatory variable. The response variable is the outcome variable on which comparisons are made. If the explanatory variable is categorical, it defines the groups to be compared with respect to values for the response variable. If the explanatory variable is quantitative, it defines the change in different numerical values to be compared with respect to values for the response variable. The explanatory variable should explain the response variable (e.g: survival status is a response variable and smoking status is the explanatory variable).

An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.

A contingency table is a display for two categorical variables. Conditional proportions are proportions that are computed conditional on a particular category of one of the variables. A conditional proportion can also be expressed as a percentage. A proportion computed from the totals of the table (e.g: percentage of total amount of 'no') is called a marginal proportion.

If there is a clear explanatory/response relationship, it dictates which way we compute the conditional proportions: conditional on the categories of the explanatory variable. Comparing conditional proportions is useful in determining whether there is an association. If there is no association, the variables are said to be independent.

THE ASSOCIATION BETWEEN TWO QUANTITATIVE VARIABLES
We examine a scatterplot to study the association. An association can be positive or negative. With a positive association, y tends to go up as x goes up. With a negative association, y tends to go down as x goes up.

The correlation (r) summarizes the direction of the association between two quantitative variables and the strength of its linear trend. It can take a value between -1 and 1. A positive value for r indicates a positive association and a negative value for r indicates a negative association. The closer r is to 1 or -1, the closer the data points fall to a straight line and the stronger the linear association is. The closer r is to 0, the weaker the linear association is.

The properties of the correlation:

The correlation always falls between -1 and +1.
A positive correlation indicates a positive association and a negative correlation indicates a negative association.
The value of the correlation does not depend on the variables’ unit (e.g: euros or dollars)
Two variables have the same correlation no matter which is treated as the response variable and which is treated as the explanatory variable.

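The correlation r can be calculated as follows:

r = (1/(n − 1)) Σ z_x·z_y = (1/(n − 1)) Σ ((x − x̄)/s_x)·((y − ȳ)/s_y)

Here n is the number of points, x̄ and ȳ are the means, and s_x and s_y are the standard deviations of x and y.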
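
A minimal Python sketch of the correlation, assuming the usual definition as the sum of the products of the z-scores of x and y divided by n − 1. The function name and the data points are hypothetical.

```python
import math


def correlation(xs, ys):
    """Correlation r as the sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"r = {correlation(xs, ys):.3f}")
# The correlation is the same whichever variable is treated as the response.
print(f"r = {correlation(ys, xs):.3f}")
```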


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 5 summary

HOW PROBABILITY QUANTIFIES RANDOMNESS
Probability is the way we quantify uncertainty. It measures the chances of the possible outcomes for random phenomena. A random phenomenon is an everyday occurrence for which the outcome is uncertain. With random phenomena, the proportion of times that something happens is highly random and variable in the short run, but very predictable in the long run. The law of large numbers states that as the number of trials increases, the proportion of occurrences of any outcome approaches a particular number. The probability of a particular outcome is the proportion of times that the outcome would occur in a long run of observations.

Different trials of a random phenomenon are independent if the outcome of any one trial is not affected by the outcome of any other trial (e.g: if you have three children who are boys, the chance of the next child being a girl is not higher, but still ½).

In the subjective definition of probability, the probability of an outcome is not based on objective data, but on subjective information. It is defined to be a personal probability. This approach is used in Bayesian statistics.

FINDING PROBABILITIES
The sample space is the set of all possible outcomes (e.g: for the sex of a baby, the sample space is: {boy, girl}). An event is a subset of the sample space. An event corresponds to a particular outcome or a group of possible outcomes. When all outcomes in the sample space are equally likely, the probability of an event A has the following formula:

P(A) = number of outcomes in event A / number of outcomes in the sample space

For example, if you want to know the probability of the event of throwing a 6 with a fair die, you calculate it like this:

Number of outcomes in event A: 1 (there is only one possibility to throw 6)
Number of outcomes in the sample space: 6 (you can throw between 1 and 6)
P(A) = 1/6

The rest of the sample space for event A is called the complement of A. The complement of an event consists of all outcomes in the sample space that are not in the event.

Events that do not share any outcomes are disjoint (e.g: two events, A and B, are disjoint if they have no outcomes in common). The event that both A and B occur is called the intersection of A and B. The event that the outcome is in A or B (or both) is the union of A and B.

There are three general rules for calculating the probabilities:

Complement rule
P(Aᶜ) = 1 − P(A)
Addition rule
There are two parts of the addition rule. For the union of two events:

P(A or B) = P(A) + P(B) − P(A and B)

If the events are disjoint:

P(A or B) = P(A) + P(B)
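
The complement rule and the addition rule can be checked on the fair-die example with Python sets. This is a sketch that assumes equally likely outcomes; the event names are made up for illustration.

```python
from fractions import Fraction

# Sample space for one throw of a fair die; all outcomes are equally likely.
SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})


def probability(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    return Fraction(len(set(event) & SAMPLE_SPACE), len(SAMPLE_SPACE))


six = {6}          # event A: throwing a 6
even = {2, 4, 6}   # event B: throwing an even number
low_odd = {1, 3}   # event C: throwing 1 or 3, disjoint from B

# Complement rule: P(not A) = 1 - P(A).
p_not_six = probability(SAMPLE_SPACE - six)

# Addition rule for the union: P(A or B) = P(A) + P(B) - P(A and B).
p_six_or_even = probability(six | even)

# Addition rule for disjoint events: P(B or C) = P(B) + P(C).
p_even_or_low_odd = probability(even | low_odd)

print(p_not_six, p_six_or_even, p_even_or_low_odd)
```

Using `Fraction` keeps the probabilities exact, so both sides of each rule can be compared without rounding.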


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 6 summary

SUMMARIZING POSSIBLE OUTCOMES AND THEIR PROBABILITIES
All possible outcomes and their probabilities are summarized in a probability distribution. Two important probability distributions are the normal distribution and the binomial distribution. A random variable is a numerical measurement of the outcome of a random phenomenon. The probability distribution of a discrete random variable assigns a probability to each possible value. Numerical summaries of the population are called parameters. A population distribution is a type of probability distribution, one that applies for selecting a subject at random from a population.

The formula for the mean of a probability distribution for a discrete random variable is:

μ= ΣxP(x)

It is also called a weighted average, because some outcomes are likelier to occur than others, so a regular mean would be insufficient here. The mean of a probability distribution of random variable X is also called the expected value of X. The standard deviation of a probability distribution measures the variability from the mean. It describes how far values of the random variable fall, on the average, from the expected value of the distribution. A continuous variable is measured in a discrete manner, because of rounding. A probability distribution for a continuous random variable is used to approximate the probability distribution for the possible rounded values.
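
As a sketch, the mean μ = ΣxP(x) can be computed directly from a table of values and probabilities. The standard deviation below uses the standard definition σ = √(Σ(x − μ)²P(x)), which is an assumption here because the text does not spell it out. The example distribution (number of heads in two fair coin flips) is hypothetical.

```python
import math


def distribution_mean(distribution):
    """Mean (expected value): the sum of x * P(x) over all possible values."""
    return sum(x * p for x, p in distribution.items())


def distribution_sd(distribution):
    """Standard deviation: sqrt of the sum of (x - mean)^2 * P(x)."""
    mu = distribution_mean(distribution)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in distribution.items()))


# Hypothetical example: number of heads in two flips of a fair coin.
heads_in_two_flips = {0: 0.25, 1: 0.50, 2: 0.25}

print("mean:", distribution_mean(heads_in_two_flips))
print("standard deviation:", round(distribution_sd(heads_in_two_flips), 4))
```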

PROBABILITIES FOR BELL-SHAPED DISTRIBUTIONS

The z-score for a value x of a random variable is the number of standard deviations that x falls from the mean. It is calculated as:

z = (x − μ) / σ

The standard normal distribution is the normal distribution with mean μ = 0 and standard deviation σ = 1. It is the distribution of normal z-scores.
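
A short Python sketch of the z-score, assuming z = (x − μ)/σ, together with the cumulative probability of the standard normal distribution computed with the error function from the standard library. The mean, standard deviation and value in the example are hypothetical.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations that x falls from the mean."""
    return (x - mean) / sd


def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution (mean 0, sd 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical normal distribution with mean 100 and standard deviation 15.
z = z_score(130, mean=100, sd=15)
print("z-score:", z)
print("probability of a value below 130:", round(standard_normal_cdf(z), 4))
print("probability of a value above 130:", round(1 - standard_normal_cdf(z), 4))
```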

PROBABILITIES WHEN EACH OBSERVATION HAS TWO POSSIBLE OUTCOMES
An observation is binary if it has one of two possible outcomes (e.g: accept or decline, yes or no). A random variable X that counts the number of observations of a particular type has a probability distribution called the binomial distribution. There are a few conditions for a binomial distribution:

Two possible outcomes
Each trial has two possible outcomes.
Same probability of success
Each trial has the same probability of success
Trials are independent

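The formula for the binomial probabilities for any n is:

P(x) = [n! / (x!(n − x)!)] · p^x · (1 − p)^(n − x), for x = 0, 1, 2, ..., n

When sampling without replacement from a population, the binomial distribution is approximately valid if the sample size is less than 10% of the population size. The mean and the standard deviation of the binomial distribution are:

μ = np and σ = √(np(1 − p))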
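
The binomial probabilities can be sketched in Python with `math.comb`, assuming the standard formulas P(x) = n!/(x!(n − x)!) · p^x(1 − p)^(n − x), μ = np and σ = √(np(1 − p)). The values n = 10 and p = 0.5 are a hypothetical example.

```python
import math


def binomial_probability(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_mean(n, p):
    """Mean of the binomial distribution: n * p."""
    return n * p


def binomial_sd(n, p):
    """Standard deviation of the binomial distribution: sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1 - p))


# Hypothetical example: 10 trials, each with probability 0.5 of success.
n, p = 10, 0.5
for x in range(n + 1):
    print(f"P(X = {x}) = {binomial_probability(x, n, p):.4f}")
print("mean:", binomial_mean(n, p))
print("standard deviation:", round(binomial_sd(n, p), 4))
```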


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 7 summary

HOW SAMPLE PROPORTIONS VARY AROUND THE POPULATION PROPORTION
The sampling distribution of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take. It should be distinguished from two other distributions. The population distribution is the distribution from which you take the sample. Values of its parameters are fixed, but usually unknown. The data distribution is the distribution of the sample data. It is described by statistics such as the sample proportion. The sampling distribution is the distribution of a sample statistic, such as a sample proportion. Sampling distributions describe the variability that occurs from sample to sample.

For a random sample of size n from a population with proportion p of outcomes in a particular category, the sampling distribution of the sample proportion in that category has:

mean = p and standard deviation = √(p(1 − p)/n)

For a large sample size n, the binomial distribution is approximately normal. The central limit theorem states that, for a sufficiently large random sample size n, the sampling distribution of the sample mean x̄ has approximately a normal distribution. This result applies no matter what the shape of the population distribution from which the samples are taken. The standard deviation of the sampling distribution of x̄ has the following formula:

standard deviation of x̄ = σ/√n

Here σ is the standard deviation of the population distribution.

The larger the sample, the closer the sample mean tends to fall to the population mean.
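
The effect of the sample size on the sampling distribution can be sketched in Python. The sketch assumes the standard formulas √(p(1 − p)/n) for the sample proportion and σ/√n for the sample mean; the population values are hypothetical.

```python
import math


def sd_of_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of a sample mean."""
    return sigma / math.sqrt(n)


# Hypothetical population: proportion p = 0.5, standard deviation sigma = 15.
for n in (100, 400, 1600):
    print(f"n = {n}: "
          f"sd of sample proportion = {sd_of_sample_proportion(0.5, n):.4f}, "
          f"sd of sample mean = {sd_of_sample_mean(15, n):.3f}")
```

Each time the sample size is multiplied by four, both standard deviations are halved, which is why larger samples tend to give estimates closer to the population value.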


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 8 summary

POINT AND INTERVAL ESTIMATES OF POPULATION PARAMETERS
A point estimate is a single number that is our best guess for the parameter (e.g: 25% of all Dutch people are taller than 1.80 m). An interval estimate is an interval of numbers within which the parameter value is believed to fall (e.g: between 20% and 30% of the Dutch people are taller than 1.80 m). The lower and upper bounds of the interval are found by subtracting the margin of error from the point estimate and adding it to the point estimate.

A good estimator of a parameter has two properties:

Unbiased
A good estimator has a sampling distribution that is centred at the parameter. A mean from a random sample should fall around the population parameter and this is especially the case with multiple samples and thus a sampling distribution.
Small standard deviation
A good estimator has a small standard deviation compared to other estimators. When the population distribution is normal, the sample mean is preferred over the sample median, because the sample mean has a smaller standard deviation.

An interval estimate is designed to contain the parameter with some chosen probability, such as 0.95. Confidence intervals are interval estimates that contain the parameter with a certain degree of confidence. A confidence interval is an interval containing the most believable values for a parameter. The probability that this method produces an interval that contains the parameter is called the confidence level. The sampling distribution of a sample proportion gives the possible values for the sample proportion and their probabilities. It is approximately normal if np and n(1 − p) are both at least 15. The margin of error measures how accurate the point estimate is likely to be in estimating a parameter.

CONSTRUCTING A CONFIDENCE INTERVAL TO ESTIMATE A POPULATION PROPORTION
The point estimate of the population proportion is the sample proportion p̂. The standard error is the estimated standard deviation of a sampling distribution. The formula for the standard error of the sample proportion is:

se = √(p̂(1 − p̂)/n)

A 95% confidence interval for the population proportion is p̂ ± 1.96(se). The greater the confidence level, the wider the interval. The margin of error decreases with bigger samples, because the standard error decreases with bigger samples. The larger the sample, the narrower the interval. If the 95% confidence interval method is used repeatedly over time, then in the long run about 95% of the intervals contain the population proportion.
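
A minimal Python sketch of a large-sample confidence interval for a proportion, assuming se = √(p̂(1 − p̂)/n) and the interval p̂ ± z(se) with z = 1.96 for 95% confidence. The function name and the counts are hypothetical.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion.

    z = 1.96 corresponds to 95% confidence. The method assumes at least
    about 15 successes and 15 failures in the sample.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z * se
    return p_hat - margin_of_error, p_hat + margin_of_error


# Hypothetical sample: 250 of 1000 subjects are in the category of interest.
lower, upper = proportion_confidence_interval(250, 1000)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")

# A sample four times as large with the same proportion gives a narrower interval.
lower_big, upper_big = proportion_confidence_interval(1000, 4000)
print(f"95% confidence interval: ({lower_big:.3f}, {upper_big:.3f})")
```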

CONSTRUCTING A CONFIDENCE INTERVAL TO ESTIMATE A POPULATION MEAN
The standard error of the sample mean has the following formula:

se = s/√n

Here s is the sample standard deviation.

The t-score is like a z-score, but a bit larger, and comes from a bell-shaped distribution that has slightly thicker tails than a normal distribution. The distribution that uses the t-score and the standard error, rather than the z-score and the standard deviation is called the t-distribution. The standard deviation of the t-distribution is a bit larger than 1, with the precise value depending on what is called the degrees of freedom. The t-score has


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 9 summary

STEPS FOR PERFORMING A SIGNIFICANCE TEST
A hypothesis is a statement about the population. A significance test is a method for using data to summarize the evidence about a hypothesis. The null hypothesis (H0) is a statement that the parameter takes a particular value (e.g: probability of getting a baby girl: p = 0.482). The alternative hypothesis (Ha) states that the parameter falls in some alternative range of values. A significance test has five steps:

Assumptions
Each significance test has certain assumptions, or certain conditions under which it applies (e.g: the assumption that random sampling has been used).
Hypotheses
Each significance test has two hypotheses about a population parameter: the null hypothesis and the alternative hypothesis.
Test statistic
The parameter to which the hypotheses refer has a point estimate. A test statistic describes how far that point estimate falls from the parameter value given in the null hypothesis. This is usually measured as the number of standard errors between the point estimate and the parameter value.
P-value
A probability summary of the evidence against the null hypothesis is used to interpret a test statistic. The P-value is the probability that the test statistic equals the observed value or a value even more extreme. It is calculated by presuming that the null hypothesis is true.
Conclusion
The conclusion of the significance test reports the P-value and interprets what it says about the question that motivated the test.

SIGNIFICANCE TESTS ABOUT PROPORTIONS
The steps for the significance test are the same for proportions. The biggest assumption made here is that the sample size is large enough that the sampling distribution is approximately normal. The hypotheses are the following for significance tests about proportions:

H0: p = p0 and Ha: p > p0 or Ha: p < p0

This is called a one-sided alternative hypothesis, because it has values falling only on one side of the null hypothesis value. A two-sided alternative hypothesis has the form of:

Ha: p ≠ p0

The test statistic of a significance test about proportions is:

z = (p̂ − p0) / se0, with se0 = √(p0(1 − p0)/n)

For a one-sided test, the P-value is the left- or right-tail probability of test statistic values at least as extreme as the one observed. Smaller P-values indicate stronger evidence against the null hypothesis, because the data would be more unusual if the null hypothesis were true. In a two-sided test, the P-value is the probability of a single tail doubled. The significance level is a number such that we reject H0 if the P-value is less than or equal to that number. The most common significance level is 0.05. If the data provide enough evidence to reject H0 and accept Ha, the results are called statistically significant. If Ha is rejected, this does not mean that
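
The five steps for a test about a proportion can be sketched in Python. The sketch assumes the usual large-sample test statistic z = (p̂ − p0)/se0 with se0 = √(p0(1 − p0)/n); the function names, the `alternative` labels and the data are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0, presuming H0 is true for se0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    if alternative == "greater":      # Ha: p > p0, right-tail probability
        p_value = 1 - standard_normal_cdf(z)
    elif alternative == "less":       # Ha: p < p0, left-tail probability
        p_value = standard_normal_cdf(z)
    elif alternative == "two-sided":  # Ha: p != p0, single tail doubled
        p_value = 2 * (1 - standard_normal_cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = one_proportion_z_test(60, 100, p0=0.5)
significance_level = 0.05
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
print("reject H0" if p_value <= significance_level else "do not reject H0")
```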


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 10 summary

CATEGORICAL RESPONSE: COMPARING TWO PROPORTIONS:
Bivariate methods is the general category of statistical methods used when we have two variables. The outcome variable on which comparisons are made is called the response variable. The binary variable that specifies the groups is the explanatory variable. In an independent sample, observations in one sample are independent from observations in another sample. If two samples have the same subjects, they are dependent. If each subject in one sample is matched with a subject in another sample there are matched pairs and the data is dependent as well.

The formula for the standard error for comparing two proportions is:

se = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)

A 95% confidence interval for the difference between two population proportions has the following formula:

(p̂1 − p̂2) ± 1.96(se)

For a significance test of H0: p1 = p2, the proportion (p̂) is called a pooled estimate, since it pools the total number of successes and the total number of observations from the two samples. This uses the presumption p1 = p2. The test statistic uses the following formula:

z = ((p̂1 − p̂2) − 0) / se0

The standard error for the test statistic uses the following formula:

se0 = √(p̂(1 − p̂)(1/n1 + 1/n2))
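
A Python sketch of comparing two proportions, assuming the standard large-sample formulas: an unpooled standard error for the confidence interval and a pooled standard error for the test of H0: p1 = p2. The function names and the counts are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_confidence_interval(x1, n1, x2, n2, z=1.96):
    """Confidence interval for p1 - p2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided P-value) for H0: p1 = p2, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # total successes / total observations
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = ((p1 - p2) - 0) / se0
    return z, 2 * (1 - standard_normal_cdf(abs(z)))


# Hypothetical data: 60 of 100 successes in group 1, 40 of 100 in group 2.
ci_lower, ci_upper = two_proportion_confidence_interval(60, 100, 40, 100)
z, p_value = two_proportion_z_test(60, 100, 40, 100)
print(f"95% confidence interval for p1 - p2: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"z = {z:.3f}, two-sided P-value = {p_value:.4f}")
```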

QUANTITATIVE RESPONSE: COMPARING TWO MEANS:
The standard error for comparing two means has the following formula:

se = √(s1²/n1 + s2²/n2)

A 95% confidence interval for the difference between two population means has the following formula:

(x̄1 − x̄2) ± t.025(se)

The confidence interval for the difference between two population means uses the t-distribution and not the standard normal distribution. Interpreting a confidence interval for the difference of means uses the following criteria:

Check whether or not 0 falls in the interval
If it does, it is plausible that mean 1 equals mean 2.
Positive confidence interval suggests that mean 1 – mean 2 is positive
If the confidence interval only contains positive numbers, this suggests that mean 1 – mean 2 is positive. This suggests that mean 1 is larger than mean 2.
Negative confidence interval suggests that mean 1 – mean 2 is negative
If the confidence interval only contains negative numbers, this suggests that mean 1 - mean 2 is negative. This suggests that mean 1 is smaller than mean 2.
Group order is arbitrary
Which group is labelled group 1 and which is labelled group 2 is arbitrary.

The test statistic of a significance test comparing two population means uses the following formula:

t = ((x̄1 − x̄2) − 0) / se

The 0 is subtracted because the null hypothesis states that there is no difference between the population means, so the hypothesized difference is zero.
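
The test statistic for comparing two means can be sketched with the `statistics` module, assuming se = √(s1²/n1 + s2²/n2) and t = ((x̄1 − x̄2) − 0)/se. The samples are hypothetical. The P-value is not computed here, because it requires the t-distribution, which the Python standard library does not provide.

```python
import math
import statistics


def two_sample_t_statistic(sample1, sample2):
    """Return (t, se) for H0: no difference between the two population means."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.stdev returns the sample standard deviation (divisor n - 1).
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = ((mean1 - mean2) - 0) / se
    return t, se


# Hypothetical independent samples.
group1 = [5, 6, 7, 8, 9]
group2 = [3, 4, 5, 6, 7]
t, se = two_sample_t_statistic(group1, group2)
print(f"se = {se:.3f}, t = {t:.3f}")
```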

OTHER WAYS OF COMPARING MEANS AND COMPARING PROPORTIONS
If it is reasonable to expect that the variability as


Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 11 summary

INDEPENDENCE AND DEPENDENCE (ASSOCIATION)
Conditional percentages refer to a sample data distribution, conditional on a category. They form the conditional distribution. Two categorical variables are independent if the population conditional distributions for one of them are identical at each category of the other. The variables are dependent if these conditional distributions differ. Independence and dependence refer to the population, so even if the variables are independent in the population, the conditional distributions in a sample can still differ somewhat because of sampling variability.

TESTING CATEGORICAL VARIABLES FOR INDEPENDENCE
The expected cell count is the mean of the distribution for the count in any particular cell, presuming that the null hypothesis of independence is true. The formula for the expected cell count is the following:

expected cell count = (row total × column total) / total sample size

The chi-squared statistic summarizes how far the observed cell counts in a contingency table fall from the expected cell counts for a null hypothesis. It is the test statistic for the test of independence. The formula for the chi-squared statistic is:

χ² = Σ (observed count − expected count)² / expected count

The sampling distribution of the chi-squared statistic is called the chi-squared probability distribution. The chi-squared probability distribution has several properties:

Never negative
Shape depends on degrees of freedom
Mean equals degrees of freedom
As degrees of freedom increases the distribution becomes more bell shaped
Large chi-square is evidence against independence

The degrees of freedom in a table with r rows and c columns can be calculated as follows:

df = (r − 1) × (c − 1)
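
The expected cell counts, the chi-squared statistic and the degrees of freedom can be sketched in Python. The sketch assumes the standard formulas: expected count = (row total × column total)/total, χ² = Σ(observed − expected)²/expected and df = (r − 1)(c − 1). The contingency table is hypothetical.

```python
def chi_squared_statistic(table):
    """Return (chi_squared, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)
    chi_squared = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * column_totals[j] / total
            chi_squared += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_squared, df


# Hypothetical 2 x 2 table of observed counts.
observed_counts = [[30, 20],
                   [20, 30]]
chi_squared, df = chi_squared_statistic(observed_counts)
print(f"chi-squared = {chi_squared:.2f}, df = {df}")
```

A larger chi-squared value means the observed counts fall further from the counts expected under independence.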

If a response variable is identified and the population conditional distributions are identical, they are said to be homogeneous. The chi-squared test is then referred to as a test of homogeneity. The degrees of freedom value in a chi-squared test indicates how many parameters are needed to determine all the comparisons for describing the contingency table. The chi-squared test can test for independence, but it cannot provide information about the strength or the direction of the association. It therefore says something about statistical significance only, not about practical significance. When testing particular proportion values for a categorical variable, the chi-squared statistic is referred to as a goodness-of-fit statistic. The statistic summarizes how well the hypothesized values predict what happens with the observed data.

DETERMINING THE STRENGTH OF THE ASSOCIATION
A measure of association is a statistic or a parameter that summarizes the strength of the dependence between two variables. The association can be measured by looking at the difference of two proportions. The formula for the difference of the two proportions is the following:

Difference of proportions = p1 − p2

The ratio of two proportions is also a measure of association. This is also called the relative risk. The relative risk uses the following formula:

Relative risk = p1 / p2

The relative risk has several properties:

The relative risk can equal any non-negative number
When p1 = p2, the variables are independent and the relative risk equals 1
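The two measures of association above can be sketched in a few lines of Python. The function name and the counts are hypothetical, and the sketch assumes that the proportion in the second group is larger than zero.

```python
def compare_proportions(successes_1, n_1, successes_2, n_2):
    """Return (difference of proportions, relative risk) for two groups.

    Assumes successes_2 > 0, otherwise the relative risk is undefined.
    """
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    return p1 - p2, p1 / p2


# Hypothetical data: 30 of 200 in group 1 and 15 of 200 in group 2.
difference, relative_risk = compare_proportions(30, 200, 15, 200)
print(round(difference, 3), round(relative_risk, 2))  # 0.075 2.0
```

A relative risk of 2.0 means that the proportion in the first group is twice the proportion in the second group, even though the difference of proportions is small.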

Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 12 summary

MODEL HOW TWO VARIABLES ARE RELATED
A regression line is a straight line that predicts the value of a response variable ‘y’ from the value of an explanatory variable ‘x’. The correlation is a summary measure of association. The regression line uses the following formula:

ŷ = a + bx, where a is the y-intercept and b is the slope

The data are plotted before a regression line is fitted, because the line can be strongly influenced by outliers. The regression equation is often called a prediction equation. The difference y − ŷ between an observed outcome y and its predicted value ŷ is the prediction error, called the residual. The average of the residuals is zero. The regression line has a smaller sum of squared residuals than any other line, which is why it is called the least squares line. The population regression equation has the following formula:

μy = α + βx, where μy is the population mean of y at a given value of x, α is the population y-intercept and β is the population slope

This formula is a model. A model is a simple approximation for how variables relate in a population. The probability distribution of y values at a fixed value of x is a conditional distribution (e.g: the distribution of annual income for people with 12 years of education).

DESCRIBE STRENGTH OF ASSOCIATION
Correlation does not differentiate between response and explanatory variables. The formula for the slope uses the correlation and can be calculated as follows:

b = r(sy / sx), where sx and sy are the sample standard deviations of x and y

Using this slope, the y-intercept can be calculated:

a = ȳ − b x̄

The slope can’t be used to determine the strength of the association, because it depends on the units of measurement. The correlation is the standardized version of the slope. The formula for the correlation is the following:

r = b(sx / sy)

A property of the correlation is that at any particular x value, the predicted value of y is relatively closer to its mean than x is to its mean. If a particular ‘x’ value falls 2.0 standard deviations from the mean with a correlation of 0.80, then the predicted ‘y’ is ‘r’ times that many standard deviations from its mean, so the predicted ‘y’ would be 0.80 times 2.0 = 1.6 standard deviations from its mean. This is regression toward the mean. If the first observation is extreme, the second observation tends to be closer to the mean and less extreme.
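The relation between the slope, the y-intercept and the correlation, and the regression toward the mean that follows from it, can be checked numerically. The sketch below is an illustration only: the function name and the data are made up, and the correlation is computed with the standard formula that averages the products of z-scores, which is assumed here rather than quoted from this summary.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Return (a, b, r): y-intercept, slope and correlation of the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    n = len(x)
    # Correlation: sum of the products of z-scores, divided by n - 1.
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b, r


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b, r = fit_line(x, y)

# Predict y for an x value that lies 2 standard deviations above the mean of x.
x_new = mean(x) + 2 * stdev(x)
y_hat = a + b * x_new
z_predicted = (y_hat - mean(y)) / stdev(y)
print(round(a, 2), round(b, 2), round(r, 3), round(z_predicted, 3))
```

For these data the correlation is about 0.77, so the predicted y lies about 2 × 0.77 = 1.55 standard deviations above its mean, which is closer to the mean than the 2 standard deviations of x.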

The error made when predicting ‘y’ using ‘x’ with the regression equation is summarized by the residual sum of squares, which uses the following formula:

Residual sum of squares = Σ(y − ŷ)²

The measure r squared is interpreted as the proportional reduction in error (e.g: if r squared = 0.40, the error using y-hat to predict y is 40% smaller than the error using y-bar to predict y). The formula for r squared is:

r² = (Σ(y − ȳ)² − Σ(y − ŷ)²) / Σ(y − ȳ)²
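A small Python sketch can make the proportional reduction in error concrete. It compares the error of predicting y with y-bar (the total sum of squares) with the error of predicting y with y-hat (the residual sum of squares). The function name and the data are hypothetical.

```python
def proportional_reduction_in_error(x, y):
    """Return (total sum of squares, residual sum of squares, r squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least squares slope and intercept.
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    predictions = [a + b * xi for xi in x]

    # Error when predicting with y-bar versus predicting with y-hat.
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    residual_ss = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    return total_ss, residual_ss, (total_ss - residual_ss) / total_ss


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
total_ss, residual_ss, r_squared = proportional_reduction_in_error(x, y)
print(round(total_ss, 2), round(residual_ss, 2), round(r_squared, 2))
```

Here the total sum of squares is 6 and the residual sum of squares is 2.4, so using the regression line reduces the prediction error by 60%.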

Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 14 summary

ONE-WAY ANOVA: COMPARING SEVERAL MEANS
The inferential method for comparing means of several groups is called analysis of variance, also called ANOVA. Categorical explanatory variables in multiple regression and in ANOVA are referred to as factors, also known as independent variables. An ANOVA with only one independent variable is called a one-way ANOVA.

Evidence against the null hypothesis in an ANOVA test is stronger when the variability within each sample is smaller or when the variability between groups is larger. The formula for the F (ANOVA) test statistic is:

F = between-groups variability / within-groups variability

When the null hypothesis is true, the mean of the F-distribution is approximately 1. If the null hypothesis is false, the F-statistic tends to be larger than 1, and increasingly so as the sample sizes increase. The larger the F-statistic, the smaller the P-value. The F-distribution has two degrees of freedom values:

df1 = g − 1 and df2 = N − g, where g is the number of groups and N is the total sample size

The ANOVA test has five steps:

Assumptions
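A quantitative response variable for more than two groups. Independent random samples. Normal population distributions with equal standard deviations.
Hypotheses
H0: μ1 = μ2 = … = μg. Ha: at least two of the population means are unequal.
Test statistic
F = between-groups variability / within-groups variability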
P-value
This is the right-tail probability of the observed F-value.
Conclusion
The null hypothesis is normally rejected if the P-value is smaller than 0.05.

If the sample sizes are equal, the within-groups estimate of the variance is the mean of the g sample variances for the g groups. It uses the following formula:

s² = (s1² + s2² + … + sg²) / g

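If the sample sizes are equal, the between-groups estimate of the variance uses the following formula:

n[(ȳ1 − ȳ)² + (ȳ2 − ȳ)² + … + (ȳg − ȳ)²] / (g − 1), where n is the common sample size, ȳ1, …, ȳg are the group sample means and ȳ is the overall sample mean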

The ANOVA F-test is robust to moderate violations of the normality assumption if the sample sizes are large enough. If the population standard deviations are not equal, the F-test still works quite well as long as the largest group standard deviation is no more than about twice the smallest group standard deviation. A disadvantage of the F-test is that it tells us whether the groups differ, but not which groups differ.
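The F statistic for equal sample sizes can be sketched as follows. This is a minimal illustration under the assumption that all groups have the same size; the function name and the data are made up.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df1, df2) for groups of equal size."""
    g = len(groups)
    n = len(groups[0])
    if any(len(group) != n for group in groups):
        raise ValueError("this sketch assumes equal sample sizes")

    group_means = [mean(group) for group in groups]
    overall_mean = mean(group_means)

    # Within-groups estimate: mean of the g sample variances.
    within = mean([variance(group) for group in groups])
    # Between-groups estimate: n times the variability of the group means.
    between = n * sum((m - overall_mean) ** 2 for m in group_means) / (g - 1)

    return between / within, g - 1, g * n - g


# Hypothetical data for three groups of three observations.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_statistic, df1, df2 = one_way_anova_f(groups)
print(f_statistic, df1, df2)  # 12.0 2 6
```

The P-value is the right-tail probability above the F statistic in an F-distribution with df1 and df2 degrees of freedom, which is normally read from software.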

ESTIMATING DIFFERENCES IN GROUPS FOR A SINGLE FACTOR
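The F-test only tells us whether groups are different, not how different they are or which groups differ. Confidence intervals can. A confidence interval for comparing the means of groups i and j uses the following formula:

(ȳi − ȳj) ± t.025 × s × √(1/ni + 1/nj), where s is the square root of the within-groups variance estimate and t.025 is the t-score for a 95% confidence level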

The degrees of freedom for the confidence interval is:

df = N − g, where N is the total sample size and g is the number of groups

If the confidence interval does not contain 0, we can infer that the population means are different. Methods that control the probability that all confidence intervals will contain the true differences in means are called multiple comparison methods. Multiple comparison methods compare pairs of means with a confidence level that applies simultaneously to the entire set of comparisons.

Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 15 summary

COMPARE TWO GROUPS BY RANKING
Nonparametric statistical methods are inferential methods that do not assume a particular form of distribution (e.g: the assumption of a normal distribution) for the population distribution. The Wilcoxon test is the best known nonparametric method. Nonparametric methods are useful when the data are ranked and when the assumption of normality is inappropriate.

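The Wilcoxon test builds a sampling distribution for the difference between the sample mean ranks, based on all the possible ways of allocating the ranks to the two groups. These allocations are equally likely when the null hypothesis is true. This test has five steps: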

Assumptions
Independent random samples from two groups.
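Hypotheses
H0: the population distributions are identical for the two groups. Ha: the expected values of the sample mean ranks differ (two-sided), or the expected mean rank is higher for a specified group (one-sided).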
Test statistic
This is the difference between the sample mean ranks for the two groups.
P-value
This is a one-tail or two-tail probability, depending on the alternative hypothesis.
Conclusion
The null hypothesis is either rejected in favour of the alternative hypothesis or not.

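The sum of the ranks can also be used, instead of the mean of the ranks. When conducting the Wilcoxon test, a z-test can also be conducted if the sample is large enough. This z-test has the following formula:

z = (difference between the sample mean ranks) / se, where se is the standard error of that difference when the null hypothesis is true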

A Wilcoxon test can also be conducted by converting quantitative observations to ranks. The Wilcoxon test is not affected by outliers (e.g: an extreme outlier gets the lowest or highest rank, no matter how far it lies from the nearest other observation). The difference between the population medians can also be used if the distribution is highly skewed, but this requires the extra assumption that the population distributions of the two groups have the same shape. The point estimate of the difference between two medians equals the median of all the differences between pairs of observations, one from each group. A sample proportion can also be used: the proportion of pairs of observations, one from each group, in which the observation from group one is better than the observation from group two. If this proportion is 0.50, then there is no effect. The closer the proportion gets to 0 or 1, the greater the difference between the two groups.
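The conversion of observations to ranks, and the resulting difference between the sample mean ranks, can be sketched in Python as follows. The helper names and the data are hypothetical, and giving tied observations the average of their ranks is the usual convention, assumed here.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the block of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mean_rank_difference(group_1, group_2):
    """Difference between the sample mean ranks of two groups (group 1 minus group 2)."""
    ranks = average_ranks(list(group_1) + list(group_2))
    ranks_1 = ranks[: len(group_1)]
    ranks_2 = ranks[len(group_1):]
    return sum(ranks_1) / len(ranks_1) - sum(ranks_2) / len(ranks_2)


# Hypothetical data; the value 12 occurs in both groups.
group_1 = [12, 15, 18, 40]
group_2 = [8, 10, 12, 14]
print(mean_rank_difference(group_1, group_2))  # 3.25
# Making the outlier far more extreme does not change the ranks.
print(mean_rank_difference([12, 15, 18, 400], group_2))  # 3.25
```

The second call illustrates why the test is not affected by outliers: the largest observation keeps the highest rank however extreme it is.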

NONPARAMETRIC METHODS FOR SEVERAL GROUPS AND FOR MATCHED PAIRS
The test for comparing mean ranks of more than two groups is called the Kruskal-Wallis test. This test has five steps:

Assumptions
Independent random samples.
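Hypotheses
H0: the population distributions are identical for the g groups. Ha: the population distributions are not all identical.
Test statistic
The test statistic is based on the between-groups variability in the sample mean ranks. The test statistic uses the following formula:
H = (12 / (N(N + 1))) × Σ nᵢ(R̄ᵢ − R̄)², where nᵢ and R̄ᵢ are the sample size and the mean rank of group i, N is the total sample size and R̄ = (N + 1) / 2 is the overall mean rank.
The test statistic has an approximate chi-squared distribution with g − 1 degrees of freedom.
P-value
The right-tail probability above the observed test statistic value in the chi-squared distribution.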
Conclusion
The null hypothesis is either rejected in favour of the alternative hypothesis or not.
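The Kruskal-Wallis statistic can be sketched in Python using the standard formula, 12 / (N(N + 1)) times the sum over groups of the group size times the squared distance between the group mean rank and the overall mean rank (N + 1) / 2. The function name and the data are made up, and the sketch assumes that there are no tied observations.

```python
def kruskal_wallis(groups):
    """Return (test statistic, degrees of freedom); assumes no tied observations."""
    combined = sorted(value for group in groups for value in group)
    if len(set(combined)) != len(combined):
        raise ValueError("this sketch does not handle ties")

    # Rank 1 goes to the smallest observation in the combined sample.
    rank_of = {value: position + 1 for position, value in enumerate(combined)}
    n_total = len(combined)
    overall_mean_rank = (n_total + 1) / 2

    # Between-groups variability in the sample mean ranks.
    between = sum(
        len(group) * (sum(rank_of[v] for v in group) / len(group) - overall_mean_rank) ** 2
        for group in groups
    )
    statistic = 12 / (n_total * (n_total + 1)) * between
    return statistic, len(groups) - 1


# Hypothetical data for three groups.
groups = [[1.2, 2.5, 3.1], [4.0, 5.3, 6.8], [7.7, 8.1, 9.9]]
statistic, df = kruskal_wallis(groups)
print(round(statistic, 2), df)  # 7.2 2
```

The P-value is then the right-tail probability above the statistic in a chi-squared distribution with g − 1 degrees of freedom.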


Research Methods & Statistics – Interim exam 3 (UNIVERSITY OF AMSTERDAM)

Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 10 summary

The formula for the standard error for comparing two proportions is:

se = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)

A 95% confidence interval for the difference between two population proportions has the following formula:

(p̂1 − p̂2) ± 1.96(se)

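The standard error for the test statistic, which presumes that the null hypothesis p1 = p2 is true, uses the following formula:

se0 = √(p̂(1 − p̂)(1/n1 + 1/n2)), where p̂ is the pooled sample proportion of the two groups combined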
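As a hedged illustration, the comparison of two proportions can be sketched in Python with the standard large-sample formulas: an unpooled standard error for the 95% confidence interval and a pooled standard error for the test statistic. The function name and the counts are hypothetical, and the two-sided P-value uses the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def compare_two_proportions(x1, n1, x2, n2):
    """Return (95% confidence interval, z statistic, two-sided P-value)."""
    p1, p2 = x1 / n1, x2 / n2

    # Confidence interval: standard error based on the separate sample proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    interval = (p1 - p2 - 1.96 * se, p1 - p2 + 1.96 * se)

    # Test of p1 = p2: standard error based on the pooled sample proportion.
    pooled = (x1 + x2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return interval, z, p_value


# Hypothetical data: 60 of 100 successes in group 1, 50 of 100 in group 2.
interval, z, p_value = compare_two_proportions(60, 100, 50, 100)
print([round(bound, 3) for bound in interval], round(z, 2), round(p_value, 3))
```

For these counts the interval runs from about −0.04 to 0.24 and contains 0, which agrees with a P-value above 0.05.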

QUANTITATIVE RESPONSE: COMPARING TWO MEANS:
The standard error for comparing two means has the following formula:

se = √(s1²/n1 + s2²/n2)

A 95% confidence interval for the difference between two population means has the following formula:

(ȳ1 − ȳ2) ± t.025(se)

Check whether or not 0 falls in the interval
If it does, it is plausible that mean 1 equals mean 2.
Positive confidence interval suggests that mean 1 – mean 2 is positive
If the confidence interval only contains positive numbers, this suggests that mean 1 – mean 2 is positive. This suggests that mean 1 is larger than mean 2.
Negative confidence interval suggests that mean 1 – mean 2 is negative
If the confidence interval only contains negative numbers, this suggests that mean 1 - mean 2 is negative. This suggests that mean 1 is smaller than mean 2.
Group order is arbitrary
Which group is labelled group 1 and which is labelled group 2 is arbitrary.

The test statistic of a significance test comparing two population means uses the following formula:

t = ((ȳ1 − ȳ2) − 0) / se

The 0 is subtracted because the null hypothesis states that there is no difference between the population means, so the hypothesized difference is zero.
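The standard error and the test statistic for comparing two means can be sketched in Python as follows. The function name and the samples are made up, and the sketch uses the standard unpooled standard error, the square root of s1²/n1 + s2²/n2.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(sample_1, sample_2):
    """Return (standard error, t statistic) for comparing two means."""
    se = sqrt(variance(sample_1) / len(sample_1) + variance(sample_2) / len(sample_2))
    # The hypothesized difference under the null hypothesis is 0.
    t = (mean(sample_1) - mean(sample_2) - 0) / se
    return se, t


# Hypothetical samples.
sample_1 = [10.0, 12.0, 14.0, 16.0]
sample_2 = [8.0, 9.0, 10.0, 11.0, 12.0]
se, t = two_sample_t(sample_1, sample_2)
print(round(se, 3), round(t, 3))
```

The P-value would come from a t-distribution, for which software is normally used. Swapping the two samples only changes the sign of t, which reflects that the group order is arbitrary.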

OTHER WAYS OF COMPARING MEANS AND COMPARING PROPORTIONS
If it is reasonable to expect that the variability as

Research methods in psychology by B. Morling (third edition) – Chapter 4 summary

There are two historical examples of studies that violated several ethical criteria.

Tuskegee Syphilis Study
This study involved black men diagnosed with syphilis, who were lied to: they were not told that the study was about syphilis and they were intentionally left untreated. The participants were not treated respectfully, they were harmed, and the researchers targeted a disadvantaged social group.
Milgram Obedience Studies
This experiment shows that ethical violations are often much more nuanced. Participants in this experiment were debriefed after the experiment. It also shows that balancing the potential risks to participants and the value of the knowledge gained is not an easy decision.

The Belmont Report outlines three main principles for guiding ethical decision making:

Principle of respect for persons
This includes two provisions. First, participants should be treated as autonomous agents: each person is entitled to the precaution of informed consent. Second, people with less autonomy (e.g: children, people with mental disabilities) should be protected. Researchers must also avoid coercion, which is an implicit or an explicit suggestion that those who do not participate will suffer a negative consequence.
The principle of beneficence
Researchers must take precautions to protect the participants from harm and to ensure their well-being. Valuable knowledge must be gained while inflicting as little harm as possible. To prevent harm from collecting personal data, the study can be conducted as an anonymous study. In a confidential study, researchers collect some identifying information, but prevent it from being disclosed.
The principle of justice
This calls for a fair balance between the kinds of people who participate in a study and the kinds of people who benefit from it.

The APA outlines five general principles for guiding individual aspects of ethical behaviour. Three of the five general principles are the same principles as in the Belmont Report. The other two are:

Fidelity and responsibility
Establish relationships of trust. Accept the responsibility for professional behaviour (e.g: a psychologist not treating a student or a professor not dating a student).
Integrity
Strive to be accurate, truthful and honest (e.g: professors are obligated to teach accurately).

The APA lists ten specific ethical standards. These standards are similar to enforceable rules or laws.

Institutional review boards
An institutional review board is a committee responsible for interpreting ethical principles and ensuring that research using human participants is conducted ethically.

Standard	Definition
Institutional review board	This is a committee responsible for interpreting ethical principles and ensuring that research using human participants is conducted ethically.

Research methods in psychology by B. Morling (third edition) – Chapter 11 summary

THREATS TO INTERNAL VALIDITY:
There are 12 threats to internal validity. Most of these threats can be prevented with a good experimental design and only occur in the so-called ‘really bad experiment’, also known as the one-group, pre-test/post-test design. The following twelve threats to internal validity exist:

Threat	What happens?	When?	Solution
Maturation threat	A change in behaviour occurs more or less spontaneously over time. People adapt to changed environments.	One-group, pre-test/post-test design	Using a comparison group
History threat	A specific event has occurred between the pre-test and the post-test that affects almost every participant systematically (e.g: a change of seasons).	One-group, pre-test/post-test design	Using a comparison group
Regression threat	If a group’s mean is unusually extreme at the pre-test, it is likely to be less extreme at the post-test, closer to the typical mean (e.g: depressed people have an extreme mean of sadness and this probably will be less extreme when it is tested again). Regression alone does not make an extreme group cross over the mean to the other extreme.	One-group, pre-test/post-test design	Using a comparison group
Attrition threat	A reduction in participant numbers that occurs when people drop out before the end. This is only a problem if attrition is systematic.	One-group, pre-test/post-test design

Research methods in psychology by B. Morling (third edition) – Chapter 12 summary

EXPERIMENTS WITH TWO INDEPENDENT VARIABLES CAN SHOW INTERACTIONS
Experiments with more than one independent variable allow researchers to look for an interaction effect. This is an effect in which the effect of one independent variable depends on the level of another independent variable. In a graph of the results, there is an interaction if the lines are not parallel and no interaction if the lines are parallel. If the lines cross, there is a crossover interaction, which can be described as “it depends”. A spreading interaction occurs when the lines spread out and can be described as “only when…”. An interaction is a difference in differences.

FACTORIAL DESIGNS STUDY TWO INDEPENDENT VARIABLES
Testing for interactions is done with factorial designs. A factorial design is one in which there are two or more independent variables. In a factorial design, researchers study each possible combination of the independent variables. A participant variable is a variable whose levels are selected, but cannot be manipulated (e.g: age, the level for this variable can be selected, but not manipulated). Using factorial designs to test limits is called testing for moderators and it is a way to test the external validity of an experiment. Factorial designs can also test theories and hypotheses.

INTERPRETING FACTORIAL RESULTS: MAIN EFFECTS AND INTERACTIONS
Researchers test each independent variable to look for main effects. A main effect is the overall effect of one independent variable on the dependent variable, averaging over the levels of the other independent variable. Marginal means are the arithmetic means for each level of an independent variable, averaging over the levels of the other independent variable. A main effect is not necessarily the most important effect. When there is an interaction, the interaction itself is usually the most important effect. In a factorial design with two independent variables, the first two results obtained are the main effects for each independent variable. The third result is the interaction effect.
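Marginal means and the interaction as a difference in differences can be made concrete with a small Python sketch for a 2×2 design. The function name and the cell means are hypothetical, and the marginal means are simple averages of the cell means, which assumes equally sized cells.

```python
def summarize_2x2(cell_means):
    """Summarize a 2x2 factorial design.

    cell_means[i][j] is the mean of the dependent variable at level i of
    independent variable A and level j of independent variable B.
    Returns (marginal means of A, marginal means of B, difference in differences).
    """
    (a1b1, a1b2), (a2b1, a2b2) = cell_means

    # Marginal means: average over the levels of the other variable.
    marginal_a = [(a1b1 + a1b2) / 2, (a2b1 + a2b2) / 2]
    marginal_b = [(a1b1 + a2b1) / 2, (a1b2 + a2b2) / 2]

    # Interaction: the effect of B at level 2 of A minus the effect of B at level 1 of A.
    difference_in_differences = (a2b2 - a2b1) - (a1b2 - a1b1)
    return marginal_a, marginal_b, difference_in_differences


# Hypothetical cell means.
cell_means = [[10, 20], [10, 40]]
marginal_a, marginal_b, interaction = summarize_2x2(cell_means)
print(marginal_a, marginal_b, interaction)  # [15.0, 25.0] [10.0, 30.0] 20
```

A difference in differences of zero corresponds to parallel lines in a graph. Whether a non-zero value is statistically significant has to be decided with an inferential test.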

FACTORIAL VARIATIONS
In a mixed factorial design, one independent variable is manipulated as independent-groups and the other is manipulated as within-groups (e.g: age and driving while on the phone. Age is independent groups and driving while on the phone is within-groups). To check for a three-way interaction in a plot of a three-way factorial design, compare the two-way interaction at each level of the third independent variable. If the pattern of the lines is the same at each level, there is a two-way interaction but no three-way interaction (unless the lines are parallel, in which case there is no interaction at all).

IDENTIFYING FACTORIAL DESIGNS IN YOUR READING
When looking for factorial designs in research articles, it is important to look at the method section of the article. When looking for factorial designs in regular articles, it is important to look for the phrases “it depends” and “only when”.

Research methods in psychology by B. Morling (third edition) – Chapter 13 summary

QUASI-EXPERIMENTS
A quasi-experiment differs from a true experiment in that the researchers do not have full experimental control. In quasi-experiments, researchers might not be able to randomly assign participants to one level or the other. They are assigned by other things (e.g: teachers, political regulations or nature).

A non-equivalent control group design is a quasi-experiment in which there is a treatment group and a control group, but the participants have not been randomly assigned. A non-equivalent control group pretest/posttest design is a quasi-experiment in which participants are tested before and after the experiment, but are not randomly assigned to groups. An interrupted time-series design is a quasi-experiment that measures participants repeatedly on a dependent variable. A non-equivalent control group interrupted time-series design is a quasi-experiment in which the independent variable was studied as a repeated-measures variable and an independent groups variable.

There are several possible threats in quasi-experiments to internal validity:

Threat	Definition
Selection effect	The participants of one level of the independent variable are systematically different from other participants at another level of the independent variable.
Design confounds	In a design confound, some outside variable varies systematically with the levels of the targeted independent variable.
Maturation threat	An observed change has emerged more or less spontaneously over time.
History threat	An external, historical event happens for everyone in a study at the same time as the treatment (e.g: a change of seasons).
Regression to the mean	A measure that is extreme will (almost) always be less extreme and closer to the mean on the next measurement.
Attrition threat	Certain kinds of participants drop out systematically (e.g: only the most depressed people drop out).

Research methods in psychology by B. Morling (third edition) – Chapter 14 summary

TO BE IMPORTANT, A STUDY MUST BE REPLICATED
Replication gives a study credibility, and it is a crucial part of the scientific process. There are several types of replication:

Direct replication
Researchers repeat an original study as closely as they can to see whether the effect is the same in the newly collected data.
Conceptual replication
Researchers explore the same research question, but use different procedures. In this replication, the conceptual variables are the same, but the operationalizations are not.
Replication-plus-extension
Researchers replicate their original experiment and add variables to test additional questions.

The replication crisis refers to the fact that many psychological studies do not yield the same results when they are replicated. Replication studies might fail because some original effects are contextually sensitive: when the replication context is too different, the replication is more likely to fail.

HARKing is hypothesising after the results are known. P-hacking is changing the analysis after seeing the results, for example by adding participants or removing certain outliers when the first results were not significant. The goal of this is to find a p-value under 0.05. Three changes have been made to psychological research in order to increase the replication rate:

Open science
Sharing one’s data and materials freely.
Larger sample sizes
Most studies and replications require much larger sample sizes nowadays.
Preregistration
Preregistering the study’s methods, hypothesis and statistical analyses online, in advance of data collection. This can be useful for publication in journals.

To encourage replication, some journals now devote a section to replication studies. Meta-analysis is a way of mathematically averaging the results of all the studies that have tested the same variables to see what conclusion the whole body of evidence supports. This makes use of both published and unpublished articles. The file drawer problem refers to the idea that a meta-analysis might be overestimating the true size of an effect because null effects, or even opposite effects, have not been included in the collection process (unpublished studies are less likely to make it into a meta-analysis).
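The idea of mathematically averaging study results, and the way the file drawer problem can inflate that average, can be sketched in Python. This is only an illustration: weighting each effect size by its sample size is one simple scheme chosen here as an assumption, and the studies are invented.

```python
def weighted_mean_effect(studies):
    """Average effect size, weighting each study by its sample size.

    `studies` is a list of (effect size, sample size) pairs.
    """
    total_n = sum(n for _, n in studies)
    return sum(effect * n for effect, n in studies) / total_n


# Hypothetical published studies: (effect size, sample size).
published = [(0.30, 50), (0.10, 150)]
print(round(weighted_mean_effect(published), 2))  # 0.15

# Adding a hypothetical unpublished null result lowers the average.
with_file_drawer = published + [(0.0, 100)]
print(round(weighted_mean_effect(with_file_drawer), 2))  # 0.1
```

Leaving the null result in the file drawer would make the combined effect look larger than the full body of evidence supports.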

TO BE IMPORTANT, MUST A STUDY HAVE EXTERNAL VALIDITY?
The manner in which the participants are recruited is more important than the number of participants for getting external validity. Ecological validity is the generalizability of an experiment to real-world settings.

Researchers in the theory-testing mode are usually designing correlational or experimental research to investigate support for a theory. When investigating support for a theory, the generalizability is not always necessary (e.g: if a theory is false in one sample, it should be false in all samples). Researchers in the generalization mode want to generalize the findings from the sample in a previous study to a larger population. Frequency claims are always in the generalization mode and association and causal claims are usually in theory-testing mode, but can be in generalization mode. Many
