How do you perform significance tests? – Chapter 6

6.1 What are the five components of a significance test?

A hypothesis is a prediction that a parameter within the population has a certain value or falls within a certain interval. A distinction can be made between two kinds of hypotheses. A null hypothesis (H0) is the assumption that a parameter takes a certain value. Opposed to it is the alternative hypothesis (Ha), the assumption that the parameter falls in a range of values other than that value. Usually the null hypothesis means no effect. A significance test (also called hypothesis test or test) determines whether the data provide enough evidence against the null hypothesis to support the alternative hypothesis. It does so by comparing point estimates of parameters with the values expected under the null hypothesis.

Significance tests consist of five parts:

  • Assumption. Each test makes assumptions about the type of data (quantitative/categorical), the required level of randomization, the population distribution (for instance the normal distribution) and the sample size.

  • Hypotheses. Each test has a null hypothesis and an alternative hypothesis.

  • Test statistic. This indicates how far the estimate lies from the parameter value of H0. Often, this is shown by the number of standard errors between the estimate and the value of H0.

  • P-value. This gives the weight of evidence against H0. The smaller the P-value is, the more evidence that H0 is incorrect and that Ha is correct.

  • Conclusion. This is an interpretation of the P-value and a decision on whether H0 should be rejected or not rejected.

6.2 How do you perform a significance test for a mean?

Significance tests for quantitative variables usually concern the population mean µ. The five parts of a significance test come into play here.

The assumptions are that the data come from a random sample and that the population distribution is normal.

The hypotheses are usually two-sided, meaning that the alternative hypothesis contains values on both sides of the null value. Usually the null hypothesis is H0: µ = µ0, in which µ0 is the hypothesized value of the population mean. It often corresponds to no effect (for instance µ0 = 0). The alternative hypothesis then contains all other values and looks like this: Ha: µ ≠ µ0.

The test statistic is the t-score. The formula is as follows:

t = (ȳ – µ0) / se, in which se = s / √n

Here s is the sample standard deviation and n is the sample size.

The sample mean ȳ estimates the population mean µ. If H0 is true, then the mean of the sampling distribution of ȳ equals µ0 (and lies in the middle of that distribution). A value of ȳ far in the tail of the distribution gives strong evidence against H0. The further ȳ is from µ0, the bigger the absolute value of the t-score and the stronger the evidence against H0.
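As an illustration, the t-score can be computed by hand from a sample, taking se = s / √n. The sketch below uses only the Python standard library; the function name and the data are made up for the example.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, se) for H0: mu = mu0, with se = s / sqrt(n)."""
    n = len(sample)
    y_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)             # estimated standard error of the sample mean
    return (y_bar - mu0) / se, se

# Hypothetical data: weight change in kg for 8 people, testing H0: mu = 0
weight_change = [1.2, -0.4, 2.1, 0.8, 1.5, -0.2, 0.9, 1.7]
t, se = t_statistic(weight_change, mu0=0.0)
print(f"t = {t:.2f}, se = {se:.3f}")
```

A t-score far from 0 in either direction means that the sample mean lies many standard errors away from µ0.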

The P-value is the probability, assuming H0 is true, of a test statistic at least as extreme as the one observed. For a two-sided test this probability is the area in the two tails of the t-distribution beyond the observed t-score. Software can find the P-value.
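To show what software does here, the following sketch approximates the two-tail area of the t-distribution by numerical integration (Simpson's rule). It is only an illustration with hypothetical function names, assuming df = n – 1 degrees of freedom; statistical packages use more accurate routines, and `math.gamma` overflows for very large df.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-tail probability P(|T| >= |t|), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3        # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central_area)

# Hypothetical example: t = 2.0 from a sample of n = 11 (df = 10)
print(f"P = {two_sided_p(2.0, df=10):.3f}")
```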

To draw conclusions, the P-value needs to be interpreted. If the P-value is smaller, the evidence against H0 is stronger.

For two-sided significance tests the conclusions of the confidence interval and the significance test agree. When a 95% confidence interval for µ contains µ0, the P-value is bigger than 0.05. When the interval doesn't contain µ0, the P-value is smaller than 0.05.

In two-sided tests the region of rejection is in both tails of the sampling distribution. In most cases a two-sided test is performed. However, in some cases the researcher already senses in which direction the effect will go, for instance that a particular type of meat will cause people to gain weight. Sometimes it's physically impossible that the effect will take the opposite direction. In these cases a one-sided test can be used, which is an easier way to test a specific idea. A one-sided test has the region of rejection in only one tail; which one depends on the alternative hypothesis. If the alternative hypothesis says that there will be weight gain after consumption of a certain product, then the region of rejection is in the right tail. For two-sided tests the alternative hypothesis is Ha: µ ≠ µ0 (so the population mean can be anything but a certain value). For one-sided tests it is Ha: µ > µ0 or Ha: µ < µ0 (so the population mean needs to be either bigger or smaller than a certain value).

All researchers agree that one-sided and two-sided tests are two different things. Some researchers prefer a two-sided test, because it requires stronger evidence before the null hypothesis is rejected. Other researchers prefer one-sided tests because they show the outcome of a very specific hypothesis. They say a one-sided test is more sensitive: a small effect in the predicted direction reaches significance more easily in a one-sided test than in a two-sided test. Generally, if the direction of the effect is unknown, two-sided tests are applicable.
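Because the t-distribution is symmetric, a one-sided P-value can be derived from the two-sided one. The sketch below assumes the two-sided P-value and the sign of the test statistic are already known; the function name and the labels "greater" and "less" are chosen for this example.

```python
def one_sided_p(t, two_sided_p, alternative):
    """Convert a two-sided P-value into a one-sided one.

    alternative is "greater" for Ha: mu > mu0 or "less" for Ha: mu < mu0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_predicted_direction = (t > 0) if alternative == "greater" else (t < 0)
    half = two_sided_p / 2
    # Half the two-sided P when the effect goes in the predicted direction,
    # otherwise the complement of that half.
    return half if in_predicted_direction else 1 - half

# Hypothetical result: t = 1.9 with a two-sided P-value of 0.08
print(one_sided_p(1.9, 0.08, "greater"))
print(one_sided_p(1.9, 0.08, "less"))
```

When the observed effect goes against the predicted direction, the one-sided P-value is large, so a one-sided test can never support an effect in the unexpected direction.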

The hypotheses are expressed in parameters for the population (such as µ), never in statistics about the sample (such as ȳ), because retrieving information about the population is the end goal.

Usually H0 is rejected when P is smaller than or equal to 0.05 or 0.01. This cutoff is called the alpha level or significance level and is denoted α. If the alpha level decreases, the researcher is being more careful, and the evidence that the null hypothesis is wrong needs to be stronger.

Two-sided tests are robust: even when the population distribution isn't normal, confidence intervals and tests using the t-distribution still work well. However, significance tests don't work well for one-sided tests with a small sample and a very skewed population.

6.3 How do you perform a significance test for a proportion?

Significance tests for proportions work much like significance tests for means. For categorical variables the sample proportion is used to test a hypothesis about the population proportion.

In terms of assumptions, the data need to come from a random sample, and the sample needs to be large enough for the sampling distribution of the sample proportion to be approximately normal. If the value of H0 is π0 = 0.50 (this means that the population is divided exactly in half, 50-50%), then the sample size needs to be at least 20.

The null hypothesis says that there is no effect, so H0: π = π0. The alternative hypothesis of a two-sided test contains all other values, Ha: π ≠ π0.

The test statistic for proportions is the z-score. The formula for the z-score used as a test statistic for a significance test of a proportion is:

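z = (π̂ – π0) / se0, in which se0 = √(π0(1 – π0) / n)

Here π̂ is the sample proportion and n is the sample size.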

The z-score measures how many standard errors the sample proportion is from the value of the null hypothesis. In other words, the z-score indicates how big the deviation from H0 is compared to the variation that is expected from sampling alone.

The P-value can be found with software, with internet apps or in a table. The P-value is the probability, if H0 were true, of a sample proportion at least as extreme as the one observed. For a one-sided test it is the tail probability beyond the observed z in the direction of Ha. For a two-sided test this tail probability is doubled.

Drawing conclusions works the same way for proportions as for means. The smaller the P-value is, the stronger the evidence is against H0. The null hypothesis is rejected when P is smaller than or equal to α, for an alpha level such as 0.05. Even when the data are consistent with H0, many researchers will not 'accept' it; to avoid drawing conclusions that are too strong they will just 'not reject' H0.
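A minimal sketch of this test in Python, using `statistics.NormalDist` from the standard library for the tail probabilities. It assumes the standard error is computed under H0, se0 = √(π0(1 – π0) / n); the function name, the `alternative` labels and the data are invented for the example.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, pi0, alternative="two-sided"):
    """Return (z, P-value) for H0: pi = pi0, using the normal approximation."""
    pi_hat = successes / n                   # sample proportion
    se0 = math.sqrt(pi0 * (1 - pi0) / n)     # standard error under H0
    z = (pi_hat - pi0) / se0
    std_normal = NormalDist()                # mean 0, standard deviation 1
    if alternative == "greater":             # Ha: pi > pi0, right tail
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":              # Ha: pi < pi0, left tail
        p = std_normal.cdf(z)
    elif alternative == "two-sided":         # Ha: pi != pi0, double one tail
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("unknown alternative")
    return z, p

# Hypothetical sample: 60 of 100 respondents in category 1, H0: pi = 0.50
z, p = proportion_z_test(successes=60, n=100, pi0=0.50)
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```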

6.4 Which errors can be made in significance tests?

To give people more insight into the findings of a significance test, it's better to give the P-value than to state merely whether the alternative hypothesis was accepted. This is an idea of Fisher. The collection of values of the test statistic for which the null hypothesis is rejected is called the rejection region.

Testing hypotheses is an inferential process. This means that a limited amount of information serves to draw a general conclusion. It's possible that a researcher thinks the null hypothesis should be rejected while the treatment doesn't really have an effect. The cause is that samples aren't identical to populations; an extreme sample may happen to be selected, for instance. Rejecting the null hypothesis while it is true is called a type I error. It can have big consequences. However, the chance of a type I error is small. The alpha level is the probability of a type I error when H0 is true. It usually doesn't exceed 5% and is sometimes limited to 2.5% or 1%. But smaller alpha levels also require more evidence to reject the null hypothesis.

A type II error occurs when a researcher doesn't reject the null hypothesis while it is wrong; a type I error occurs when the null hypothesis is rejected while it is true. If the probability of a type I error decreases, the probability of a type II error increases.

If P is smaller than or equal to 0.05, then H0 is rejected at α = 0.05. The values of µ0 for which H0 is not rejected in a two-sided test are exactly the values inside the 95% confidence interval. A type II error can only be made for such values that are not rejected.
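That the alpha level is the probability of a type I error can be checked with a small simulation: draw many samples from a population in which H0 is true and count how often the test rejects. The sketch below assumes a two-sided z-test for a proportion with π0 = 0.50, n = 100 and α = 0.05 (rejecting when |z| ≥ 1.96); all names and settings are chosen for the illustration.

```python
import math
import random

def simulate_type_i_rate(n=100, pi0=0.50, z_crit=1.96, runs=2000, seed=1):
    """Fraction of samples drawn under H0 in which H0 is (wrongly) rejected."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    rejections = 0
    for _ in range(runs):
        # One sample of n observations from a population where H0 is true
        successes = sum(rng.random() < pi0 for _ in range(n))
        z = (successes / n - pi0) / se0
        if abs(z) >= z_crit:                   # falls in the rejection region
            rejections += 1
    return rejections / runs

rate = simulate_type_i_rate()
print(f"Share of type I errors in the simulation: {rate:.3f}")
```

The simulated share should come out close to 0.05, though not exactly, because the number of successes is discrete and the number of runs is limited.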

6.5 Which limitations do significance tests have?

It is important to notice that statistical significance and practical significance are not the same. Finding a significant effect doesn't mean that it's an important find. The size of P simply indicates how much evidence exists against H0, and not how far the parameter lies from H0.

It's misleading to only report research that found significant effects. The same research may have been done 20 times, but only once with a significant effect, which may have been found by coincidence.
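The example of 20 studies can be made concrete with a small calculation. Assuming the 20 tests are independent, that H0 is true in every one of them and that α = 0.05 (all assumptions of this sketch), the probability that at least one test is significant purely by coincidence is 1 – 0.95^20.

```python
def prob_at_least_one_significant(alpha, tests):
    """P(at least one type I error) in independent tests where H0 is always true."""
    return 1 - (1 - alpha) ** tests

print(round(prob_at_least_one_significant(0.05, 20), 2))  # → 0.64
```

So a single significant result among 20 attempts is more likely than not, even when there is no effect at all.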

A significant effect doesn't say whether a treatment has a big effect. To get a better appreciation of the size of a significant effect, the effect size can be calculated: the difference between the sample mean and the value of the population mean under the null hypothesis (ȳ – µ0) is divided by the population standard deviation. An effect size of 0.2 or less is generally regarded as too small to be practically significant.
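The difference between effect size and statistical significance can be illustrated with a short calculation. The sketch below assumes a z-test with a known population standard deviation σ, in which case z = (ȳ – µ0) / (σ / √n) equals the effect size times √n; the function names and the numbers are hypothetical.

```python
import math

def effect_size(sample_mean, mu0, sigma):
    """Effect size: (sample mean - mu0) divided by the population standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (sample_mean - mu0) / sigma

def z_for_effect_size(d, n):
    """z-score of a z-test with known sigma, which equals d * sqrt(n)."""
    return d * math.sqrt(n)

# Hypothetical study: sample mean 101.5, H0 value 100, sigma 15
d = effect_size(sample_mean=101.5, mu0=100.0, sigma=15.0)
for n in (25, 400, 2500):
    print(f"n = {n:4d}: effect size = {d:.2f}, z = {z_for_effect_size(d, n):.2f}")
```

The effect size stays the same while the z-score grows with the sample size, so a very small effect becomes statistically significant once the sample is large enough.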

For interpreting the practical consequences of a research the confidence interval is more important than a significance test. Often H0 is only one value while other values might be plausible too. That's why a confidence interval with a spectrum of values gives more information.

Other ways that significance tests can mislead:

  • Sometimes results are only reported when they are regarded as statistically significant.

  • Statistical significance can be coincidence.

  • The P-value is not the probability that H0 is true because it can either be true or false, not something in between.

  • Real effects usually are smaller than the effects in research that gets a lot of attention.

Publication bias occurs when research that finds small or non-significant effects doesn't get published.

6.6 How can you calculate the probability of type II error?

A type II error can only occur when Ha is true, so its probability is calculated for parameter values in the range of Ha. Every value within Ha has its own P(type II error), a probability that a type II error occurs. This probability is usually calculated using software. The software creates the sampling distributions under the null hypothesis and under the chosen alternative value and determines how much of the distribution under the alternative value falls outside the rejection region. The probability of a type II error decreases when the parameter value is further away from the null hypothesis, when the sample gets bigger and when the probability of a type I error increases.

The power of a test is the probability that the test will reject the null hypothesis when it is wrong. So the power is about finding an effect that is real. The formula for the power at a certain parameter value is: power = 1 – P(type II error). If the probability of a type II error decreases, the power increases.

6.7 How is the binomial distribution used in significance tests for small samples?
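A sketch of such a calculation for a one-sided test of a proportion (H0: π = π0 against Ha: π > π0), using the normal approximation. The function name and the numbers (π0 = 1/3, alternative value π1 = 0.50, n = 116, α = 0.05) are assumptions made for the illustration.

```python
import math
from statistics import NormalDist

def type_ii_and_power(pi0, pi1, n, alpha=0.05):
    """Return (P(type II error), power) at the alternative value pi1 > pi0."""
    std_normal = NormalDist()
    se0 = math.sqrt(pi0 * (1 - pi0) / n)       # standard error if H0 is true
    # H0 is rejected when the sample proportion exceeds this cutoff
    cutoff = pi0 + std_normal.inv_cdf(1 - alpha) * se0
    se1 = math.sqrt(pi1 * (1 - pi1) / n)       # standard error if pi = pi1
    # Type II error: the sample proportion stays below the cutoff although pi = pi1
    beta = std_normal.cdf((cutoff - pi1) / se1)
    return beta, 1 - beta

beta, power = type_ii_and_power(pi0=1 / 3, pi1=0.50, n=116)
print(f"P(type II error) = {beta:.3f}, power = {power:.3f}")
```

Calling the function with a value of π1 closer to π0 or with a smaller n gives a larger P(type II error) and therefore a smaller power.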

Estimating proportions with small samples is difficult. For the outcome of a small sample with categorical discrete variables, like tossing a coin, a sampling distribution can be made. This is called the binomial distribution. A binomial distribution is only applicable when:

  • Every observation falls within one of two categories.

  • The probabilities of the two categories are the same for every observation.

  • The observations are independent.

The symbol π is the probability of category 1 and x is the binomial variable, the number of observations in category 1 out of n. The probability of x observations in category 1 is:

P(x) = [n! / (x!(n – x)!)] π^x (1 – π)^(n – x), for x = 0, 1, 2, ..., n

The symbol n! is called n factorial; it is the product of all whole numbers 1 x 2 x 3 x ... x n. The binomial distribution is only symmetrical for π = 0.50. The mean is µ = nπ and the standard deviation is σ = √(nπ(1 – π)).

So even for tiny samples with fewer than 10 observations in each category a significance test can be done, using the binomial distribution instead of the normal approximation. An example is testing H0: π = 0.50 against Ha: π < 0.50.
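The binomial probabilities can be computed directly from the formula P(x) = [n! / (x!(n – x)!)] π^x (1 – π)^(n – x). The sketch below does this for a hypothetical case with n = 10 and π = 0.50 and checks that the mean and standard deviation of the resulting distribution match nπ and √(nπ(1 – π)).

```python
import math

def binomial_pmf(x, n, pi):
    """Probability of x observations in category 1 out of n."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 10, 0.50
probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Mean and standard deviation computed from the distribution itself
mean = sum(x * p for x, p in enumerate(probs))
sd = math.sqrt(sum((x - mean) ** 2 * p for x, p in enumerate(probs)))

print(f"mean = {mean:.3f}, n*pi = {n * pi:.3f}")
print(f"sd = {sd:.3f}, sqrt(n*pi*(1-pi)) = {math.sqrt(n * pi * (1 - pi)):.3f}")
```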
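For a one-sided test of H0: π = 0.50 against Ha: π < 0.50, the exact P-value is the binomial probability of the observed count or fewer in category 1. A minimal sketch, with a made-up sample of n = 10 in which 2 observations fall in category 1:

```python
import math

def binomial_lower_tail_p(x_obs, n, pi0=0.50):
    """Exact P-value P(X <= x_obs) for H0: pi = pi0 against Ha: pi < pi0."""
    return sum(
        math.comb(n, x) * pi0**x * (1 - pi0) ** (n - x)
        for x in range(x_obs + 1)
    )

# Hypothetical sample: 2 out of 10 observations in category 1
p_value = binomial_lower_tail_p(x_obs=2, n=10)
print(f"P = {p_value:.4f}")
```

In this made-up example the P-value is 56/1024, slightly above 0.05, so H0 would not be rejected at α = 0.05.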

 
