Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 8 summary

POINT AND INTERVAL ESTIMATES OF POPULATION PARAMETERS
A point estimate is a single number that is our best guess for the parameter (e.g., 25% of all Dutch people are taller than 1.80 m). An interval estimate is an interval of numbers within which the parameter value is believed to fall (e.g., between 20% and 30% of Dutch people are taller than 1.80 m). The margin of error is the amount subtracted from and added to the point estimate to give the lower and upper bounds of the interval (here, 25% ± 5%).

A good estimator of a parameter has two properties:

  1. Unbiased
    A good estimator has a sampling distribution that is centred at the parameter. A single sample mean may fall above or below the population mean, but across many random samples (that is, across the sampling distribution) the sample means are centred at the population mean.
  2. Small standard deviation
    A good estimator has a small standard deviation compared to other estimators. When the population distribution is normal, both the sample mean and the sample median are centred at the population mean, but the sample mean is preferred because its sampling distribution has a smaller standard deviation.
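Both properties can be checked with a small simulation. The sketch below is illustrative only: it assumes a normal population with mean 100 and standard deviation 15 and samples of size n = 25 (values chosen for the example, not taken from the book), and the function name `compare_mean_and_median` is hypothetical.

```python
import random
import statistics

def compare_mean_and_median(mu=100.0, sigma=15.0, n=25, reps=10_000, seed=1):
    """Simulate the sampling distributions of the sample mean and sample median.

    Draws `reps` random samples of size `n` from a normal population
    (illustrative parameters) and returns the centre and spread of each estimator.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "mean_centre": statistics.fmean(means),      # centre of the sample means
        "median_centre": statistics.fmean(medians),  # centre of the sample medians
        "mean_sd": statistics.stdev(means),          # spread of the sample means
        "median_sd": statistics.stdev(medians),      # spread of the sample medians
    }

result = compare_mean_and_median()
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Both estimators should be centred close to 100, but the standard deviation of the sample median should come out noticeably larger than that of the sample mean (about 3, that is 15/√25). For large samples from a normal population, the median's standard deviation is roughly 1.25 times the mean's.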

An interval estimate is designed to contain the parameter with some chosen probability, such as 0.95. Confidence intervals are interval estimates that contain the parameter with a certain degree of confidence: a confidence interval is an interval containing the most believable values for a parameter. The probability that the method produces an interval that contains the parameter is called the confidence level. The sampling distribution of a sample proportion gives the possible values of the sample proportion and their probabilities; it is approximately normal when np is at least 15 and n(1 − p) is at least 15. The margin of error measures how accurate the point estimate is likely to be in estimating a parameter.

CONSTRUCTING A CONFIDENCE INTERVAL TO ESTIMATE A POPULATION PROPORTION
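The point estimate of the population proportion p is the sample proportion p̂. The standard error (se) is the estimated standard deviation of a sampling distribution. The formula for the standard error of the sample proportion is:

$$se = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

A confidence interval for p is then p̂ ± z(se), where z is the z-score for the chosen confidence level (z = 1.96 for 95% confidence).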
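As a minimal sketch of the large-sample interval p̂ ± z(se), with se = √(p̂(1 − p̂)/n), the function below (hypothetical name `proportion_ci`) computes the point estimate, standard error and interval, and refuses to run when there are fewer than 15 successes or 15 failures. The counts 250 out of 1000 are made-up numbers matching the 25% example above.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Return (p_hat, se, lower, upper) for a large-sample CI for a proportion."""
    failures = n - successes
    # Large-sample condition: at least 15 successes and 15 failures
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n                               # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)             # standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * se                                     # margin of error
    return p_hat, se, p_hat - margin, p_hat + margin

p_hat, se, lower, upper = proportion_ci(successes=250, n=1000)
print(f"p_hat={p_hat:.3f}, se={se:.4f}, 95% CI=({lower:.3f}, {upper:.3f})")
# p_hat=0.250, se=0.0137, 95% CI=(0.223, 0.277)
```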

The greater the confidence level, the wider the interval. The margin of error decreases as the sample size increases, because the standard error decreases with bigger samples, so larger samples give narrower intervals. If the 95% confidence interval method is used over and over, then in the long run about 95% of the intervals would contain the population proportion.
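The effect of sample size and confidence level on the interval width, and the long-run meaning of 95% confidence, can be illustrated with a short simulation. This sketch assumes a true proportion p = 0.25 and samples of size n = 400 (illustrative values), and the function names `z_score`, `ci_width` and `coverage_rate` are hypothetical.

```python
import math
import random
from statistics import NormalDist

def z_score(confidence):
    """z-score leaving (1 - confidence) / 2 in each tail, e.g. 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def ci_width(p_hat, n, confidence=0.95):
    """Width of the interval p_hat +/- z * se (twice the margin of error)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return 2 * z_score(confidence) * se

def coverage_rate(p=0.25, n=400, confidence=0.95, reps=3000, seed=42):
    """Share of simulated intervals p_hat +/- z * se that contain the true p."""
    rng = random.Random(seed)
    z = z_score(confidence)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))  # one random sample
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z * se <= p <= p_hat + z * se:
            hits += 1
    return hits / reps

print(f"width, n=100, 95%: {ci_width(0.25, 100):.3f}")
print(f"width, n=400, 95%: {ci_width(0.25, 400):.3f}")
print(f"width, n=400, 99%: {ci_width(0.25, 400, 0.99):.3f}")
print(f"coverage of 95% intervals: {coverage_rate():.3f}")
```

Quadrupling n from 100 to 400 halves the width, because se shrinks with √n, and moving from 95% to 99% confidence widens the interval. The simulated coverage should come out close to 0.95, though not exactly, since it is based on a finite number of random samples.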

CONSTRUCTING A CONFIDENCE INTERVAL TO ESTIMATE A POPULATION MEAN
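The standard error of the sample mean, used to estimate the population mean μ, has the following formula, where s is the sample standard deviation and n is the sample size:

$$se = \frac{s}{\sqrt{n}}$$

A confidence interval for μ is then x̄ ± t(se), where t is the t-score described below.
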
The t-score plays the same role as the z-score in a confidence interval but is a bit larger, because it comes from a bell-shaped distribution that has slightly thicker tails than the normal distribution. Because the standard error uses the sample standard deviation s in place of the unknown population standard deviation, the z-score is replaced by a t-score from the t-distribution. The standard deviation of the t-distribution is a bit larger than 1, with the precise value depending on what is called the degrees of freedom (df). For inference about the population mean, df = n − 1. The t-distribution has several properties:

  1. Bell-shaped and symmetric about 0.
  2. The probabilities depend on the degrees of freedom.
  3. The t-distribution has thicker tails and has more variability than the standard normal distribution.
  4. A t-score multiplied by the standard error gives the margin of error for a confidence interval for the mean.
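Python's standard library has no t-distribution quantile function (in practice, `scipy.stats.t.ppf` would normally be used), so this sketch swaps in a simple numerical approach: it integrates the t density with Simpson's rule and finds the t-score by bisection. It then builds the interval x̄ ± t(se) with se = s/√n and df = n − 1. The function names and the sample of ten heights are hypothetical, for illustration only.

```python
import math
from statistics import NormalDist, fmean, stdev

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                        # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """t-score with (1 - confidence) / 2 in the upper tail, found by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(sample, confidence=0.95):
    """Return (x_bar, lower, upper) for the t confidence interval for a mean."""
    n = len(sample)
    x_bar = fmean(sample)
    se = stdev(sample) / math.sqrt(n)             # standard error s / sqrt(n)
    margin = t_critical(confidence, n - 1) * se   # margin of error, df = n - 1
    return x_bar, x_bar - margin, x_bar + margin

# t-scores are larger than z but approach it as df grows
z = NormalDist().inv_cdf(0.975)
for df in (5, 30, 100):
    print(f"df={df}: t={t_critical(0.95, df):.3f}, z={z:.3f}")

sample = [172, 180, 168, 175, 183, 177, 170, 179, 174, 181]  # hypothetical heights (cm)
x_bar, lower, upper = mean_ci(sample)
print(f"mean={x_bar:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

The printed t-scores can be checked against a t table (for example, t = 2.571 for df = 5 and 95% confidence); they are larger than z = 1.96 and shrink toward it as the degrees of freedom increase.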

A statistical method is robust with respect to a particular assumption if it performs adequately even when that assumption is modestly violated (e.g., the t confidence interval for a mean still works adequately when the population distribution cannot be assumed to be normal and the sample is small). The t confidence interval method does not work well when the data contain extreme outliers. The t-score is close to the z-score when the degrees of freedom are larger than about 30.
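To see why extreme outliers are a problem, the sketch below computes a 95% t interval for a hypothetical sample of ten heights, then replaces one value with an extreme outlier. It hardcodes t = 2.262, the t-table value for 95% confidence with df = 9, so it only applies to samples of exactly n = 10; the names `mean_ci_n10`, `clean` and `with_outlier` are illustrative.

```python
import math
from statistics import fmean, stdev

# t-score for 95% confidence with df = 9 (n = 10), taken from a t table
T_95_DF9 = 2.262

def mean_ci_n10(sample):
    """95% t confidence interval (lower, upper) for exactly 10 observations."""
    if len(sample) != 10:
        raise ValueError("T_95_DF9 is only valid for n = 10")
    x_bar = fmean(sample)
    margin = T_95_DF9 * stdev(sample) / math.sqrt(len(sample))
    return x_bar - margin, x_bar + margin

clean = [172, 180, 168, 175, 183, 177, 170, 179, 174, 181]  # hypothetical data
with_outlier = clean[:-1] + [281]  # last value replaced by an extreme outlier

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    lower, upper = mean_ci_n10(data)
    print(f"{label}: mean={fmean(data):.1f}, "
          f"95% CI=({lower:.1f}, {upper:.1f}), width={upper - lower:.1f}")
```

A single outlier shifts the mean by 10 and makes the interval several times wider, so the interval no longer summarises the bulk of the data well.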

CHOOSING THE SAMPLE SIZE FOR A STUDY
The margin of error depends on the standard error of the sampling distribution of the point estimate, and the standard error itself depends on the sample size. The formula for the desired sample size when estimating a population proportion is:

$$n = \frac{\hat{p}(1-\hat{p})z^2}{m^2}$$

In this formula, m is the margin of error and z is based on the desired confidence level (e.g., a confidence level of 95% requires z = 1.96). The value of p̂ can be guessed based on other information or, to be safe, p̂ = 0.5 can be used, which gives the largest possible sample size. The smaller the margin of error, the larger the sample size has to be. The formula for the desired sample size when estimating a population mean is:

$$n = \frac{\sigma^2 z^2}{m^2}$$

In this formula, m is the margin of error, z is based on the desired confidence level and σ is a guess of the population standard deviation. There are several factors that affect the choice of the sample size:

  1. Desired precision
  2. Desired confidence level
  3. Variability in the data (the larger the standard deviation, the larger the sample size needed)
  4. Cost
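The two sample-size formulas can be sketched as follows. Rounding up to the next whole number is assumed here (a fractional sample size is not possible, and rounding down would give a slightly larger margin of error than wanted); the function names and the example margins and σ guess are hypothetical.

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """z-score for a two-sided confidence level, e.g. 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_proportion(margin, confidence=0.95, p_guess=0.5):
    """n = p(1 - p) z^2 / m^2, rounded up; p_guess = 0.5 is the safe choice."""
    z = z_for(confidence)
    return math.ceil(p_guess * (1 - p_guess) * z**2 / margin**2)

def sample_size_mean(margin, sigma_guess, confidence=0.95):
    """n = sigma^2 z^2 / m^2, rounded up; sigma_guess is a guessed population SD."""
    z = z_for(confidence)
    return math.ceil(sigma_guess**2 * z**2 / margin**2)

print(sample_size_proportion(margin=0.04))                # 601
print(sample_size_proportion(margin=0.04, p_guess=0.25))  # 451
print(sample_size_mean(margin=2, sigma_guess=15))         # 217
```

Halving the margin of error roughly quadruples the required sample size, and a higher confidence level or a larger guessed standard deviation also increases it, matching the factors listed above.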
     

 

Author: JesperN