How can you make estimates for statistical inference? – Chapter 5

5.1 How do you make point estimates and interval estimates?

Sample data is used for estimating parameters that give information about the population, such as proportions and means. For quantitative variables the population mean is estimated (like how much money on average is spent on medicine in a certain year). For categorical variables the population proportions are estimated for the categories (like how many people do and don't have medical insurance in a certain year).

Two kinds of parameter estimates exist:

  • A point estimate is a single number that is the best guess for the parameter.

  • An interval estimate is an interval of numbers around the point estimate that is believed to contain the population parameter.

There is a difference between the estimator (the way that estimates are made) and the point estimate (the estimated number itself). For instance, the sample proportion is an estimator of the population proportion, and 0.73 is a point estimate of the population proportion that believes in love at first sight.

A good estimator has a sampling distribution that is centered around the parameter and that has a standard error as small as possible.

An estimator is unbiased when its sampling distribution is centered around the parameter. The sample mean is an example: the sampling distribution of ȳ (the sample mean) is centered around µ (the population mean), so ȳ is regarded as a good estimator of µ.

A biased estimator tends to underestimate or overestimate the parameter on average. The sample range is an example: it is usually smaller than the population range, because the extremes in a sample can't be more extreme than those of the population, only less. The sample range therefore tends to underestimate the population range.

An estimator should also have a small standard error. An estimator is called efficient when its standard error is smaller than that of other estimators. Imagine a normal distribution. The standard error of the sample median is 25% bigger than the standard error of the sample mean, so the sample mean tends to be closer to the population mean than the sample median is. The sample mean is therefore the more efficient estimator.
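The 25% figure can be checked with a small simulation. The sketch below is only an illustration: the sample size, the number of repetitions, and the seed are arbitrary assumptions, and the function name is made up for this example.

```python
import random
import statistics


def simulate_standard_errors(n=100, repetitions=5000, seed=1):
    """Estimate the standard errors of the sample mean and sample median
    by drawing many samples from a standard normal population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # The standard deviation of the simulated estimates approximates
    # the standard error of each estimator.
    return statistics.stdev(means), statistics.stdev(medians)


se_mean, se_median = simulate_standard_errors()
ratio = se_median / se_mean
print(f"se(mean) = {se_mean:.4f}, se(median) = {se_median:.4f}, ratio = {ratio:.2f}")
```

The ratio should come out close to 1.25, although the exact value varies with the seed and the sample size.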

A good estimator is unbiased (meaning the sampling distribution is centered around the parameter) and efficient (meaning it has the smallest standard error).

Usually the sample mean serves as an estimator for the population mean, the sample standard deviation as an estimator for the population standard deviation, etc. This is indicated by a hat on a symbol; for instance, µ̂ (mu-hat) means an estimate of the population mean µ.

A confidence interval is an interval estimate for a parameter. It contains the most plausible values for the parameter. To find this interval, look at the sampling distribution of the point estimate, which is approximately normal. For a 95% confidence interval, the point estimate falls within about two standard errors of the parameter. To calculate the interval, multiply the standard error by the z-score. Add this outcome to the point estimate and subtract it from the point estimate, so you get two numbers that together form the confidence interval. You can then be 95% confident that the population parameter lies between these two numbers. The z-score multiplied by the standard error is also called the margin of error.

So a confidence interval is: point estimate ± margin of error. The confidence level is the probability that this method produces an interval that contains the parameter. This is a number close to 1, like 0.95 or 0.99.

5.2 How do you calculate the confidence interval for a proportion?

Nominal and ordinal variables create categorical data (for instance 'agree' and 'not agree'). For this kind of data, means are useless. Instead, proportions or percentages are used. A proportion is between 0 and 1, a percentage between 0 and 100.

The unknown population proportion is written as π. The sample proportion is the point estimate of the population proportion, meaning the sample is used to estimate the population proportion. The sample proportion is indicated by the symbol π̂ (pi-hat).

The sample proportion is a special kind of sample mean (the mean of observations coded 0 and 1), so the Central Limit Theorem applies: for large samples its sampling distribution is approximately normal. Because it is a normal distribution, about 95% of the sample proportions fall within two standard errors of the population proportion. This forms the basis of the confidence interval. Calculating a confidence interval requires the standard error, but because the exact standard error depends on the unknown population proportion, an estimate from the sample is used instead. This is indicated as se. The formula for the estimated standard error is:

se = √(π̂(1 − π̂)/n)

The standard error is then multiplied by the z-score. For a normal distribution, the probability of falling within z standard errors of the mean equals the confidence level. For confidence levels of 95% and 99%, z equals 1.96 and 2.58. A 95% confidence interval for the proportion π is:

π̂ ± 1.96(se)

The general formula for a confidence interval is:

π̂ ± z(se)
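As a rough sketch, the general formula can be computed as follows, assuming the usual estimated standard error se = √(π̂(1 − π̂)/n). The function name and the sample size of 1000 are made up for illustration; only the sample proportion 0.73 comes from the example earlier in the text.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Return (lower, upper) for the interval p_hat ± z * se."""
    # z-score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    margin = z * se                     # margin of error
    return p_hat - margin, p_hat + margin


# Hypothetical data: sample proportion 0.73 from n = 1000 respondents.
for level in (0.95, 0.99):
    lower, upper = proportion_confidence_interval(0.73, 1000, level)
    print(f"{level:.0%} interval: ({lower:.2f}, {upper:.2f})")
```

The 99% interval comes out wider than the 95% interval, because a higher confidence level requires a larger z-score.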

The endpoints of a confidence interval are usually rounded to two decimals.

A bigger sample generates a smaller standard error and a narrower, more precise confidence interval. Specifically, the sample size needs to be multiplied by four to halve the margin of error.
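The effect of the sample size on the margin of error can be illustrated with a few lines of code. This is a minimal sketch: the function name, the proportion of 0.5, and the sample sizes are assumptions chosen for the example.

```python
from math import sqrt


def margin_of_error(p_hat, n, z=1.96):
    """Margin of error z * se for a sample proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample proportion of 0.5 at two sample sizes.
small = margin_of_error(0.5, 100)
large = margin_of_error(0.5, 400)
print(f"n = 100: ±{small:.3f}   n = 400: ±{large:.3f}")
```

Because n sits under a square root, a sample four times as large gives a margin of error half as wide.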

The error probability is the probability that the method produces a confidence interval that does not contain the parameter. It is indicated as α (the Greek letter alpha) and calculated as 1 – confidence level. If the confidence level is 0.98, then the error probability is 0.02.

When the sample is too small, this confidence interval is unreliable, because the sampling distribution of the sample proportion may not be close to normal. As a rule, at least 15 observations should fall within the category and at least 15 outside it.

5.3 How do you calculate the confidence interval for a mean?

Finding the confidence interval for a mean works roughly the same way as for a proportion. For a mean, too, the confidence interval is point estimate ± margin of error. In this case the margin of error consists of a t-score (instead of a z-score) multiplied by the standard error. The t-score is taken from the t-distribution, a distribution that can be used to construct confidence intervals for any sample size, even tiny ones. The standard error is found by dividing the sample standard deviation s by the square root of the sample size n. In this case the point estimate is the sample mean ȳ.

The formula for a 95% confidence interval for a population mean µ using the t-distribution is:

ȳ ± t0.025(se) where se = s/√n and df = n – 1
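A minimal sketch of this calculation is given below, with se = s/√n as described in the text. The data are made up, the function name is hypothetical, and the t-scores are copied from a standard t-table for a few degrees of freedom only, because the Python standard library does not provide t-distribution quantiles.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% t-scores (t0.025) copied from a standard t-table,
# keyed by degrees of freedom. Only a few rows are included here.
T_025 = {4: 2.776, 9: 2.262, 29: 2.045}


def mean_confidence_interval_95(data):
    """Return (lower, upper) for y_bar ± t0.025 * se with df = n - 1."""
    n = len(data)
    df = n - 1
    if df not in T_025:
        raise ValueError(f"no table entry for df = {df}")
    se = stdev(data) / sqrt(n)  # s / sqrt(n), s = sample standard deviation
    margin = T_025[df] * se
    y_bar = mean(data)
    return y_bar - margin, y_bar + margin


# Made-up sample of 10 observations (so df = 9).
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
lower, upper = mean_confidence_interval_95(sample)
print(f"95% interval for the mean: ({lower:.2f}, {upper:.2f})")
```

For this sample the mean is 13.5 and the standard error is 0.5, so the interval is 13.5 ± 2.262 × 0.5. With the z-score 1.96 the interval would have been slightly narrower.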

With t-scores the confidence interval is a little wider than with z-scores. The t-distribution looks like a normal distribution, but it is lower in the middle and its tails are a bit higher. It is symmetric around its mean of 0.

The standard deviation of the t-distribution depends on the degrees of freedom (df) and is a bit bigger than 1. The formula for the degrees of freedom is: df = n – 1.

The more degrees of freedom, the more the t-distribution resembles the standard normal distribution: it becomes more peaked in the middle. For df > 30 the two are practically identical.

The t-scores can be found on the internet or in books about statistics. For instance, a 95% confidence interval uses the t-score t0.025, because 0.025 of the distribution lies in each tail.

Robust means that a statistical method still performs well when one of its assumptions is violated. Even when the population distribution isn't normal, the t-distribution gives a usable confidence interval for the mean. However, with extreme outliers or very skewed distributions, this method doesn't work properly.

The standard normal distribution is the t-distribution with infinite degrees of freedom.

The t-distribution was discovered by Gosset while doing research for a brewery. He secretly published his articles under the pseudonym Student, which is why the t-distribution is sometimes called Student's t.

5.4 How do you choose the sample size?

For determining sample size, the desired margin of error and the desired confidence level need to be decided upon. The desired margin of error is indicated as M.

The formula for finding the right sample size to estimate a population proportion is:

n = π(1 − π)(z/M)²

The z-score is the one that corresponds to the chosen confidence level, like 1.96 for 95%. The confidence level is the probability that the margin of error isn't bigger than M. The population proportion π can be guessed or can be safely set at 0.50, which gives the largest required sample size.

The formula for finding the right sample size to estimate a population mean is:

n = σ²(z/M)²

Also here the z-score belongs to the chosen confidence level, like z = 1.96 for 0.95. The standard deviation of the population σ needs to be guessed.
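Both sample size calculations can be sketched in code, assuming the standard formulas n = π(1 − π)(z/M)² for a proportion and n = σ²(z/M)² for a mean. The function names, the margins of error, and the guessed σ of 15 are hypothetical values chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def z_for(confidence):
    """z-score with (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(margin, confidence=0.95, pi_guess=0.5):
    """n = pi(1 - pi)(z / M)^2, rounded up to a whole observation."""
    z = z_for(confidence)
    return ceil(pi_guess * (1 - pi_guess) * (z / margin) ** 2)


def sample_size_for_mean(margin, sigma_guess, confidence=0.95):
    """n = sigma^2 (z / M)^2, rounded up to a whole observation."""
    z = z_for(confidence)
    return ceil(sigma_guess ** 2 * (z / margin) ** 2)


# Proportion within 0.04 with 95% confidence, using the safe guess 0.5.
print(sample_size_for_proportion(0.04))
# Mean within 2 units with 95% confidence, guessing sigma = 15.
print(sample_size_for_mean(2, sigma_guess=15))
```

Rounding up rather than to the nearest integer is a choice made here so that the margin of error does not exceed M.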

The desired sample size depends on the margin of error and on the confidence level, but also on variability. Data with high variability requires a bigger sample size.

Other factors also influence the choice of sample size. The more complex the analysis and the more variables are involved, the bigger the sample needs to be. Time and money play a role too. If a small sample is unavoidable, two artificial observations are added to each category, so that the formulas for the confidence interval remain usable.
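The small-sample adjustment can be sketched as follows for a proportion with two categories, assuming it means adding two observations to each category (four in total) before applying the usual formula. The function name and the data are made up, and clipping the interval to the range 0 to 1 is an extra assumption of this sketch.

```python
from math import sqrt


def adjusted_proportion_interval(successes, n, z=1.96):
    """95% interval after adding 2 successes and 2 failures."""
    # Two artificial observations per category: n grows by 4 in total.
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clipping to [0, 1] is a choice made for this sketch.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Hypothetical small sample: 0 "yes" answers out of 10.
lower, upper = adjusted_proportion_interval(0, 10)
print(f"({lower:.2f}, {upper:.2f})")
```

Without the adjustment the sample proportion would be 0 and the standard error 0, so the ordinary formula would give an interval of zero width.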

5.5 What do maximum likelihood and bootstrap methods do?

Means and proportions are not the only statistics that describe data. To make point estimates for other parameters as well, R.A. Fisher developed a method called maximum likelihood. This method chooses as the estimate the parameter value for which the likelihood is maximal. The likelihood is the probability of the observed sample outcome given a certain value of the parameter, so it shows how plausible that parameter value is. The likelihood can be plotted as a curve, so that it is immediately visible where the highest point is located.
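As an illustration of the idea, the sketch below evaluates the likelihood curve for a population proportion on a grid and picks its highest point. The binomial model, the grid search, the function names, and the data (73 out of 100) are all assumptions made for this example.

```python
from math import log


def log_likelihood(pi, successes, n):
    """Log of the binomial likelihood, up to a constant that does not depend on pi."""
    return successes * log(pi) + (n - successes) * log(1 - pi)


def maximum_likelihood_estimate(successes, n, steps=1000):
    """Grid search over pi in (0, 1) for the highest point of the curve."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda pi: log_likelihood(pi, successes, n))


# Hypothetical sample: 73 of 100 respondents say "yes".
print(maximum_likelihood_estimate(73, 100))
```

For this model the highest point of the curve lies at the sample proportion, 73/100. The logarithm is used only for numerical convenience; it has its maximum at the same parameter value as the likelihood itself.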

This method has three advantages, especially for big samples: 1) efficiency: no other estimator has a smaller standard error or tends to be closer to the parameter, 2) little or no bias, and 3) a sampling distribution that is usually approximately normal.

Fisher showed that for a normal distribution the sample mean is the maximum likelihood estimator of the population mean, and that it is more efficient than the median. Only in exceptional cases is the median better, like for very skewed data.

When even the shape of the population distribution is unknown, the bootstrap method can help. Software treats the sample as if it were the population distribution and draws a new 'sample' of the same size from it, with replacement. This process is repeated many times. From the spread of the statistic across these resamples, the bootstrap method can find the standard error and a confidence interval.
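A minimal bootstrap sketch for the median is given below. The data are made up, the function name is hypothetical, and using the 2.5th and 97.5th percentiles of the resampled medians as a 95% interval (the percentile method) is one common choice that is assumed here, not something the text prescribes.

```python
import random
import statistics


def bootstrap_median(data, resamples=2000, seed=1):
    """Bootstrap standard error and 95% percentile interval for the median."""
    rng = random.Random(seed)
    n = len(data)
    medians = []
    for _ in range(resamples):
        # Resample n values from the data, with replacement.
        resample = rng.choices(data, k=n)
        medians.append(statistics.median(resample))
    medians.sort()
    se = statistics.stdev(medians)
    lower = medians[int(0.025 * resamples)]
    upper = medians[int(0.975 * resamples) - 1]
    return se, (lower, upper)


# Made-up, right-skewed sample.
data = [1, 2, 2, 3, 3, 4, 5, 7, 9, 12, 15, 21, 30, 44, 80]
se, (lower, upper) = bootstrap_median(data)
print(f"bootstrap se = {se:.2f}, 95% interval = ({lower}, {upper})")
```

The same loop works for any other statistic by swapping `statistics.median` for a different function.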
