Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 12 summary

MODEL HOW TWO VARIABLES ARE RELATED
A regression line is a straight line that predicts the value of a response variable ‘y’ from the value of an explanatory variable ‘x’. The correlation is a summary measure of association. The regression line uses the following formula:

$$\hat{y} = a + bx$$

Here ŷ is the predicted value of ‘y’, ‘a’ is the y-intercept and ‘b’ is the slope.

The data should be plotted before a regression line is fitted, because the line can be strongly influenced by outliers. The regression equation is often called a prediction equation. The difference y - ŷ between an observed outcome y and its predicted value ŷ is the prediction error, called the residual. The average of the residuals is zero. The regression line has a smaller sum of squared residuals than any other line, which is why it is called the least squares line. The population regression equation has the following formula:

$$\mu_y = \alpha + \beta x$$

Here μ_y is the population mean of ‘y’ at a given value of ‘x’, and α and β are the population y-intercept and slope.

This formula is a model. A model is a simple approximation for how variables relate in a population. The probability distribution of y values at a fixed value of x is a conditional distribution (e.g. the distribution of annual income for people with 12 years of education).
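The least squares properties described above can be checked numerically. The following is a minimal sketch in plain Python; the data are made up for illustration (they are not from the book), and the helper names `least_squares` and `sse` are hypothetical.

```python
from statistics import mean

# Made-up illustrative data (an assumption, not taken from the book).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def least_squares(x, y):
    """Return (a, b) for the line y-hat = a + b*x with the smallest sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(a, b, x, y):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print("intercept a:", round(a, 3), "slope b:", round(b, 3))
print("sum of residuals:", round(sum(residuals), 10))
# Shifting the intercept or the slope away from the fit increases the squared error.
print(sse(a, b, x, y), sse(a + 0.1, b, x, y), sse(a, b - 0.1, x, y))
```

The residuals sum to zero (up to rounding), and any other line tried here has a larger sum of squared residuals than the fitted one.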

DESCRIBE STRENGTH OF ASSOCIATION
Correlation does not differentiate between response and explanatory variables. The slope can be calculated from the correlation ‘r’ and the sample standard deviations of ‘x’ and ‘y’ as follows:

$$b = r\left(\frac{s_y}{s_x}\right)$$

Using the slope, the y-intercept can be calculated:

$$a = \bar{y} - b\bar{x}$$

The slope can’t be used to determine the strength of the association, because it depends on the units of measurement. The correlation is the standardized version of the slope. The formula for the correlation is the following:

$$r = b\left(\frac{s_x}{s_y}\right)$$

A property of the correlation is that at any particular x value, the predicted value of y is relatively closer to its mean than x is to its mean. If a particular ‘x’ value falls 2.0 standard deviations from the mean and the correlation is 0.80, then the predicted ‘y’ is ‘r’ times that many standard deviations from its mean, so the predicted ‘y’ falls 0.80 times 2.0 = 1.6 standard deviations from its mean. This is regression toward the mean. If the first observation is extreme, the second observation will tend to be closer to the mean and less extreme.
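The relation between slope, intercept and correlation, and the regression-toward-the-mean property, can be illustrated with a short sketch. The data are made up, and the correlation is computed with the standard z-score definition (an assumption here, since the summary does not spell it out).

```python
from statistics import mean, stdev

# Made-up illustrative data (an assumption, not taken from the book).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 denominator)

# Correlation as the sum of products of z-scores divided by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

# Slope and intercept from the correlation.
b = r * (s_y / s_x)
a = y_bar - b * x_bar

# Regression toward the mean: take an x value 2 standard deviations above its mean.
x_extreme = x_bar + 2.0 * s_x
y_pred = a + b * x_extreme
z_pred = (y_pred - y_bar) / s_y  # how many s_y the prediction lies above y_bar

print("r:", round(r, 4), "b:", round(b, 4), "a:", round(a, 4))
print("predicted y lies", round(z_pred, 4), "standard deviations above its mean")
```

The predicted ‘y’ lies 2r standard deviations from its mean, which is less than 2 whenever the correlation is below 1 in absolute value.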

The overall error made when predicting ‘y’ using ‘x’ with the regression equation is summarized by the residual sum of squares, which uses the following formula:

$$\text{residual SS} = \sum (y - \hat{y})^2$$

The measure r squared is interpreted as the proportional reduction in error (e.g. if r squared = 0.40, the error using y-hat to predict y is 40% smaller than the error using y-bar to predict y). The formula for r squared is the following:

$$r^2 = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$
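As a rough check of the proportional-reduction-in-error interpretation, the sketch below compares the error of predicting with y-bar against the error of predicting with y-hat. The data are made up for illustration, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (an assumption, not taken from the book).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx
a = y_bar - b * x_bar

# Error when every prediction is y-bar (x is ignored).
total_ss = sum((yi - y_bar) ** 2 for yi in y)
# Error when predictions come from the regression line.
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = (total_ss - residual_ss) / total_ss
r = sxy / sqrt(sxx * syy)

print("total SS:", total_ss, "residual SS:", round(residual_ss, 4))
print("r squared:", round(r_squared, 4), "correlation squared:", round(r ** 2, 4))
```

For these numbers the regression line reduces the squared prediction error by 60%, and the same value is obtained by squaring the correlation.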

The r squared can also be calculated by squaring the correlation coefficient. If the regression line does not explain any of the variability in ‘y’, r squared is zero. The more of the variability the regression line explains, the closer r squared is to 1. The correlation depends on three factors:

  1. Potential outliers
  2. Grouped participants
    If the observations are summary values for groups (e.g. group means) rather than individuals, the correlation tends to be stronger.
  3. The range of the x-values sampled.
    If only a restricted range of x-values is sampled, the correlation tends to be weaker than it would be over the full range.

The ecological fallacy is making predictions about individuals based on the summary results for groups.
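The effect of grouping can be made concrete with a small constructed example. In the sketch below, the nine individual points and the three groups are invented purely for illustration, and `correlation` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean


def correlation(x, y):
    """Sample correlation between two equally long lists of numbers."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Invented data: three groups of three individuals, each an (x, y) pair.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Individual-level correlation uses all nine points.
ind_x = [px for pts in groups.values() for px, _ in pts]
ind_y = [py for pts in groups.values() for _, py in pts]
r_individuals = correlation(ind_x, ind_y)

# Group-level correlation uses only the three group means.
grp_x = [mean([px for px, _ in pts]) for pts in groups.values()]
grp_y = [mean([py for _, py in pts]) for pts in groups.values()]
r_groups = correlation(grp_x, grp_y)

print("individuals:", round(r_individuals, 3), "group means:", round(r_groups, 3))
```

Here the group means line up perfectly even though the individuals do not, so a conclusion drawn from the group-level correlation would overstate how well ‘x’ predicts ‘y’ for an individual.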

MAKE INFERENCES ABOUT THE ASSOCIATION
There are five steps for a significance test using the regression line:

  1. Assumptions
    The data were gathered using randomization. The population values of ‘y’ at each ‘x’ follow a normal distribution, with the same standard deviation at each ‘x’ value.
  2. Hypotheses
    $H_0: \beta = 0$ (‘x’ and ‘y’ are independent), $H_a: \beta \neq 0$
  3. Test statistic
    $t = (b - 0)/se$, where se is the standard error of the sample slope ‘b’.
  4. P-value
    This is the two-tail probability from the t-distribution. The degrees of freedom are df = n - 2.
  5. Conclusion
    The null hypothesis is either rejected or not rejected using the given significance level and the found P-value.

A confidence interval can be calculated using the regression line as well. A 95% confidence interval for the slope has the following formula:

$$b \pm t_{.025}(se)$$
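The test statistic and the confidence interval for the slope can be sketched as follows. The data are made up, the standard error uses the usual textbook expression se = s / sqrt(sum of (x - x-bar) squared), which the summary does not state, and the critical value 3.182 is the tabulated t-score with right-tail probability 0.025 for df = 3. Computing an exact P-value would require a t-distribution routine from software, which is left out here.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (an assumption, not taken from the book).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx
a = y_bar - b * x_bar

residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
df = n - 2
s = sqrt(residual_ss / df)   # residual standard deviation
se_b = s / sqrt(sxx)         # standard error of the slope (textbook formula, assumed)

# Test statistic for H0: beta = 0.
t_stat = (b - 0) / se_b

# Tabulated t-score for a 95% interval with df = 3 (valid only for this sample size).
T_CRIT_DF3 = 3.182
reject_h0 = abs(t_stat) > T_CRIT_DF3

ci_low = b - T_CRIT_DF3 * se_b
ci_high = b + T_CRIT_DF3 * se_b

print("t:", round(t_stat, 4), "df:", df, "reject H0 at 0.05:", reject_h0)
print("95% CI for the slope:", (round(ci_low, 3), round(ci_high, 3)))
```

With only five observations the interval is wide and contains zero, which agrees with the test not rejecting the null hypothesis at the 0.05 level.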

 

HOW THE DATA VARY AROUND THE REGRESSION LINE
A residual is a prediction error. A standardized version of the residual does not depend on the units. It equals the residual divided by a standard error that describes the sampling variability of the residuals:

$$\text{standardized residual} = \frac{y - \hat{y}}{se(y - \hat{y})}$$

A standardized residual behaves like a z-score: it indicates how many standard errors a residual falls from zero. It is useful to construct a histogram of the residuals to detect unusual observations. The residual standard deviation uses the residual sum of squares. It has the following formula:

$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$

The residual standard deviation describes the variability of ‘y’ at a fixed x-value. If there is a strong correlation, there is less variability at a fixed value of ‘x’ than in the ‘y’ values overall. The standard error of ŷ can be used to construct a 95% confidence interval for the population mean of y at a given value of ‘x’:

$$\hat{y} \pm t_{.025}(se \text{ of } \hat{y})$$
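To see how the residual standard deviation compares with the overall variability of ‘y’, the following sketch computes both for a small made-up data set (an assumption, not data from the book).

```python
from math import sqrt
from statistics import mean, stdev

# Made-up illustrative data (an assumption, not taken from the book).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx
a = y_bar - b * x_bar

# Residual standard deviation: spread of y around the line at a fixed x.
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(residual_ss / (n - 2))

# Overall standard deviation: spread of y around y-bar, ignoring x.
s_y = stdev(y)

print("residual standard deviation s:", round(s, 4))
print("overall standard deviation s_y:", round(s_y, 4))
```

For these numbers the spread around the line is smaller than the spread around y-bar, as expected when ‘x’ helps to predict ‘y’.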

The t-score has n - 2 degrees of freedom. This interval is an inference about where the population mean of ‘y’ falls at a given value of ‘x’. The prediction interval for y is an inference about where an individual observation falls, so it is wider. These intervals are only valid if the true relationship is close to linear, with about the same variability of y values at each fixed x-value. The ratio of a sum of squares to its degrees of freedom is called a mean square. The mean square error is the residual sum of squares divided by its degrees of freedom, n - 2. Its square root ‘s’ is a typical size of a residual. The F test statistic for the null hypothesis of no association is a ratio of mean squares, and in this model it equals the square of the t statistic for the slope:

$$F = \frac{\text{mean square for regression}}{\text{mean square error}}$$
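The difference between the two intervals, and the link between F and t, can be sketched numerically. The data are made up, and the standard error expressions for the mean and for an individual prediction are the usual textbook formulas, which the summary itself does not give, so they should be read as assumptions. The critical value 3.182 is the tabulated t-score for df = 3.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (an assumption, not taken from the book).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

total_ss = sum((yi - y_bar) ** 2 for yi in y)
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
regression_ss = total_ss - residual_ss

# Mean squares: each sum of squares divided by its degrees of freedom.
mean_square_regression = regression_ss / 1   # one explanatory variable
mean_square_error = residual_ss / (n - 2)
s = sqrt(mean_square_error)

f_stat = mean_square_regression / mean_square_error
t_stat = b / (s / sqrt(sxx))

# Intervals at a chosen x value (x0 = 4 is an arbitrary choice).
x0 = 4
y_hat = a + b * x0
T_CRIT_DF3 = 3.182  # tabulated t-score for df = 3, right-tail probability 0.025
se_mean = s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # for the mean of y at x0
se_pred = s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # for an individual y at x0

ci_mean = (y_hat - T_CRIT_DF3 * se_mean, y_hat + T_CRIT_DF3 * se_mean)
pred_int = (y_hat - T_CRIT_DF3 * se_pred, y_hat + T_CRIT_DF3 * se_pred)

print("F:", round(f_stat, 4), "t squared:", round(t_stat ** 2, 4))
print("95% CI for the mean of y at x0:", tuple(round(v, 3) for v in ci_mean))
print("95% prediction interval at x0:", tuple(round(v, 3) for v in pred_int))
```

The prediction interval is wider than the confidence interval for the mean because it also has to cover the variability of individual observations around that mean.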

Follow the author: JesperN