Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Chapter 12 summary

MODEL HOW TWO VARIABLES ARE RELATED
A regression line is a straight line that predicts the value of a response variable ‘y’ from the value of an explanatory variable ‘x’. The correlation is a summary measure of association. The regression line has the following formula, where ‘a’ is the y-intercept and ‘b’ is the slope:

    ŷ = a + bx

Plot the data before fitting a regression line, because the line can be strongly influenced by outliers. The regression equation is often called a prediction equation. The difference y − ŷ between an observed outcome y and its predicted value ŷ is the prediction error, called the residual. The average of the residuals is zero. The regression line has a smaller sum of squared residuals than any other line, which is why it is called the least squares line. The population regression equation has the following formula, where α is the population y-intercept, β is the population slope and μ_y is the population mean of y at a given value of x:

    μ_y = α + βx
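
The two least squares properties (residuals that average to zero, and the smallest possible sum of squared residuals) can be checked numerically. The sketch below uses made-up data that is not from the book, and it computes the slope with the standard least squares formula Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², which this summary does not state explicitly.

```python
from statistics import mean

# Made-up illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Least squares slope and intercept (standard formula, assumed here)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar


def sum_sq_residuals(intercept, slope):
    """Sum of squared residuals for the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Residual = observed y minus predicted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"y-hat = {a:.2f} + {b:.2f}x")
print("sum of residuals:", round(sum(residuals), 10))
print("squared residuals, least squares line:", round(sum_sq_residuals(a, b), 4))
print("squared residuals, shifted line:", round(sum_sq_residuals(a + 0.1, b), 4))
```

Any other intercept or slope, such as the shifted line in the last print statement, gives a larger sum of squared residuals.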

The population regression equation is a model. A model is a simple approximation for how variables relate in a population. The probability distribution of y values at a fixed value of x is a conditional distribution (e.g. the distribution of annual income for people with 12 years of education).

DESCRIBE STRENGTH OF ASSOCIATION
Correlation does not differentiate between response and explanatory variables. The slope can be calculated from the correlation ‘r’ and the sample standard deviations of x and y:

    b = r (s_y / s_x)

Using the slope, the y-intercept can be calculated from the sample means of x and y:

    a = ȳ − b x̄

The slope can’t be used to determine the strength of the association, because it depends on the units of measurement. The correlation is the standardized version of the slope. The formula for the correlation is the following:

    r = b (s_x / s_y)
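
A short sketch can show why the slope depends on the units while the correlation does not. The data are made up, and the correlation is computed as the sum of products of z-scores divided by n − 1, a standard definition that this summary does not spell out. The function name fit_from_correlation is only illustrative.

```python
from statistics import mean, stdev


def fit_from_correlation(x, y):
    """Return (a, b, r) using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # Sample correlation: sum of products of z-scores, divided by n - 1
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
            for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r


# Made-up illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r = fit_from_correlation(x, y)

# The same responses expressed in a unit that is 100 times smaller
a_small, b_small, r_small = fit_from_correlation(x, [100 * yi for yi in y])

print(f"original units: slope = {b:.2f}, r = {r:.3f}")
print(f"rescaled units: slope = {b_small:.2f}, r = {r_small:.3f}")
```

Rescaling y multiplies the slope by 100 but leaves the correlation unchanged, which is why the correlation is the better measure of strength.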

A property of the correlation is that at any particular x value, the predicted value of y is relatively closer to its mean than x is to its mean. If a particular ‘x’ value falls 2.0 standard deviations from the mean and the correlation is 0.80, then the predicted ‘y’ is ‘r’ times that many standard deviations from its mean, so the predicted ‘y’ falls 0.80 times 2.0 = 1.6 standard deviations from its mean. This is regression toward the mean. If the first observation is extreme, the second observation tends to be closer to the mean and less extreme.
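
The rule follows from the slope and intercept formulas: substituting a = ȳ − b x̄ and b = r (s_y / s_x) into ŷ = a + bx gives (ŷ − ȳ) / s_y = r (x − x̄) / s_x. A minimal sketch of that arithmetic follows. The mean and standard deviation of y used at the end are hypothetical values, chosen only to turn the result back into units of y.

```python
def predicted_z_y(r, z_x):
    """Predicted y, in standard deviations from its mean, when x is z_x SDs from its mean."""
    return r * z_x


# The example from the text: r = 0.80 and x is 2.0 standard deviations above its mean
z_y = predicted_z_y(0.80, 2.0)
print(z_y)  # 1.6

# Hypothetical mean and standard deviation of y (not from the book)
y_bar, s_y = 50.0, 10.0
y_hat = y_bar + z_y * s_y
print(round(y_hat, 2))
```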

The overall error made when predicting ‘y’ using ‘x’ with the regression equation is summarized by the residual sum of squares, which uses the following formula:

    residual sum of squares = Σ(y − ŷ)²

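The measure r squared is interpreted as the proportional reduction in error (e.g. if r squared = 0.40, the error using ŷ to predict y is 40% smaller than the error using ȳ to predict y). The error using ȳ is the total sum of squares Σ(y − ȳ)² and the error using ŷ is the residual sum of squares Σ(y − ŷ)². The formula for r squared is the following:

    r² = (Σ(y − ȳ)² − Σ(y − ŷ)²) / Σ(y − ȳ)²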
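
As a rough check of the proportional reduction in error, the sketch below computes both sums of squares for made-up data (not from the book) and compares the result with the squared correlation. The least squares slope formula and the correlation formula used here are standard ones that this summary does not state.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Least squares line
b = sxy / sxx
a = y_bar - b * x_bar

# Error when predicting every y with the mean y-bar
total_ss = sum((yi - y_bar) ** 2 for yi in y)
# Error when predicting y with the regression line
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = (total_ss - residual_ss) / total_ss

# Correlation computed directly, for comparison
r = sxy / sqrt(sxx * total_ss)

print(f"total SS = {total_ss:.2f}, residual SS = {residual_ss:.2f}")
print(f"r squared = {r_squared:.3f}, squared correlation = {r ** 2:.3f}")
```

For these data the regression line reduces the prediction error by 60% compared with always predicting ȳ.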

r squared can also be calculated by squaring the correlation coefficient. If the regression line explains none of the variability in y, r squared is zero. The more of the variability it explains, the closer r squared is to one. The correlation is affected by three factors:

  1. Potential outliers
  2. Grouped participants
    If the data are summaries for groups (such as group averages) rather than individual observations, the correlation tends to be stronger.
  3. The range of the x-values sampled
    If only a restricted range of x-values is sampled, the correlation tends to be weaker.

The ecological fallacy is making predictions about individuals based on the summary results for groups.

MAKE INFERENCES ABOUT THE ASSOCIATION
There are five steps for a significance test using the regression line:

  1. Assumptions
    The data were gathered using randomization. The population values of ‘y’ at each ‘x’ follow a normal distribution, with the same standard deviation at each ‘x’ value.
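  2. Hypotheses
    H0: β = 0 (‘x’ and ‘y’ are independent), Ha: β ≠ 0
  3. Test statistic
    t = (b − 0) / se, where se is the standard error of the sample slope ‘b’.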
  4. P-value
    This is the two-tail probability from the t-distribution with df = n − 2 degrees of freedom.
  5. Conclusion
    The null hypothesis is either rejected or not rejected using the given significance level and the found P-value.

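A confidence interval for the population slope can be calculated as well. A 95% confidence interval for the slope has the following formula, where se is the standard error of the sample slope ‘b’ and t.025 is the t-score with df = n − 2:

    b ± t.025 (se)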
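
The test statistic and the confidence interval can be sketched together for made-up data (not from the book). Two things here are assumptions that this summary does not give: the standard error of the slope is computed as s / √Σ(x − x̄)², the usual formula for simple regression, and the t-score 3.182 for df = 3 is copied from a t table rather than computed.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual standard deviation, with df = n - 2
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
df = n - 2
s = sqrt(residual_ss / df)

# Standard error of the slope (standard formula, assumed here)
se_b = s / sqrt(sxx)

# Test statistic for H0: beta = 0
t = (b - 0) / se_b

# t-score with right-tail probability 0.025 for df = 3, taken from a t table
T_025_DF3 = 3.182
lower = b - T_025_DF3 * se_b
upper = b + T_025_DF3 * se_b

print(f"b = {b:.3f}, se = {se_b:.4f}, t = {t:.3f}, df = {df}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

With only five observations the interval contains zero, which agrees with a t statistic that is smaller than 3.182: the null hypothesis would not be rejected at the 0.05 level.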

 

HOW THE DATA VARY AROUND THE REGRESSION LINE
A residual is a prediction error. A standardized version of the residual does not depend on the units. It equals the residual divided by a standard error that describes the sampling variability of the residuals:

    standardized residual = (y − ŷ) / se(y − ŷ)

A standardized residual behaves like a z-score: it indicates how many standard errors a residual falls from zero. It is useful to construct a histogram of the residuals to detect unusual observations. The residual standard deviation uses the residual sum of squares. It has the following formula:

    s = √( Σ(y − ŷ)² / (n − 2) )
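
The sketch below computes the residual standard deviation and standardized residuals for made-up data (not from the book). The summary does not say how the standard error of a residual is obtained. The code assumes the usual simple regression formula s √(1 − h), where h = 1/n + (x − x̄)² / Σ(x − x̄)² is the leverage of the observation. Statistical software may use a slightly different variant.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Residual standard deviation: square root of residual SS over n - 2
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation (assumed formula, see the note above)
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Standardized residual = residual / its standard error
standardized = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]

print(f"s = {s:.4f}")
for xi, e, z in zip(x, residuals, standardized):
    print(f"x = {xi}: residual = {e:+.2f}, standardized = {z:+.2f}")
```

A standardized residual larger than about 3 in absolute value would point to an unusual observation. None occurs in this small example.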

The residual standard deviation describes the variability of y at a fixed x-value. If there is a strong correlation, the y-values vary less at a fixed value of ‘x’ than they do overall. The standard error of ŷ can be used to construct a 95% confidence interval for the mean of y at a fixed x-value:

    ŷ ± t.025 (se of ŷ)
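
To make the interval concrete, the sketch below computes a 95% confidence interval for the mean of y at one x-value, next to the wider prediction interval for a single new observation at that x-value. The data are made up, the value x0 = 4 is arbitrary, and the t-score 3.182 (df = 3) is copied from a t table. The standard error formulas, s √(1/n + (x0 − x̄)² / Σ(x − x̄)²) for the mean and s √(1 + 1/n + (x0 − x̄)² / Σ(x − x̄)²) for an individual observation, are the usual ones for simple regression and are not given in this summary.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual standard deviation with df = n - 2
s = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x0 = 4                # arbitrary x-value at which to predict
y_hat0 = a + b * x0   # predicted value at x0

# Standard errors (assumed standard formulas, see the note above)
se_mean = s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
se_pred = s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

# t-score with right-tail probability 0.025 for df = 3, taken from a t table
T_025_DF3 = 3.182

ci = (y_hat0 - T_025_DF3 * se_mean, y_hat0 + T_025_DF3 * se_mean)
pi = (y_hat0 - T_025_DF3 * se_pred, y_hat0 + T_025_DF3 * se_pred)

print(f"predicted y at x = {x0}: {y_hat0:.2f}")
print(f"95% CI for the mean of y: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% prediction interval:  ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is wider because it has to cover the variability of individual observations around the mean, not only the uncertainty about the mean itself.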

The t-score in the confidence interval for the mean of y has n − 2 degrees of freedom. This interval is an inference about where the population mean of y falls at a fixed x-value. The prediction interval for y is an inference about where an individual observation falls. These intervals are only valid if the true relationship is close to linear, with about the same variability of y values at each fixed x-value.

The ratio of a sum of squares to its degrees of freedom is called a mean square. The mean square error is the residual sum of squares divided by its degrees of freedom, n − 2. Its square root ‘s’ is a typical size of a residual. The F statistic for testing that the slope is zero equals the square of the t statistic, and it can also be calculated as a ratio of mean squares:

    F = (mean square for regression) / (mean square error)
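
The link between the mean squares, F and t can be checked on made-up data (not from the book). The sketch assumes the usual decomposition for a single explanatory variable: the regression sum of squares is the total sum of squares minus the residual sum of squares, with 1 degree of freedom. The standard error of the slope is again taken as s / √Σ(x − x̄)², which this summary does not state.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (not from the book)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

total_ss = sum((yi - y_bar) ** 2 for yi in y)
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
regression_ss = total_ss - residual_ss

# Mean square = sum of squares / degrees of freedom
ms_regression = regression_ss / 1        # one explanatory variable
ms_error = residual_ss / (n - 2)

f_stat = ms_regression / ms_error

# t statistic for the slope, for comparison
s = sqrt(ms_error)
t = b / (s / sqrt(sxx))

print(f"mean square for regression = {ms_regression:.2f}")
print(f"mean square error = {ms_error:.2f}")
print(f"F = {f_stat:.3f}, t squared = {t ** 2:.3f}")
```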

Follow the author: JesperN