Statistics, the art and science of learning from data by A. Agresti (fourth edition) – Book summary
MODEL HOW TWO VARIABLES ARE RELATED
A regression line is a straight line that predicts the value of a response variable ‘y’ from the value of an explanatory variable ‘x’. The correlation is a summary measure of the association. The regression line has the following formula:
ŷ = a + bx
where ŷ is the predicted value of y, a is the y-intercept and b is the slope.
The data should be plotted before a regression line is fitted, because the line can be strongly influenced by outliers. The regression equation is often called a prediction equation. The difference y - ŷ between an observed outcome y and its predicted value ŷ is the prediction error, called the residual. The average of the residuals is zero. The regression line has a smaller sum of squared residuals than any other line, which is why it is called the least squares line. The population regression equation has the following formula:
μ_y = α + βx
where μ_y is the population mean of y at a given value of x, α is the population y-intercept and β is the population slope.
This formula is a model. A model is a simple approximation of how variables relate in a population. The probability distribution of the y values at a fixed value of x is a conditional distribution (e.g. the distribution of annual income for people with 12 years of education).
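The least squares idea from this section can be sketched in a few lines of Python. This is a minimal illustration, not the book's own procedure: the data are made up, and the function names `least_squares` and `sum_squared_residuals` are hypothetical helpers introduced here.

```python
from statistics import mean


def least_squares(x, y):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # y-intercept
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared prediction errors (y - y-hat) for the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustration data, not taken from the book.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

ssr_best = sum_squared_residuals(x, y, a, b)
ssr_shifted = sum_squared_residuals(x, y, a + 0.1, b)  # a slightly different line
ssr_tilted = sum_squared_residuals(x, y, a, b + 0.1)   # another different line

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print(f"mean residual = {mean(residuals):.6f}")
print(f"sum of squared residuals: least squares {ssr_best:.3f}, "
      f"shifted {ssr_shifted:.3f}, tilted {ssr_tilted:.3f}")
```

In this example the residuals average to zero (up to rounding), and both alternative lines give a larger sum of squared residuals than the least squares line.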
DESCRIBE STRENGTH OF ASSOCIATION
Correlation does not differentiate between response and explanatory variables. The slope can be calculated from the correlation as follows:
b = r(s_y / s_x)
where r is the correlation and s_x and s_y are the sample standard deviations of x and y.
Once the slope is known, the y-intercept can be calculated from the sample means x̄ and ȳ:
a = ȳ - b(x̄)
The slope can’t be used to determine the strength of the association, because it depends on the units of measurement. The correlation is the standardized version of the slope. The formula for the correlation is the following:
r = b(s_x / s_y)
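As a rough check of the relation between slope and correlation, the sketch below computes the slope from r and the two standard deviations, and then rescales y to show that the slope changes with the units while the correlation does not. The data and the helper names `correlation` and `line_from_correlation` are illustrative assumptions.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Sample correlation: average product of z-scores, dividing by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    n = len(x)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


def line_from_correlation(x, y):
    """Slope b = r * (s_y / s_x) and intercept a = y-bar - b * x-bar."""
    r = correlation(x, y)
    b = r * stdev(y) / stdev(x)
    a = mean(y) - b * mean(x)
    return a, b


# Made-up illustration data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_rescaled = [100 * yi for yi in y]  # same data in a unit 100 times smaller

r = correlation(x, y)
a, b = line_from_correlation(x, y)
r_rescaled = correlation(x, y_rescaled)
_, b_rescaled = line_from_correlation(x, y_rescaled)

print(f"r = {r:.4f}, slope = {b:.4f}")
print(f"after rescaling y: r = {r_rescaled:.4f}, slope = {b_rescaled:.4f}")
```

The slope becomes 100 times larger after rescaling, while the correlation stays the same, which is why the correlation rather than the slope describes the strength of the association.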
A property of the correlation is that at any particular x value, the predicted value of y is relatively closer to its mean than x is to its mean. The predicted ‘y’ is ‘r’ times as many standard deviations from its mean as ‘x’ is from its mean. If a particular ‘x’ value falls 2.0 standard deviations from the mean and the correlation is 0.80, then the predicted ‘y’ is 0.80 times 2.0 = 1.6 standard deviations from its mean. This is regression toward the mean. If the first observation is extreme, the second observation tends to be closer to the mean and less extreme.
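The arithmetic of regression toward the mean can be written out as a small Python sketch. The function name `predict_from_z` and the exam-score numbers (mean 70, standard deviation 10 for both variables) are hypothetical and only serve as an illustration.

```python
def predict_from_z(x, x_bar, s_x, y_bar, s_y, r):
    """Predict y by moving r times as many standard deviations from y-bar
    as x lies from x-bar."""
    z_x = (x - x_bar) / s_x       # how many standard deviations x is from its mean
    z_y_predicted = r * z_x       # predicted y is r times that many from its mean
    return z_x, z_y_predicted, y_bar + z_y_predicted * s_y


# Hypothetical numbers: x is 2.0 standard deviations above its mean, r = 0.80.
z_x, z_y, y_predicted = predict_from_z(x=90, x_bar=70, s_x=10,
                                        y_bar=70, s_y=10, r=0.80)

print(f"x is {z_x:.1f} standard deviations from its mean")
print(f"predicted y is {z_y:.1f} standard deviations from its mean")
print(f"predicted y is about {y_predicted:.0f}")
```

With these assumed numbers, an x value 2.0 standard deviations above the mean leads to a predicted y that is only 1.6 standard deviations above its mean.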
The overall error made when predicting ‘y’ from ‘x’ with the regression equation is summarized by the residual sum of squares, which has the following formula:
residual sum of squares = Σ(y - ŷ)²
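The measure r squared is interpreted as the proportional reduction in error (e.g. if r squared = 0.40, the error using y-hat to predict y is 40% smaller than the error using y-bar to predict y). The formula for r squared is the following:
r² = [Σ(y - ȳ)² - Σ(y - ŷ)²] / Σ(y - ȳ)²
where ȳ (y-bar) is the sample mean of y and ŷ (y-hat) is the value predicted by the regression line.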
The r squared can also be calculated by squaring the correlation coefficient. If the regression line does not explain any of the variability in y, r squared is zero. The more of the variability the regression line explains, the closer r squared is to one. The correlation depends on three factors:
The ecological fallacy is making predictions about individuals based on the summary results for groups.
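The proportional reduction in error described in this section can be checked numerically. The sketch below, on made-up data and with a hypothetical helper `r_squared_by_errors`, compares the error of predicting with y-bar to the error of predicting with y-hat, and confirms that the result matches the squared correlation.

```python
from math import sqrt
from statistics import mean


def r_squared_by_errors(x, y):
    """Return (total SS, residual SS, r squared) for the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    total_ss = sum((yi - y_bar) ** 2 for yi in y)                       # error using y-bar
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # error using y-hat
    return total_ss, residual_ss, (total_ss - residual_ss) / total_ss


def correlation(x, y):
    """Sample correlation computed from sums of squares and cross products."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustration data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

total_ss, residual_ss, r2 = r_squared_by_errors(x, y)
r = correlation(x, y)

print(f"total SS = {total_ss:.2f}, residual SS = {residual_ss:.2f}")
print(f"r squared from errors = {r2:.3f}, correlation squared = {r ** 2:.3f}")
```

For these data the regression line reduces the prediction error by 60%, and squaring the correlation gives the same value.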
MAKE INFERENCES ABOUT THE ASSOCIATION
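There are five steps for a significance test using the regression line: (1) check the assumptions (randomization, a linear relationship and a normal conditional distribution of y with the same standard deviation at each x), (2) state the hypotheses, usually H0: β = 0 against Ha: β ≠ 0, (3) compute the test statistic t = (b - 0) / se, (4) find the P-value from the t distribution with n - 2 degrees of freedom and (5) draw a conclusion in context.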
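A confidence interval for the slope can be calculated as well. A 95% confidence interval for the slope has the following formula:
b ± t.025(se)
where se is the standard error of the slope b and the t-score t.025 has n - 2 degrees of freedom.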
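A hedged sketch of the test statistic and the confidence interval for the slope follows. It assumes the standard error of the slope is s divided by the square root of Σ(x - x̄)², and it takes the critical t-score as an input looked up in a t table (3.182 for 3 degrees of freedom) instead of computing it. The data and the function name `slope_inference` are illustrative.

```python
from math import sqrt
from statistics import mean


def slope_inference(x, y, t_crit):
    """Return slope b, its standard error, the t statistic for H0: beta = 0,
    and the confidence interval b +/- t_crit * se."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(residual_ss / (n - 2))   # residual standard deviation
    se = s / sqrt(sxx)                # standard error of the slope (assumed formula)
    t = (b - 0) / se                  # test statistic for H0: beta = 0
    return b, se, t, (b - t_crit * se, b + t_crit * se)


# Made-up illustration data: n = 5, so df = n - 2 = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT_DF3 = 3.182  # t.025 for 3 degrees of freedom, taken from a t table

b, se, t, ci = slope_inference(x, y, T_CRIT_DF3)

print(f"b = {b:.3f}, se = {se:.3f}, t = {t:.3f}")
print(f"95% confidence interval for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
print("reject H0 at the 0.05 level" if abs(t) > T_CRIT_DF3
      else "do not reject H0 at the 0.05 level")
```

With only five observations the interval is wide and contains zero, which agrees with the t statistic falling short of the critical value.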
HOW THE DATA VARY AROUND THE REGRESSION LINE
A residual is a prediction error. A standardized version of the residual does not depend on the units. It equals the residual divided by a standard error that describes the sampling variability of the residuals:
standardized residual = (y - ŷ) / se(y - ŷ)
A standardized residual behaves like a z-score: it indicates how many standard errors a residual falls from zero. It is useful to construct a histogram of the residuals to detect unusual observations. The residual standard deviation uses the residual sum of squares. It has the following formula:
s = √[Σ(y - ŷ)² / (n - 2)]
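The residual standard deviation and the standardized residuals can be computed as in the sketch below. The standard error of each residual is taken here to be s·√(1 - h), where h = 1/n + (x - x̄)²/Σ(x - x̄)² is the leverage of the observation. That is a common textbook formula and is an assumption of this sketch, since the summary does not spell it out. The data and the name `standardized_residuals` are illustrative.

```python
from math import sqrt
from statistics import mean


def standardized_residuals(x, y):
    """Return the residual standard deviation s and the standardized residuals."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Assumed standard error of a residual: s * sqrt(1 - leverage).
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    standardized = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverages)]
    return s, standardized


# Made-up illustration data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s, std_res = standardized_residuals(x, y)

print(f"residual standard deviation s = {s:.4f}")
for xi, z in zip(x, std_res):
    flag = "  <- unusual" if abs(z) > 3 else ""
    print(f"x = {xi}: standardized residual = {z:+.2f}{flag}")
```

The cut-off of 3 used for flagging is a convention chosen for this example; any observation flagged this way would deserve a closer look.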
The residual standard deviation describes the variability of y at a fixed x-value. If there is a strong correlation, y varies less at a fixed value of ‘x’ than it does overall. The standard error of ŷ can be used to construct a 95% confidence interval for the mean of y at a fixed value of x:
ŷ ± t.025(se of ŷ)
The t-score has n - 2 degrees of freedom. This interval is an inference about where the population mean falls. The prediction interval for y is an inference about where an individual observation falls. These intervals are only valid if the true relationship is close to linear, with about the same variability of y values at each fixed x-value. The ratio of a sum of squares to its degrees of freedom is called a mean square. The mean square error is the residual sum of squares divided by its degrees of freedom. Its square root ‘s’ is a typical size of a residual. Another way of calculating F is the following:
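The two intervals and the mean squares can be illustrated with a short Python sketch on made-up data. Several pieces are assumptions rather than formulas given in this summary: the standard errors s·√(1/n + (x0 - x̄)²/Σ(x - x̄)²) for the mean and s·√(1 + 1/n + (x0 - x̄)²/Σ(x - x̄)²) for an individual observation, and F taken as the mean square for regression divided by the mean square error. The critical t-score is read from a t table, and `intervals_and_f` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean


def intervals_and_f(x, y, x0, t_crit):
    """Confidence interval for the mean of y and prediction interval for an
    individual y at x0, plus the mean square error and the F ratio."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    y_hat0 = a + b * x0

    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    mse = residual_ss / (n - 2)        # mean square error
    s = sqrt(mse)                      # typical size of a residual

    # Assumed standard errors for the mean of y and for a single new y at x0.
    se_mean = s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    se_pred = s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

    ms_regression = (total_ss - residual_ss) / 1   # regression SS has 1 df
    return {
        "y_hat": y_hat0,
        "mean_ci": (y_hat0 - t_crit * se_mean, y_hat0 + t_crit * se_mean),
        "prediction": (y_hat0 - t_crit * se_pred, y_hat0 + t_crit * se_pred),
        "mse": mse,
        "f": ms_regression / mse,
        "t": b / (s / sqrt(sxx)),
    }


# Made-up illustration data: n = 5, df = 3, t.025 = 3.182 from a t table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = intervals_and_f(x, y, x0=3, t_crit=3.182)

print(f"predicted y at x = 3: {result['y_hat']:.2f}")
print("95% CI for the mean of y: "
      f"({result['mean_ci'][0]:.2f}, {result['mean_ci'][1]:.2f})")
print("95% prediction interval for y: "
      f"({result['prediction'][0]:.2f}, {result['prediction'][1]:.2f})")
print(f"MSE = {result['mse']:.2f}, F = {result['f']:.2f}, "
      f"t squared = {result['t'] ** 2:.2f}")
```

The prediction interval is wider than the confidence interval for the mean, because an individual observation varies more than a mean does. In this sketch the F ratio also equals the square of the t statistic for the slope.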