Statistical methods for the social sciences - Agresti - 5th edition, 2018 - Summary (EN)
Many scientific studies investigate more than two variables, requiring multivariate methods. Much research focuses on causal relationships between variables, but proving causality is difficult. A relationship that appears causal may in fact be produced by another variable. Statistical control is the method of checking whether an association between variables changes or disappears when the influence of other variables is removed. In a causal relationship, x → y, the explanatory variable x causes the response variable y. This is asymmetrical: y need not cause x.
There are three criteria for a causal relationship:
1. Association between the variables
2. Appropriate time order
3. Elimination of alternative explanations
An association is necessary for a causal relationship, but it is not sufficient. Usually the logical time order is immediately clear, such as the explanatory variable preceding the response variable. Apart from x and y, other variables may provide an alternative explanation. In observational studies it can almost never be proved that one variable causes another. Outliers or anecdotes may seem to contradict causality, but a single anecdote usually isn't enough evidence to refute it. Causality is easier to establish with randomized experiments than with observational studies, because randomization assigns subjects to groups at random and fixes the time order before the experiment starts.
Eliminating alternative explanations is often tricky. A method of testing the influence of other variables is controlling them: removing them or holding them at a constant value. Controlling means making sure that the control variables (the other variables) no longer influence the association between x and y. A randomized experiment controls other variables implicitly: subjects are assigned randomly, so the other variables are distributed randomly across the groups.
Statistical control is different from experimental control. In statistical control, subjects with certain characteristics are grouped together. Observational studies in social science often form groups based on socio-economic status, education or income.
The association between two quantitative variables is shown in a scatter plot. Controlling this association for a categorical variable is done by comparing the means.
The association between two categorical variables is shown in a contingency table. Controlling this association for a third variable is done by showing each value of the third variable in a separate contingency table, called a partial table.
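The partial-table idea above can be sketched in a few lines of Python. This is a minimal illustration with invented records, where each record holds an explanatory value x, a response value y, and a control value z (all names and categories are hypothetical):

```python
from collections import Counter

# Hypothetical data: each record is (x, y, z), with z the control variable.
records = [
    ("low",  "yes", "young"), ("low",  "no",  "young"),
    ("high", "yes", "young"), ("high", "yes", "young"),
    ("low",  "no",  "old"),   ("low",  "no",  "old"),
    ("high", "no",  "old"),   ("high", "yes", "old"),
]

# Marginal contingency table: counts of (x, y), ignoring z.
marginal = Counter((x, y) for x, y, _ in records)

# Partial tables: one (x, y) table per value of the control variable z.
partials = {}
for x, y, z in records:
    partials.setdefault(z, Counter())[(x, y)] += 1

print(dict(marginal))
for z, table in partials.items():
    print(z, dict(table))
```

Comparing the marginal table with the partial tables shows whether the x-y association holds up, weakens, or disappears once z is fixed.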
Usually the association doesn't disappear completely when a control variable is introduced; it just becomes weaker.
A lurking variable is a variable that isn't measured, but that does influence the causal relationship. Sometimes researchers don't know about the existence of a variable.
In multivariate relationships, the response variable y has multiple explanatory variables and control variables, written as x1, x2, etc.
In spurious associations, both the explanatory variable x1 and the response variable y depend on a third variable (x2). The association between x1 and y disappears when x2 is controlled. There is no causal relationship between x1 and y. A spurious association can be diagrammed as x1 ← x2 → y.
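A spurious association is easy to demonstrate by simulation. In this sketch (entirely made-up data), x2 drives both x1 and y, so x1 and y are clearly correlated overall, yet the correlation all but vanishes within each stratum of x2:

```python
import random

def pearson(a, b):
    # Plain Pearson correlation coefficient, computed from scratch.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

random.seed(1)
# x2 is the third variable that causes both x1 and y.
x2 = [random.choice([0, 1]) for _ in range(2000)]
x1 = [z + random.gauss(0, 0.5) for z in x2]   # x1 depends only on x2
y  = [z + random.gauss(0, 0.5) for z in x2]   # y depends only on x2

r_marginal = pearson(x1, y)  # sizable, induced entirely by x2

# Controlling x2: compute the correlation separately within each stratum.
r_within = [
    pearson([a for a, z in zip(x1, x2) if z == s],
            [b for b, z in zip(y, x2) if z == s])
    for s in (0, 1)
]
print(r_marginal, r_within)
```

The marginal correlation is substantial, while both within-stratum correlations hover near zero, which is the statistical signature of a spurious association.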
In chain relationships the explanatory variable (x1) causes a third variable (x2), which in turn causes the response variable (y): x1 → x2 → y. The third variable (x2) is also called the intervening variable or the mediator. In chain relationships too, the association disappears when x2 is controlled.
The difference between a spurious relationship and a chain relationship is the causal order. In a spurious relationship x2 precedes both x1 and y. In a chain relationship x2 intervenes between x1 and y.
In reality, response variables often have more than one cause. Then y is said to have multiple causes. Sometimes these causes are independent, but usually they are connected. That means that, for instance, x1 has a direct effect on y but also an indirect effect on y via x2: x1 → y together with x1 → x2 → y.
In the case of a suppressor variable, there seems to be no association between x1 and y until x2 is controlled; once it is, an association appears. Then x2 is a suppressor variable. This happens when, for example, x2 is positively correlated with y and negatively correlated with x1. So even when two variables appear unassociated, it's wise to control for other variables.
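A suppressor variable can also be simulated. In this hypothetical construction, x2 is negatively related to x1 but positively related to y, and the two influences cancel: the marginal x1-y correlation is near zero, yet within each level of x2 the association is strong:

```python
import random

def pearson(a, b):
    # Plain Pearson correlation coefficient, computed from scratch.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

random.seed(2)
n = 4000
x2 = [random.choice([0, 1]) for _ in range(n)]
# x2 pushes x1 down but pushes y up, so its two effects cancel
# in the marginal x1-y association.
x1 = [(1 - z) + random.gauss(0, 0.5) for z in x2]
y  = [a + 2 * z + random.gauss(0, 0.5) for a, z in zip(x1, x2)]

r_marginal = pearson(x1, y)            # close to zero: x2 suppresses the link
r_within = [
    pearson([a for a, z in zip(x1, x2) if z == s],
            [b for b, z in zip(y, x2) if z == s])
    for s in (0, 1)
]                                      # clearly positive once x2 is held fixed
print(r_marginal, r_within)
```

This is the scenario the text describes: no visible association until the suppressor x2 is controlled, at which point a strong positive association appears.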
Statistical interaction occurs between x1 and x2 in their effect on y when the effect of x1 on y differs across values of x2. The explanatory variables x1 and x2 are also called predictors.
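Interaction can be shown with a deliberately simple, deterministic sketch (invented numbers): the slope of y on x1 differs between two groups defined by x2, which is exactly what it means for the effect of x1 to depend on x2:

```python
def slope(xs, ys):
    # Least-squares slope of ys on xs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

x1 = [0, 1, 2, 3, 4]
y_group_a = [2 * x + 1 for x in x1]    # group A: each unit of x1 adds 2 to y
y_group_b = [0.5 * x + 1 for x in x1]  # group B: each unit of x1 adds only 0.5

slope_a = slope(x1, y_group_a)  # 2.0
slope_b = slope(x1, y_group_b)  # 0.5
print(slope_a, slope_b)
```

If there were no interaction, the two slopes would be (approximately) equal; here they plainly are not.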
Many structures are possible for multivariate associations. One possibility is an association that reverses direction (positive versus negative) when a variable is controlled; this is called Simpson's paradox.
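Simpson's paradox can be verified with simple arithmetic. The counts below are hypothetical (successes out of trials for two conditions A and B, split by a control variable): A has the higher success proportion in each stratum, yet the lower proportion once the strata are pooled:

```python
def rate(successes, trials):
    return successes / trials

# Hypothetical (successes, trials) per condition and stratum.
# Within each stratum, A beats B...
a1, b1 = rate(81, 87),  rate(234, 270)   # stratum 1: 0.931 vs 0.867
a2, b2 = rate(192, 263), rate(55, 80)    # stratum 2: 0.730 vs 0.688

# ...yet aggregated over the strata, the direction reverses.
a_all = rate(81 + 192, 87 + 263)         # 273/350 = 0.780
b_all = rate(234 + 55, 270 + 80)         # 289/350 = 0.826
print(a1 > b1, a2 > b2, a_all < b_all)
```

The reversal happens because the two conditions face the strata in very different proportions, so the pooled rates are dominated by different strata.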
Confounding happens when two explanatory variables both affect a response variable and are also associated with each other. Omitted variable bias is a risk when a confounding variable is overlooked. Identifying confounding variables is a major challenge in social science.
When x2 is controlled in the x1-y association, this may have consequences for inference. Restricting the data to a certain value of x2 shrinks the sample size, so the confidence interval becomes wider and the test statistic smaller. A chi-squared test, for example, can yield a smaller value because of the smaller sample size.
When a categorical variable is controlled, separate contingency tables need to be constructed for the different categories. It is common for an ordinal control variable to require at least three or four tables.
Often the parameter values are estimated at several values of the control variable. Instead of the usual confidence interval for the difference between two proportions or two means, a confidence interval can be calculated for the difference between the parameter estimates at two values of the control variable: (estimate 2 − estimate 1) ± z × √(se1² + se2²), where se1 and se2 are the standard errors of the two estimates.
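The interval can be computed directly. This sketch uses invented numbers: two sample proportions measured at two levels of a control variable, combined with the usual formula (estimate 2 − estimate 1) ± z·√(se1² + se2²):

```python
import math

# Hypothetical proportions and sample sizes at two control-variable levels.
p1, n1 = 0.60, 200   # proportion at control level 1
p2, n2 = 0.45, 180   # proportion at control level 2

# Standard error of each sample proportion.
se1 = math.sqrt(p1 * (1 - p1) / n1)
se2 = math.sqrt(p2 * (1 - p2) / n2)
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)

z = 1.96  # critical value for 95% confidence
diff = p2 - p1
ci = (diff - z * se_diff, diff + z * se_diff)
print(ci)  # 0 outside the interval: the parameters differ across levels
```

Here the whole interval lies below 0, so with these (made-up) data the proportion genuinely differs between the two levels of the control variable.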
When 0 isn't within the interval, the parameter values differ across the values of the control variable. When the x1-y association is equal in the partial analyses, a single measure can summarize the strength of the association while controlling for the control variable. This is called a partial association.