Confounding and deconfounding: or, slaying the lurking variable - Pearl - 2018 - Article

What is meant by the 'chilling fear of confounding'?
The skillful interrogation of nature: Why do RCTs work?
What is the new paradigm of confounding?
What do the do-operator and the back-door criterion mean?

The biblical story of Daniel encapsulates in a profound way the conduct of experimental science today. When King Nebuchadnezzar brought back thousands of captives, he wanted his followers to pick out the children who were without blemish and skillful in all wisdom. But there was a problem: his favorite, a boy named Daniel, refused for religious reasons to touch the food the King gave them. The King's followers were terrified of how the King would react. Daniel proposed an experiment: for ten days, give him and his companions only vegetables, and feed another group of children the King's meat and wine. After ten days the two groups were compared. Daniel and his companions prospered on the vegetarian diet, and because of this and their healthy appearance, Daniel became one of the most important people in the kingdom. The followers raise a question about causation: will a vegetarian diet cause servants to be healthy? Daniel in turn proposes a methodology to answer this question: compare the two groups after ten days. After a suitable amount of time, you can see a difference between the two groups. Nowadays this is called a controlled experiment.

You can't go back in time and see what would have happened to Daniel had he eaten the meat and wine instead of the vegetarian diet. But because you can compare Daniel with a group of people who receive a different treatment, you can still see what happens when people are given a different diet. For this to work, the groups need to be representative of the population and comparable with each other.

But Daniel overlooked one thing: confounding bias. Suppose that Daniel and his companions were healthier than the control group to start with. Then their robust appearance after ten days on the healthy diet may have nothing to do with the diet itself. Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment. Such confounders are sometimes known as 'lurking third variables'.

Statisticians both over- and underestimate the importance of adjusting for possible confounding variables. They overrate it in the sense that they often control for many more variables than they need to, and even for variables that they should not control for. The idea is 'the more things you control for, the stronger your study seems', because it gives a feeling of specificity and precision. But sometimes you can control for too much.

Statisticians also underestimate the importance of controlling for possible confounding variables in the sense that they are loath to talk about causality at all, even if the controlling has been done correctly.

In this chapter you will get to know why you can safely use randomized controlled trials (RCTs) to estimate the causal effect X -> Y without falling prey to confounder bias.

What is meant by the 'chilling fear of confounding'?

In 1998, an important study showed an association between regular walking and reduced death rates among retired men. The researchers wanted to know whether the men who exercised more lived longer. They found that the death rate over a twelve-year period was two times higher among 'casual walkers' (less than a mile a day) than among 'intense walkers' (more than two miles a day). But you have to keep in mind the influence a confounding variable might have.

The classic causal diagram for this example shows age as a confounder of walking and mortality: age influences both how much a man walks and his risk of dying. Physical condition could be another confounder, and you could go on and on listing possible confounders. Still, even after the researchers adjusted the death rate for age, the difference between casual and intense walkers remained large.

The skillful interrogation of nature: Why do RCTs work?

An RCT is often considered the gold standard of clinical trials, and the person to thank for this is R.A. Fisher. The questions he asked were 'aimed at establishing causal relationships', and what gets in the way is confounding. Nature is like a genie that answers exactly the question we pose, not necessarily the one we intend to ask. Around 1923 or 1924, Fisher began to realize that the only experimental design the genie could not defeat was a random one. If you run an experiment many times, you may sometimes get lucky and apply the treatment to the most fertile subplots. But by generating a new random assignment each time you perform the experiment, you can guarantee that the great majority of the time you will be neither lucky nor unlucky. Randomized trials are now the gold standard, but in Fisher's time the idea of a randomly designed experiment horrified his statistical colleagues. Fisher, however, realized that an uncertain answer to the right question is much better than a highly certain answer to the wrong question.

When you ask the genie the wrong question, you will never find out what you want to know. If you ask the right question, getting an answer that is occasionally wrong is much less of a problem. So, randomization brings two benefits:

It eliminates the confounder bias.
It enables the researcher to quantify his uncertainty.
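The first benefit can be illustrated with a small simulation. This is only a sketch under invented assumptions: a hidden 'health' variable raises the outcome, the true treatment effect is set to 1, and in the nonrandomized version healthier subjects choose the treatment themselves. None of these numbers come from the article.

```python
import random

def simulate(randomized, n=20000, true_effect=1.0, seed=0):
    """Return the naive difference in mean outcome, treated minus untreated."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)       # hidden confounder (assumed)
        if randomized:
            x = rng.random() < 0.5         # coin flip, independent of health
        else:
            x = health > 0.0               # healthier subjects pick the treatment
        y = true_effect * x + health + rng.gauss(0.0, 1.0)
        (treated if x else untreated).append(y)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

observational = simulate(randomized=False)
experimental = simulate(randomized=True)
print(f"nonrandomized estimate: {observational:.2f}")
print(f"randomized estimate:    {experimental:.2f}")
```

With the coin flip, the estimate lands close to the assumed true effect of 1. When subjects select themselves, the estimate is far too large, because the treated group was healthier to begin with.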

In a nonrandomized study, the experimenter must rely on her knowledge of the subject matter. If she is confident that her causal model accounts for a sufficient number of deconfounders and she has gathered data on them, then she can estimate the effects in an unbiased way. But the danger here is that she might have missed a confounding factor, and her estimate may therefore be biased.

RCTs are still preferred to observational studies where they are feasible. But in some cases intervention may be physically impossible or unethical, or you may have difficulty recruiting subjects for inconvenient experimental procedures and end up with only volunteers who don't quite represent the intended population.

What is the new paradigm of confounding?

While confounding is widely recognized as one of the central problems in research, a review of the literature reveals little consistency among the definitions of confounding or confounder. Why has the confounding problem advanced so little since Fisher? Lacking a principled understanding of confounding, scientists could not say anything meaningful about observational studies, where physical control over treatments is infeasible.

How was confounding defined then, and how is it defined now? With the information we have now, the second question is easier to answer. Confounding can simply be defined as anything that leads to a discrepancy between P(Y|X) (the conditional probability of the outcome given the treatment) and P(Y|do(X)) (the interventional probability).

Why is it so difficult? Because confounding is not a statistical notion. It stands for the discrepancy between what we want to assess (the causal effect) and what we actually do assess using statistical methods. If you can't mathematically articulate what you want to assess, you can't expect to define what constitutes a discrepancy.
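The gap between P(Y|X) and P(Y|do(X)) can be made concrete with a tiny worked example. The sketch below assumes a single binary confounder Z that influences both X and Y, and it computes the interventional probability with the standard adjustment formula, sum over z of P(Y|X, Z=z) P(Z=z), which the summary itself does not spell out. All probabilities are invented for illustration.

```python
from fractions import Fraction as F

# Hypothetical model: Z -> X, Z -> Y, X -> Y, all variables binary.
p_z = {0: F(1, 2), 1: F(1, 2)}                        # P(Z=z)
p_x1_given_z = {0: F(1, 5), 1: F(4, 5)}               # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): F(1, 10), (1, 0): F(2, 10),
                 (0, 1): F(5, 10), (1, 1): F(6, 10)}  # P(Y=1 | X=x, Z=z)

def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

def observed(x):
    """P(Y=1 | X=x): each stratum is weighted by P(Z=z | X=x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] / p_x
               for z in p_z)

def intervened(x):
    """P(Y=1 | do(X=x)): each stratum is weighted by P(Z=z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)

print("observed difference:", observed(1) - observed(0))      # 17/50
print("causal difference:  ", intervened(1) - intervened(0))  # 1/10
```

In this made-up model the treatment raises the outcome probability by 1/10 in every stratum, yet the raw comparison suggests 17/50. The discrepancy between the two numbers is exactly what the definition calls confounding.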

The concept of 'confounding' has evolved around two related conceptions: incomparability and lurking third variables. Both of these concepts have resisted formalization. After all, how do we know what is interesting and relevant to study and distinguish, and what is not? You can say it is common sense, but many scientists have struggled to find the important things to consider.

What are the two surrogate definitions of confounding? They fall into two main categories: declarative and procedural. An old procedural definition goes by the scary name of 'noncollapsibility'. You compare the relative risk with the relative risk after adjusting for the potential confounder. A difference between the two indicates confounding, in which case you should use the adjusted risk estimate.
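As a rough illustration of this procedural test, the sketch below compares a crude relative risk with one adjusted for age. The counts are entirely made up and are not the data of any real study. The adjustment used here is simple standardization (weighting each age stratum by its share of the whole sample), which is one of several possible adjustment methods.

```python
# (deaths, people) per age stratum and walking group; all counts are invented.
strata = {
    "younger": {"casual": (20, 200), "intense": (40, 800)},
    "older":   {"casual": (320, 800), "intense": (40, 200)},
}

def crude_risk(group):
    """Risk of death in a group, ignoring age."""
    deaths = sum(s[group][0] for s in strata.values())
    people = sum(s[group][1] for s in strata.values())
    return deaths / people

def standardized_risk(group):
    """Stratum-specific risks, weighted by each stratum's share of the sample."""
    total = sum(n for s in strata.values() for _, n in s.values())
    risk = 0.0
    for s in strata.values():
        stratum_size = sum(n for _, n in s.values())
        deaths, people = s[group]
        risk += (deaths / people) * (stratum_size / total)
    return risk

crude_rr = crude_risk("casual") / crude_risk("intense")
adjusted_rr = standardized_risk("casual") / standardized_risk("intense")
print(f"crude relative risk:    {crude_rr:.2f}")     # 4.25
print(f"adjusted relative risk: {adjusted_rr:.2f}")  # 2.00
```

Within each age stratum the casual walkers have twice the risk, but the crude comparison reports more than four times the risk because the casual walkers in this invented sample are mostly older. Under the procedural definition, that difference would be read as a sign of confounding by age.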

The declarative definition is 'the classic epidemiological definition of confounding' and it consists of three parts. A confounder of X (treatment) and Y (outcome) is a variable Z that is (1) associated with X in the population at large and (2) associated with Y among people who have not been exposed to the treatment X. In recent years a third condition has been added: (3) Z should not be on the causal path between X and Y. Even so, this definition remains somewhat confusing.

Suppose M is a mediator on the causal path from X to Y, and Z is a proxy for M. Because Z is not a perfect measure of M, some of the influence of X on Y might 'leak through' if you control for Z. But controlling for Z is still a mistake: while the bias might be less than if you controlled for M, it is still there. That is why Cox (1958) warned that you should only control for Z if you have a 'strong a priori reason' to believe that it is not affected by X. This is nothing more than a causal assumption.
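The 'leaking through' can be seen in a small simulation. This is a sketch under invented assumptions: a linear chain X -> M -> Y with a total effect of 1, a proxy Z that equals M plus noise, and ordinary least squares as the way of 'controlling'. The helper names are hypothetical.

```python
import random

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

def slope(x, y):
    """Coefficient of x when regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def slope_controlling(x, y, c):
    """Coefficient of x when regressing y on x and c together."""
    num = cov(x, y) * cov(c, c) - cov(x, c) * cov(c, y)
    den = cov(x, x) * cov(c, c) - cov(x, c) ** 2
    return num / den

rng = random.Random(1)
n = 50000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [xi + rng.gauss(0, 1) for xi in x]   # mediator: X -> M
y = [mi + rng.gauss(0, 1) for mi in m]   # outcome:  M -> Y
z = [mi + rng.gauss(0, 1) for mi in m]   # noisy proxy of M

total = slope(x, y)                      # close to the true effect of 1
given_z = slope_controlling(x, y, z)     # about half of the effect survives
given_m = slope_controlling(x, y, m)     # close to 0: the effect is blocked
print(f"no control: {total:.2f}, control Z: {given_z:.2f}, control M: {given_m:.2f}")
```

Controlling for the mediator itself wipes out the effect, while controlling for the proxy removes only part of it. Both adjusted estimates are biased for the total effect of X on Y. The proxy is simply less biased.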

Later, Robins and Greenland set out to express their conception of confounding in terms of potential outcomes. Ideally, each person in the treatment group would be exchangeable with a person in the control group, as if the same person appeared in both groups. The outcome would then be the same if you switched the treatments and controls, and there would be no confounding. Using this idea, Robins and Greenland showed that both the declarative and the procedural definition were wrong.

What do the do-operator and the back-door criterion mean?

To understand the back-door criterion, you first need an idea of how information flows in a causal diagram. Think of the links as pipes that convey information from a starting point X to a finish Y. The do-operator erases all the arrows that come into X and in this way prevents any information about X from flowing in the noncausal direction. If you have longer pipes with more junctions:

A F -> G I -> J?

The answer is very simple: if a single junction is blocked, then J cannot 'find out' anything about A through this path. So you have a lot of options for blocking communication between A and J. A back-door path is any path from X to Y that starts with an arrow pointing into X. X and Y will be deconfounded if we block every back-door path. So you can almost treat deconfounding as a game. The goal of the game is to specify a set of variables that will deconfound X and Y. In other words: the variables should not be descended from X, and they should block all the back-door paths.
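The rules of this game can be sketched in code. The version below is a minimal, unoptimized illustration, not an established library routine. It represents a diagram as a dictionary from each variable to its children, and it uses the standard blocking rules for junctions, which the summary does not spell out: a chain or fork is blocked by controlling for its middle variable, and a collider is blocked unless it or one of its descendants is controlled for. The example diagram and all function names are hypothetical.

```python
def descendants(graph, node):
    """All variables reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def all_paths(graph, start, end):
    """Every simple path from start to end, ignoring arrow direction."""
    neighbours = {}
    for parent, children in graph.items():
        for child in children:
            neighbours.setdefault(parent, set()).add(child)
            neighbours.setdefault(child, set()).add(parent)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])

def is_blocked(graph, path, controlled):
    """True if at least one junction on the path stops the flow of information."""
    for a, b, c in zip(path, path[1:], path[2:]):
        collider = b in graph.get(a, ()) and b in graph.get(c, ())
        if collider:
            # Blocked unless the collider or one of its descendants is controlled.
            if b not in controlled and not (descendants(graph, b) & controlled):
                return True
        elif b in controlled:
            # Chain or fork: blocked by controlling for the middle variable.
            return True
    return False

def open_back_door_paths(graph, x, y, controlled=frozenset()):
    controlled = set(controlled)
    return [p for p in all_paths(graph, x, y)
            if x in graph.get(p[1], ())              # first arrow points into x
            and not is_blocked(graph, p, controlled)]

def deconfounds(graph, x, y, controlled):
    """Back-door check: no descendants of x, and every back-door path blocked."""
    controlled = set(controlled)
    if controlled & descendants(graph, x):
        return False
    return not open_back_door_paths(graph, x, y, controlled)

# Hypothetical diagram: Z -> X, Z -> Y, and a causal chain X -> M -> Y.
diagram = {"Z": ["X", "Y"], "X": ["M"], "M": ["Y"]}
print(open_back_door_paths(diagram, "X", "Y"))    # [['X', 'Z', 'Y']]
print(deconfounds(diagram, "X", "Y", {"Z"}))      # True
print(deconfounds(diagram, "X", "Y", {"Z", "M"})) # False: M is descended from X
```

Enumerating every path is fine for the small diagrams used in such games, but it would be slow on large graphs.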

One such game reveals a new kind of bias, called M-bias. In its diagram there is only one back-door path, and it is already blocked by a collider at B, so you don't need to control for anything. It is incorrect to call a variable like B a confounder merely because it is associated with both X and Y. B only becomes a confounder when you control for it! And once the variables get real names, such as smoking and miscarriage, these are obviously no longer games but serious business.
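A simulation makes the point about B tangible. The sketch below assumes an M-shaped diagram X <- A -> B <- C -> Y, where the names A and C are placeholders for the two unnamed causes, all relationships are linear, and X has no effect on Y at all. 'Controlling' is done by ordinary least squares. These choices are assumptions for illustration.

```python
import random

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)

rng = random.Random(2)
n = 50000
a = [rng.gauss(0, 1) for _ in range(n)]
c = [rng.gauss(0, 1) for _ in range(n)]
x = [ai + rng.gauss(0, 1) for ai in a]                   # A -> X
b = [ai + ci + rng.gauss(0, 1) for ai, ci in zip(a, c)]  # A -> B <- C (collider)
y = [ci + rng.gauss(0, 1) for ci in c]                   # C -> Y, no X -> Y arrow

# Slope of Y on X without any control.
unadjusted = cov(x, y) / cov(x, x)

# Coefficient of X when Y is regressed on X and B together.
adjusted_for_b = ((cov(x, y) * cov(b, b) - cov(x, b) * cov(b, y))
                  / (cov(x, x) * cov(b, b) - cov(x, b) ** 2))

print(f"without controlling for B: {unadjusted:.2f}")
print(f"controlling for B:         {adjusted_for_b:.2f}")
```

Left alone, the data correctly show no relation between X and Y. Controlling for B opens the blocked path and produces a spurious negative effect, about -0.2 under these assumptions.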
