Fearing the future of empirical psychology - summary of an article by LeBel & Peters (2011)

Critical thinking
Article: LeBel & Peters (2011)
Fearing the future of empirical psychology


The interpretation bias

Because empirical data underdetermine theory choice, alternative explanations of data are always possible, both when the data statistically support the researcher’s hypothesis and when they fail to do so.

The interpretation bias: a bias toward interpretations of data that favour a researcher’s theory, both when the null hypothesis is statistically rejected and when not.
This bias entails that, regardless of how data turn out, the theory whose predictions are being tested is artificially buffered from falsification.
The ultimate consequence is an increased risk of reporting false positives and disregarding true negatives, and so drawing incorrect conclusions about human psychology.

The interpretation bias underlying the file-drawer problem in no way depends on unscrupulous motives.

Conservatism in theory choice

The knowledge system that constitutes a science such as psychology can be roughly divided into two types of belief:

  • Theory-relevant beliefs
    Concern the theoretical mechanisms that produce behaviour
  • Method-relevant beliefs
    Concern the procedures through which data are produced, measured and analysed

In any empirical test of a hypothesis, interpretation of the resulting data depends on both theory-relevant and method-relevant beliefs, as both types of belief are required to bring the hypothesis to empirical test.
Consequently, the resulting data can always be interpreted as theory relevant or as method relevant.

Weaknesses in the current knowledge system of empirical psychology bias the resulting choice of interpretation in favour of the researcher’s theory.
Deficiencies in methodological research practice systematically bias

  • The interpretation of confirmatory data as theory relevant
  • The interpretation of disconfirmatory data as method relevant

This has the result that the researcher’s hypothesis is artificially buffered from falsification.

The interpretation of data should hinge not on what the pertinent beliefs are about, but rather on the centrality of those beliefs.
The centrality of belief reflects its position within the knowledge system: central beliefs are those on which many other beliefs depend. Peripheral beliefs are those with few dependent beliefs.
The rejection of central beliefs to account for observed data entails a major restructuring of the overall knowledge system.

Conservatism: choosing the theoretical explanation consistent with the data that requires the least amount of restructuring of the existing knowledge system.
Generally, conservatism in theory choice is a virtue, as it reduces ambiguity in the interpretation of data.
The value of methodological rigour is precisely that, by leveraging conservatism, it becomes more difficult to blame negative results on flawed methodology.
When method-relevant beliefs are peripheral and easily rejected, empirical tests become more ambiguous.

Theory-relevant beliefs should not be so central that they approach the status of logical necessity.
A theory’s strength should be measured by the extent to which it is falsifiable.
Theories that are too central risk becoming logical assumptions that are near impossible to dislodge with empirical tests.
It is critical that a hypothesis under test be described in a way that makes it empirically falsifiable and not logically necessary.

The knowledge system in empirical psychology is such that conservatism becomes a vice rather than a virtue in theory choice.

  • On the one hand, method-relevant beliefs are too peripheral, making them easy to reject
    This increases the ambiguity of negative results, which contributes directly to the file drawer problem.
  • On the other hand, theory-relevant beliefs often appear too central, making them difficult to reject.
    This leads to a process of confirmatory hypothesis testing, exacerbating the file drawer problem.

Deficiencies in MRP (modal research practice)

Overemphasis on conceptual replication

The exclusive focus on conceptual replication is in keeping with the ethos of continuous theoretical advancement that is a hallmark of MRP.
An overemphasis on conceptual replication at the expense of close replication, however, weakens method-relevant beliefs in the knowledge system of empirical psychology, with the result that reports consisting entirely of conceptual replications may be less rigorous than those including a judicious number of close replications.

Typically in MRP, a statistically significant result is followed by a conceptual replication in the interest of extending the underlying theory.
The problem with this practice is that when the conceptual replication fails, it remains unclear whether the negative result was due to the falsity of the underlying theory or to methodological flaws introduced by changes in the conceptual replication.
Given the original statistically significant finding, the natural preference is to choose the latter interpretation and to proceed with another, slightly different, conceptual replication.

Danger arises because conceptual replication allows the researcher too much latitude in the interpretation of negative results.

  • In particular, the choice of which studies count as replications is made post hoc, and these choices are inevitably influenced by the interpretation bias: an extension that fails to reject the null hypothesis is not counted as a replication precisely because it did not replicate the original finding and therefore, the altered methodology must be to blame.
    • The consequence is that a successful extension becomes a conceptual replication, whereas a failed extension becomes a methodologically flawed pilot study, and it is tacitly understood that failed pilot studies belong in the file drawer.

Integrity of measurement instruments and experimental procedures

Failure to verify the integrity of measurement instruments and experimental procedures directly weakens method-relevant beliefs and thus increases ambiguity in the interpretation of negative (and even positive) results.

Little effort is put into independently validating and calibrating methodological procedures in MRP outside of main theory-testing experiments. Instead, experimenters are required to verify procedures and test psychological theories simultaneously. The result is that it becomes easy to attribute negative results to methodological flaws and hence relegate them to the file drawer.

Although pilot studies confirming the operation of construct manipulations are sometimes reported in multi-experiment articles, such verification studies are not consistently performed, given that they are not required for publication.

The integrity of measurement procedures is also often difficult to substantiate. Because of the small cell sizes typically used in experimental designs, it is often impossible to determine accurate reliability estimates of test scores within experimental conditions.
Even when reliability can be accurately estimated, this methodological check is only the tip of the iceberg in determining whether observed scores primarily reflect the construct of interest rather than some other construct.

Taken together, the inconsistent, informal, and arduous nature of verifying the integrity of manipulation and measurement procedures leaves method-relevant beliefs much weaker than required for a rigorous empirical science.

Problems with null hypothesis significance testing

The exclusive reliance on the .05 significance criterion is problematic because:

  • The standard null hypothesis of no difference will almost always be false
  • It divorces theory choice from the context of the broader scientific knowledge system, encouraging myopic interpretations of data that can lead to bizarre conclusions about what has been empirically demonstrated.

Although it is well known that negative (null) results are ambiguous and difficult to interpret, exclusive reliance on NHST makes positive results equally ambiguous, because they can be explained by flaws in the way NHST is implemented rather than by a more theoretically interesting mechanism.
In this way, exclusive reliance on NHST increases the ambiguity of theory choice and undermines the rigour of empirical psychology.

The first problem:

  • In MRP, the null hypothesis is often formulated as a ‘nil hypothesis’, which claims that the means of different populations are identical.
  • This is a weak hypothesis because it is almost by definition false. Differences between different populations are inevitable, even if they only reflect ambient noise.
  • The statistical rejection of the nil hypothesis is therefore contingent only on a sample size sufficient to make the difference between means statistically significant.

The nil hypothesis is a straw man.

  • Because the nil hypothesis is not theory driven, it is hard to argue that its rejection implies anything whatsoever about the choice of alternative hypothesis.
  • The rejection of the nil is not equivalent to the rejection of a theoretically appropriate null hypothesis, and assuming that it is leads to the inflation of Type I error.
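The contingency on sample size can be made concrete with a small simulation (a sketch of ours, not from the article; the tiny effect size and the sample sizes are illustrative): the nil hypothesis of exactly equal means is rejected for a negligible true difference once enough observations are collected.

```python
import random, math

random.seed(1)

def z_test_p(xs, ys):
    """Two-sample z-test p-value (normal approximation)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

tiny = 0.02  # a negligible true difference ("ambient noise")
small = [[random.gauss(tiny, 1) for _ in range(50)],
         [random.gauss(0, 1) for _ in range(50)]]
large = [[random.gauss(tiny, 1) for _ in range(200_000)],
         [random.gauss(0, 1) for _ in range(200_000)]]

print(z_test_p(*small))  # typically non-significant at this n
print(z_test_p(*large))  # the same trivial difference becomes "significant"
```

Nothing theoretical changed between the two tests; only the sample size did.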

Second problem

  • Treating statistical significance as the sole criterion of theory choice when interpreting new data ignores all other evidence relevant to the interpretation of those data.
  • Empirical tests are not conducted in a theoretical vacuum, and existing evidence for or against a hypothesis should be factored into the interpretation of new data to supplement NHST.
  • NHST on its own does not tell us what we want to know but something much less informative.
  • Basing theory choice on null hypothesis significance tests thus detaches theories from the broader system of empirical psychology.
  • Overreliance on NHST threatens the cumulation of evidence and the coherence of the knowledge system in empirical psychology.

The logical strength of theory

Weak, peripheral method-relevant beliefs make it easy to discount negative results.
The more it appears that a theoretical explanation has to be the case, the more likely it is that disconfirming data will be attributed to methodological flaws.

Summary

The result of the combination of peripheral method-relevant beliefs and central theory-relevant beliefs is that conservatism in MRP becomes an unconditional bias toward interpretations of data that favour the researcher’s theory.
Conservatism should only bias theory choice toward interpretations of data that minimize revision of the knowledge system, regardless of whether a particular interpretation favours method-relevant or theory-relevant beliefs.

Strategies of improving MRP

The overarching recommendation is that methodology must be made more rigorous by strengthening method-relevant beliefs to constrain the field of alternative explanations available for psychological findings.
This is true both when data statistically support a researcher’s theory and when they do not.
By making MRP more rigorous, the ambiguity of theory choice is reduced and empirical tests become more diagnostic.

A complementary recommendation is that the logical status of theory-relevant beliefs must be weakened.

Recommendations for strengthening method-relevant beliefs

Stronger emphasis on close replication

Close replication serves to determine whether an observed effect is real or due to sampling error.
Close replications are crucial because a failed close replication is the most diagnostic test of whether an observed effect is real, given that no differences between the original study and the replicating study were intentionally introduced.

In the case of a close replication, we cannot easily blame a negative result on methodological variation, because in a close replication methodological differences are not deliberately introduced into the replication.

Once successful close replications have been achieved in a new area of research, the value of further close replications diminishes and the value of conceptual replications increases dramatically.
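The diagnostic value of close replication can be illustrated with a simulation (our sketch, not from the article; the per-cell n, alpha criterion, and effect sizes are assumptions): when the original finding was a false positive, close replications succeed only at about the alpha rate, whereas a real effect replicates at roughly the design's power.

```python
import random, math

random.seed(2)
ALPHA_Z = 1.96  # two-sided .05 criterion
N = 50          # illustrative per-cell n

def significant(true_d):
    """One simulated two-group study (z approximation, unit variance)."""
    a = [random.gauss(true_d, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    se = math.sqrt(1 / N + 1 / N)
    return abs((sum(a) / N - sum(b) / N) / se) > ALPHA_Z

reps = 2000
null_rate = sum(significant(0.0) for _ in range(reps)) / reps
real_rate = sum(significant(0.6) for _ in range(reps)) / reps
print(null_rate)  # near .05: a false positive rarely survives close replication
print(real_rate)  # near the design's power: a real effect usually does
```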

Verify integrity of methodological procedures

To make method-relevant beliefs stronger and more difficult to reject, it is critical that verifying the integrity of empirical instruments and procedures becomes a routine component of psychological research.

Maintaining a clear distinction between pilot studies designed to verify the integrity of instruments and procedures and primary studies designed to test theories will do much to diminish the influence of the interpretation bias on the reporting of results.

It should also be standard procedure to routinely check the internal consistency of the scores of any measurement instruments used and to confirm measurement invariance of instruments across conditions.
It should also be standard practice to use objective markers of instruction comprehension and participant non-compliance.
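One routine internal-consistency check is Cronbach's alpha. A minimal sketch (our illustration; the four-item scale and respondent data are simulated, not from the article):

```python
import random
from statistics import variance

random.seed(3)

def cronbach_alpha(items):
    """items: list of per-item score lists, all of the same length."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]
    item_var = sum(variance(v) for v in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# 200 simulated respondents; each item = shared trait + item-specific noise
trait = [random.gauss(0, 1) for _ in range(200)]
items = [[t + random.gauss(0, 0.8) for t in trait] for _ in range(4)]
print(round(cronbach_alpha(items), 2))  # well above the conventional .70 cutoff
```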

Use stronger forms of NHST

Minimally, the null hypothesis should not be formulated in terms of a nil hypothesis.

In the strong form, NHST requires that the null hypothesis be a theoretically derived point value of the focal variable, which the researcher then attempts to reject on observation of the data.

Significance tests should be treated as just one criterion informing theory choice, in addition to relevant background knowledge and considerations of belief centrality.

Recommendations for weakening theory-relevant beliefs

Considered individually, not all psychological hypotheses appear logically necessary, but insufficient attention has been paid to identifying the criteria that distinguish falsifiable from non-falsifiable psychological hypotheses.

The important point is that making the disconfirmation of a psychological hypothesis more plausible will reduce the bias toward methodological interpretations of negative results.
At minimum, care needs to be taken that hypotheses under test are stated such that their not being the case is possible, so that their truth is contingent rather than necessary.

When the researcher’s hypothesis is plausibly falsifiable and the null hypotheses is plausibly confirmable, statistical tests pitting these two hypotheses against each other will be much more informative for theory choice.

False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant - summary of an article by Simmons, Nelson, & Simonsohn (2011)

Critical thinking
Article: Simmons, Nelson, & Simonsohn (2011)
False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant


Abstract

This article is about two things:

  • despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings, flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not.
  • a solution to that problem.
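The inflation from one such "researcher degree of freedom" can be simulated (our sketch; the specific setup of two independent outcome measures and n = 20 per group is an illustrative assumption): with no true effect, reporting whichever of two dependent variables comes out significant roughly doubles the false-positive rate.

```python
import random, math

random.seed(4)
N = 20

def p_value():
    """p for a two-group comparison with no true effect (z approximation)."""
    diff = sum(random.gauss(0, 1) for _ in range(N)) / N \
         - sum(random.gauss(0, 1) for _ in range(N)) / N
    z = diff / math.sqrt(2 / N)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

reps = 4000
one_dv = sum(p_value() < .05 for _ in range(reps)) / reps
either_of_two = sum(min(p_value(), p_value()) < .05 for _ in range(reps)) / reps
print(one_dv)         # close to the nominal .05
print(either_of_two)  # close to 1 - .95**2 ≈ .0975
```

Combining several such degrees of freedom (extra conditions, optional stopping, covariates) compounds the inflation further.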

Beginning

One of the most costly errors is a false positive.

  • The incorrect rejection of the null hypothesis.
  • Once they appear in the literature, they are persistent.
    • Because null results have many possible causes, failures to replicate previous findings are never conclusive.
    • Because it is uncommon for prestigious journals to publish null findings or exact replications, researchers have little incentive to even attempt them.
  • False positives waste resources

They inspire investment in fruitless research programs and can lead to ineffective policy changes.

Ambiguity is rampant in empirical research.

Solution

As a solution to the flexibility-ambiguity problem, the authors offer six requirements for authors and four guidelines for reviewers.

This solution substantially mitigates the problem but imposes only a minimal burden on authors, reviewers, and readers.
It leaves the right and responsibility of identifying the most appropriate way to conduct research in the hands of researchers, requiring only that authors provide appropriately transparent descriptions of their methods so that reviewers and readers can make informed decisions regarding the credibility of their findings.

Requirements for authors

1. Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article.

2. Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data collection justification.
Samples smaller than 20 per cell are not powerful enough to detect most effects.
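The power claim behind this requirement can be checked with a back-of-envelope normal-approximation calculation (our sketch; the medium effect size d = 0.5 and the conventional .80 power target are assumptions for illustration, not from the article):

```python
import math

def power(d, n, alpha_z=1.96):
    """Two-sided power of a two-sample test, n per cell, normal approximation."""
    se = math.sqrt(2 / n)
    phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return (1 - phi(alpha_z - d / se)) + phi(-alpha_z - d / se)

print(round(power(0.5, 20), 2))  # ≈ 0.35: badly underpowered for d = 0.5
print(round(power(0.5, 64), 2))  # ≈ 0.8: the per-cell n needed for the usual target
```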

3. Authors must list all variables collected in a study
Prevents researchers from reporting only a convenient subset of the many measures that were collected, allowing readers and reviewers to easily identify possible researcher degrees of freedom.

4. Authors must report all experimental conditions, including failed manipulations
Prevents authors from selectively choosing only to report the condition comparisons that yield results that are consistent with their hypothesis.

5. If observations are eliminated, authors must also report what the statistical results are if those observations are included.
Makes transparent the extent to which a finding relies on the exclusion of observations and puts appropriate pressure on authors to justify those exclusions.

Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability - summary of an article by Nosek, Spies, & Motyl (2012)

Critical thinking
Article: Nosek, Spies, & Motyl, (2012)
Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability


Abstract

An academic scientist’s professional success depends on publishing.

  • Publishing norms emphasize novel, positive results.
  • Disciplinary incentives encourage design, analysis, and reporting decisions that elicit positive results and ignore negative results.
  • When incentives favor novelty over replication, false results persist in the literature unchallenged, reducing efficiency in knowledge accumulation.

This article develops strategies for improving scientific practices and knowledge accumulation that account for ordinary human motivations and biases.

A true story of what could have been

Incentives for surprising, innovative results are strong in science.

  • Science thrives by challenging prevailing assumptions and generating novel ideas and evidence that push the field in new directions.

Problem: the incentives for publishable results can be at odds with the incentives for accurate results. This produces a conflict of interest.

  • The conflict may increase the likelihood of design, analysis, and reporting decisions that inflate the proportion of false results in the published literature.

The solution requires making incentives for getting it right competitive with the incentives for getting it published.

How evaluation criteria can increase the false result rate in published science

Publishing is the ‘very heart of modern academic science, at levels ranging from the epistemic certification of scientific thought to the more personal labyrinths of job security, quality of life and self esteem’.

With an intensely competitive job market, the demands on publication might seem to suggest a specific objective for early-career scientists: publish as many articles as possible in the most prestigious journals that will accept them.

Some things are more publishable than others

Even if a researcher conducts studies competently, analyses the data effectively, and writes the results beautifully,

Neyman, Pearson and hypothesis testing - summary of an article by Dienes (2003)

Critical thinking
Article: Dienes (2003)
Neyman, Pearson and hypothesis testing


Introduction

In this article, we will consider the standard logic of statistical inference.
Statistical inference: the logic underlying all the statistics you see in the professional journals of psychology and most other disciplines that regularly use statistics.

The underlying logic of statistics (Neyman-Pearson) is highly controversial, frequently attacked (and defended) by statisticians and philosophers, and even more frequently misunderstood.

Probability

The meaning of probability we choose determines what we can do with statistics.
The proper way of interpreting probability remains controversial, so there is still debate over what can be achieved with statistics.
The Neyman-Pearson approach follows from one particular interpretation of probability. The Bayesian approach follows from another.

Interpretations often start with a set of axioms that probabilities must follow.
Two interpretations of probability:

  • the subjective interpretation: a probability is a degree of conviction of a belief
  • the objective interpretation: locates probability in the world.

The most influential objective interpretation of probability is the long-run relative frequency interpretation. Here, probability is a relative frequency.
Because the long-run relative frequency is a property of all the events in the collective, it follows that a probability applies to a collective, not to any single event.
A single event could be a member of different collectives. So a singular event does not have a probability, only collectives do.

Objective probabilities do not apply to single cases. They also do not apply to the truth of hypotheses.
A hypothesis is simply true or false, just as a single event either occurs or does not.
A hypothesis is not a collective, it therefore does not have an objective probability.

Data and hypotheses

Data = D

Hypothesis = H

P(H|D) is the inverse of the conditional probability P(D|H). Inverting conditional probabilities makes a big difference:
P(A|B) can have a very different value from P(B|A).
Knowing P(D|H) does not mean you know what P(H|D) is.
There are two reasons for this:

  • inverse conditional probabilities can have very different values
  • in any case, it is meaningless to assign an objective probability to a hypothesis.
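The first point can be made concrete with a worked example (the numbers are ours, and the computation assumes a subjective reading of probability under which P(H) is meaningful): even when the data are quite likely given the hypothesis is false rare, the hypothesis can remain improbable given the data.

```python
# P(D|H) vs P(H|D): a base-rate example with made-up numbers
p_h = 0.01           # prior probability that H is true
p_d_given_h = 0.80   # P(D|H): probability of the data if H is true
p_d_given_not_h = 0.05  # probability of the data if H is false

# law of total probability, then Bayes' theorem
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
p_h_given_d = p_d_given_h * p_h / p_d

print(p_h_given_d)  # ≈ 0.139: far from P(D|H) = 0.80
```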

Evaluating Theories - summary of an article by Dennis & Kintsch

Critical thinking
Article: Dennis & Kintsch
Evaluating Theories


Introduction

A theory is a concise statement about how we believe the world to be.
Theories organize observations of the world and allow researchers to make predictions about what will happen in the future under certain conditions.

Science is about the testing of theories, and the data we collect as scientists should either implicitly or explicitly bear on theory.

The following criteria distinguish the characteristics that lead a theory to be successful from those that make it truly useful:

  • Descriptive adequacy:
    Does the theory accord with the available data?
  • Precision and interpretability:
    Is the theory described in a sufficiently precise fashion that other theorists can interpret it easily and unambiguously?
  • Coherence and consistency:
    Are there logical flaws in the theory? Does each component of the theory seem to fit with the others into a coherent whole? Is it consistent with theory in other domains?
  • Prediction and falsifiability:
    Is the theory formulated in such a way that critical tests can be conducted that could reasonably lead to the rejection of the theory?
  • Postdiction and explanation:
    Does the theory provide a genuine explanation of existing results?
  • Parsimony:
    Is the theory as simple as possible?
  • Originality:
    Is the theory new or is it essentially a restatement of an existing theory?
  • Breadth:
    Does the theory apply to a broad range of phenomena or is it restricted to a limited domain?
  • Usability:
    Does the theory have applied implications?
  • Rationality:
    Does the theory make claims about the architecture of mind that seem reasonable in the light of the environmental contingencies that have shaped our evolutionary history?

Criteria on which to evaluate theories

Descriptive adequacy

The extent to which it accords with data.
In psychology, the most popular way of comparing a theory against data is null hypothesis significance testing.

Determining whether a theory is consistent with data is not always as straightforward as it may at first appear.

Some of the subtleties involved in determining the extent to which a theory accords with data:

  • Using null hypothesis significance testing, it is not possible to conclude that there is no difference. A proponent of a theory that predicts a list-length effect can always propose that a failure to find the difference was a consequence of lack of power of the experimental design.
  • Null
Degrees of falsifiability - summary of an article by Dienes (2008)

Critical thinking
Article: Dienes (2008)
Degrees of falsifiability


Falsifiability

A potential falsifier of a theory: any potential observation that would contradict the theory.
One theory is more falsifiable than another if the class of potential falsifiers is larger.

Scientists prefer simple theories.
Simple theories are more testable.

A theory can gain in falsifiability not only by being precise, but also by being broad in the range of situations to which it applies.
The greater the universality of a theory, the more falsifiable it is. Even if the predictions are not very precise.

Revisions to a theory may make it more falsifiable by specifying fine-grained causal mechanisms.
As long as the steps in a proposed causal pathway are testable, specifying the pathway gives you more falsifiers.

Psychologists sometimes theorize and make predictions by constructing computational models.
A computational model is a computer simulation of a subject, where the model is exposed to the same stimuli subjects receive and gives actual trial-by-trial responses.

A theory that allows everything explains nothing.
The more a theory forbids, the more it says about the world. The empirical content of a theory increases with its degree of falsifiability.

The more falsifiable a theory is, the more open it is to criticism.
So the more falsifiable our theories are, the faster we can make progress, given progress comes from criticism.

Science aims at the maximum falsifiability it can achieve: successive theories should be successively more falsifiable. Either in terms of universality or precision.

Make sure that any revision or amendment to theory can be falsified. That way theory development is guaranteed to keep its empirical character.

Observations

Observations are always ‘theory impregnated’.
Falsification is not so simple as pitting theory against observation.
Theories determine what an observation is.

Causal Inference and Developmental Psychology - summary of an article by Foster (2010)

Critical thinking
Article: Foster (2010)

Causal Inference and Developmental Psychology
(the part needed for psychology at the UvA)

Four premises

  • Causal inference is essential to accomplishing the goals of developmental psychologists
  • In many analyses, psychologists unfortunately are attempting causal inference but doing so badly, based on many implicit and, in some cases, implausible assumptions.
  • These assumptions should be identified explicitly and checked empirically and conceptually
  • Once introduced to the broader issues, developmental psychologists will recognize the central importance of causal inference and naturally embrace the methods available.


Why causal inference?

Causal thinking and causal inference are unavoidable.

  • Even if researchers can distinguish associations from causal relationships, lay readers, journalists, policymakers, and other researchers generally cannot.
  • If a researcher resists the urge to jump from association to causality, other researchers seem willing to do so on his or her behalf.

Causal inference as the goal of developmental psychology

The lesson is not that causal relationships can never be established outside of random assignment, but that they cannot be inferred from associations alone. Some additional assumptions are required.

The goal of this research should be to make causal inference as plausible as possible.
Doing so involves applying the best methods available among a growing set of tools.

As part of the proper use of those tools, the researcher should identify the key assumptions on which they rest and their plausibility in any particular application.
The researcher should check the consistency of those assumptions as much as possible using the available data. In many instances key assumptions will remain untestable.
The plausibility of those assumptions needs to be assessed in the light of substantive knowledge.

What constitutes credible or plausible is not without debate.

At this point, much of developmental psychology involves implausible causal inference.

  • Such inference could be improved even without dramatically changing the complexity of the analysis.

Two frameworks for causal inference

Two conceptual tools are especially helpful in moving from associations to causal relationships.

  • The directed acyclic graph (DAG)

This tool assists researchers in identifying the implications of a set of associations for understanding causality, and the set of assumptions under which those associations imply causality.
Moving from association to causality requires ruling out potential confounders: variables associated with both treatment and outcome.
The DAG is particularly useful for helping the researcher to

Confounding and deconfounding: or, slaying the lurking variable - summary of an article by Pearl (2018)

Critical thinking
Article: Pearl (2018)
Confounding and deconfounding: or, slaying the lurking variable


Introduction

Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment.
Sometimes the confounders are known. Other times they are merely suspected and act as a ‘lurking third variable’.

If we have measurements of the third variable, then it is very easy to deconfound the true and spurious effects.

Statisticians both over- and underrate the importance of adjusting for possible confounders

  • Overrate in the sense that they often control for many more variables than they need to and even for variables that they should not control for
  • Underrate in the sense that they are loath to talk about causality at all, even if the controlling has been done correctly.

The chilling fear of confounding

Knowing the set of assumptions that stand behind a given conclusion is no less valuable than attempting to circumvent those assumptions with an RCT, which has complications of its own.

The skillful interrogation of nature: why RCTs work

The one circumstance under which scientists will abandon some of their reticence to talk about causality is when they have conducted a randomized controlled trial (RCT).

Randomization brings two benefits:

  • It eliminates confounder bias
  • It enables the researcher to quantify his uncertainty

Another way is, if you know what all the possible confounders are, to measure and adjust for them.
But randomization has one great advantage: it severs every incoming link to the randomized variable, including the ones we don’t know about or cannot measure.
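
A minimal simulation of this claim (all parameters invented for illustration): an unmeasured confounder U raises both treatment uptake and the outcome, so the naive observational contrast is biased upward; assigning the treatment by coin flip severs the link from U to X, and the same contrast then recovers the true effect.

```python
import random

random.seed(42)
N = 200_000
TRUE_EFFECT = 1.0

def outcome(x: int, u: float) -> float:
    # U affects the outcome whether or not we measure it.
    return TRUE_EFFECT * x + 2.0 * u + random.gauss(0, 1)

# Observational world: U also influences who gets treated (incoming link to X).
obs = []
for _ in range(N):
    u = random.random()
    x = 1 if random.random() < u else 0       # confounded assignment
    obs.append((x, outcome(x, u)))

# Randomized world: X assigned by coin flip, independent of U.
rct = []
for _ in range(N):
    u = random.random()
    x = random.randint(0, 1)                  # randomization severs U -> X
    rct.append((x, outcome(x, u)))

def contrast(data):
    t = [y for x, y in data if x == 1]
    c = [y for x, y in data if x == 0]
    return sum(t) / len(t) - sum(c) / len(c)

obs_est, rct_est = contrast(obs), contrast(rct)
print(f"observational estimate: {obs_est:.2f}  (true effect = {TRUE_EFFECT})")
print(f"randomized estimate:    {rct_est:.2f}")
```

Note that the randomized estimate is unbiased even though U is never measured anywhere in the analysis; that is exactly the advantage the summary describes.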

RCTs are preferred to observational studies.
But, in some cases, intervention may be physically impossible or unethical.

Provisional causality: causality contingent upon the set of assumptions that our causal diagram advertises.

The principal objective of an RCT is to eliminate confounding.

The new paradigm of confounding

Confounding is not a statistical notion. It stands for the discrepancy between what we want to assess (the causal effect) and what we actually do assess using statistical methods.
If you can’t articulate mathematically what you want to assess, you can’t expect to define what constitutes a discrepancy.

Historically, the concept of ‘confounding’ has evolved around two related conceptions:

  • Incomparability
  • A lurking third variable.

Both

.....read more

Critical thinking in Quasi-Experimentation - summary of an article by Shadish (2008)

Critical thinking
Article: Shadish (2008)
Critical thinking in Quasi-Experimentation

All experiments are about discovering the effects of causes.
All experiments have in common the deliberate manipulation of an assumed cause, followed by observation of the effects that follow.

A quasi-experiment: an experiment that does not use random assignment to conditions.


Causation

What is a cause?

An INUS condition: a cause that is insufficient by itself; its effectiveness requires it to be embedded in a larger set of conditions.

Most causal relationships are not deterministic, but only increase the probability that an effect will occur.
This is the reason why a given causal relationship will only occur under some conditions but not universally.
To different degrees, all causal relationships are contextually dependent, so the generalization of experimental effects is always at issue.

Experimental causes must be manipulable: experiments explore the effects of things that can be manipulated.

In quasi-experiments, the cause is whatever was manipulated, which may include many more things than the researcher realizes were manipulated.
In quasi-experiments, especially if the researcher is not the person manipulating the treatment, it is easy to make mistaken claims about what was manipulated, and the context in which it occurred.

What is an effect?

In an experiment, we observe what did happen when people receive a treatment.
The counterfactual is knowledge of what would have happened to those same people if they simultaneously had not received treatment.

An effect is the difference between what did happen and what would have happened.

We can never observe the counterfactual.
Experiments try to create reasonable approximations to this physically impossible counterfactual.

Two central tasks in experimental design are:

  • Creating a high-quality but necessarily imperfect source of counterfactual inference
  • Understanding how this source differs from the treatment condition.
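
The counterfactual logic above can be sketched in a toy simulation (all values invented). Only in a simulation can we write down both potential outcomes for every person; a real study reveals just one per person, which is why a comparable control group has to stand in for the counterfactual:

```python
import random

random.seed(7)

# Each person has two potential outcomes; the true effect is +3 for everyone.
people = []
for _ in range(10_000):
    y0 = random.gauss(10, 2)          # outcome if NOT treated (counterfactual)
    y1 = y0 + 3                       # outcome if treated
    people.append((y0, y1))

# The true (physically unobservable) effect: what did happen minus what
# would have happened, for the same people.
true_effect = sum(y1 - y0 for y0, y1 in people) / len(people)

# What a study actually records: each person reveals only ONE potential
# outcome. Here a simple alternating split stands in for a comparable
# control group.
treated = [y1 for i, (y0, y1) in enumerate(people) if i % 2 == 0]
control = [y0 for i, (y0, y1) in enumerate(people) if i % 2 != 0]
estimate = sum(treated) / len(treated) - sum(control) / len(control)

print(f"true effect (needs both outcomes per person): {true_effect:.2f}")
print(f"estimate from observed halves:                {estimate:.2f}")
```

The estimate approximates the true effect only because the two halves are comparable; with a systematically different control group (the quasi-experimental case) the same subtraction would be biased.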

Random assignment forms a control group that is often the best approximation to this counterfactual that we can obtain, though even that control group is imperfect because the persons in the control group are not identical to those in the treatment group.
However, we do know that participants in the treatment and control groups differ from each other only randomly.

The problem in quasi-experiments is that differences between treatment and control are usually systematic, not random, so nonrandom controls may not tell us much about what would have happened to the treatment group if they had not received treatment.
Much of quasi-experimentation is concerned with creating good sources of counterfactual inference. In general, quasi-experiments use two different tools to do so:

  • Observing the same unit over time
  • Trying to make nonrandom control groups as similar as possible to the participants in the treatment group.
.....read more

Beyond the null ritual: formal modeling of psychological processes - summary of an article by Marewski & Olsson (2009)

Critical thinking
Article: Marewski & Olsson (2009)
Beyond the null ritual: formal modeling of psychological processes


Beyond the null ritual

Rituals can be characterized by a range of attributes including:

  • Repetitions of the same action
  • Fixations on special features such as numbers
  • Anxieties about punishments for rule violations
  • Wishful thinking

Each of these characteristics is reflected in null hypothesis significance testing.

One good way to make theories more precise is to cast them as formal models.
In doing so, researchers can move beyond the problems of null hypothesis significance testing, and simple difference searching.

What is a model?

In the broadest sense, a model is a simplified representation of the world that aims to explain observed data.
A model is a formal instantiation of a theory that specifies the theory’s predictions. This category also includes statistical tools, such as structural equation or regression models.

Statistical tools are not typically meant to mirror the workings of psychological mechanisms.

What is the scope of Modeling?

Modeling is not meant to be applied equally to all research questions. Each method has its specific advantages and disadvantages.

Modeling helps researchers answer involved questions and understand complex phenomena.
In psychology, modeling is especially suited for basic and applied research about the cognitive system.

Advantages of formally specifying theories

Four closely interrelated benefits of increasing the precision of theories by casting them as models:

  • Models allow the design of strong tests of theories
  • They can also sharpen research questions
  • Models can lead beyond theories built on the general linear model
  • Modeling helps to address real-world problems

Designing strong tests of theories

Models provide the bridge between theories and empirical evidence.
They enable researchers to make competing quantitative predictions, which in turn lead to strong comparative tests of theories.

Any quantitative prediction can be systematically better or worse than any other.

But as soon as one starts to compare quantitative predictions from different models, the use of null hypothesis testing can become inappropriate or meaningless.
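
A hedged sketch of such a comparative test (the models, data, and numbers are all invented): two formal models each make point predictions for the same observations, and instead of testing either one against a null hypothesis, we score them against each other by prediction error.

```python
import random

random.seed(5)

# Invented "true" process generating response times (ms) for set sizes 1..20.
data = [(n, 300 + 40 * n + random.gauss(0, 10)) for n in range(1, 21)]

# Two hypothetical competing models, each a quantitative prediction function.
def serial_search(n):     # model A: search cost grows linearly with set size
    return 300 + 40 * n

def parallel_search(n):   # model B: flat cost regardless of set size
    return 600.0

def mse(model):
    """Mean squared prediction error of a model against the observations."""
    return sum((y - model(n)) ** 2 for n, y in data) / len(data)

mse_a, mse_b = mse(serial_search), mse(parallel_search)
best = "serial" if mse_a < mse_b else "parallel"
print(f"MSE serial:   {mse_a:.1f}")
print(f"MSE parallel: {mse_b:.1f}")
print(f"better model: {best}")
```

Any quantitative prediction can be systematically better or worse than any other, so the comparison yields a ranking of models rather than a verdict about a null.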

Sharpening research questions

Null hypothesis tests are often used to evaluate verbal, informal theories.
But if such theories are underspecified, then they can be

.....read more

The two disciplines of scientific psychology - summary of an article by Cronbach (1957)

Critical thinking
Article: Cronbach (1957)
The two disciplines of scientific psychology


The separation of the disciplines

The experimental method, where the scientist changes conditions in order to observe their consequences, is much the more coherent of our two disciplines.

Correlational psychology was slower to mature.
It qualifies equally as a discipline, because it asks a distinctive type of question and has technical methods of examining whether the question has been properly put and the data properly interpreted.

The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confident statements about causation.

The correlational method can study what man has not learned to control or can never hope to control.

Characterization of the disciplines

In the beginning, experimental psychology was a substitute for purely naturalistic observation of man-in-habitat.

The experiment came to be concerned with between-treatment variance.
Today, the majority of experimenters derive their hypotheses explicitly from theoretical premises and try to nail their results into a theoretical structure.
The goal in the experimental tradition is to get differential variables out of sight.

The correlational psychologist loves those variables the experimenter left home to forget.

Factor analysis is rapidly being perfected into a rigorous method of clarifying multivariate relationships.
The correlational psychologist is a mere observer of a play where Nature pulls a thousand strings; but his multivariate methods make him equally an expert, an expert in figuring out where to look for the hidden strings.

The shape of a united discipline

It is not enough for each discipline to borrow from the other.
Correlational psychology studies only variance among organisms; experimental psychology studies only variance among treatments.
A united discipline will study both of these, but it will also be concerned with the otherwise neglected interactions between organismic and treatment variables.
Our job is to invent constructs and to form a network of laws which permits prediction.

From observations we must infer a psychological description of the situation and of the present state of the organism.
Our laws should permit us to predict, from this description, the behaviour of organism-in-situation.

Methodologies for a joint discipline have already been proposed.

Simpson's paradox in psychological science: a practical guide - summary of an article by Kievit, Frankenhuis, Waldorp, & Borsboom (2013)

Critical thinking
Article: Kievit, Frankenhuis, Waldorp, & Borsboom (2013)
Simpson's paradox in psychological science: a practical guide

Introduction

Simpson’s paradox (SP): the direction of an association at the population level may be reversed within the subgroups comprising that population.

Simpson showed that a statistical relation observed in a population could be reversed within all of the subgroups that make up that population.


What is Simpson’s paradox?

Simpson’s paradox is a counter-intuitive feature of aggregated data, which may arise when (causal) inferences are drawn across different explanatory levels (such as population to subgroup, or subgroup to individual).

Simpson’s paradox is conceptually and analytically related to many statistical challenges and techniques.
The underlying shared theme of these techniques is that they are concerned with the nature of (causal) inference. The challenge is what inferences are warranted based on the data we observe.

Simpson’s paradox in individual differences

One can only be sure that a group-level finding generalizes to individuals when the data are ergodic, which is a very strict requirement.
Since this requirement is unlikely to hold in many data sets, extreme caution is warranted in generalizing across levels.
The dimensions that appear in a covariance structure analysis describe patterns of variation between people, not variation within individuals over time.

A person X may have a position on five dimensions compared to other people in a given population, but this does not imply that this person varies along these dimensions over time.

Two variables may correlate positively across a population of individuals, but negatively within each individual over time.
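
This within-person reversal can be demonstrated with a small simulation (all numbers invented): a variable x and a variable y are positively related between persons, yet negatively related within every person's own fluctuations over occasions.

```python
import random

random.seed(3)

def corr(xs, ys):
    """Pearson correlation, computed from scratch to stay self-contained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

pooled_x, pooled_y, within = [], [], []
for m in range(10):                      # 10 people; person m's means are (m, m)
    xs, ys = [], []
    for _ in range(50):                  # 50 measurement occasions per person
        d = random.gauss(0, 0.3)
        xs.append(m + d)                          # within-person fluctuation
        ys.append(m - d + random.gauss(0, 0.05))  # negative relation within
    within.append(corr(xs, ys))
    pooled_x += xs
    pooled_y += ys

pooled = corr(pooled_x, pooled_y)
print(f"pooled (between-person) correlation: {pooled:+.2f}")
print(f"all within-person correlations negative: {all(c < 0 for c in within)}")
```

The pooled correlation is strongly positive because between-person differences dominate the aggregate, while every single person shows a negative relation over time: a group-level finding that generalizes to no individual.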

A survival guide to Simpson’s paradox

Simpson’s paradox may occur in a wide variety of research designs, methods, and questions.
There is no single mathematical property that all instances of SP have in common. Therefore, there will not be a single, correct rule for analysing data so as to prevent cases of SP.

What we can do is consider the instances of SP we are most likely to encounter, and investigate them for characteristic warning signals.

The most general danger in psychology is that we might incorrectly infer that a finding at the level of the group generalizes to subgroups, or to individuals over time.

Preventing Simpson’s paradox

Develop and test mechanistic explanations

The first step in addressing SP is to carefully consider when it may arise.

The mechanistic inference we propose to explain the data may be incorrect.
This danger arises when we use data

.....read more

Fearing the future of empirical psychology - summary of an article by LeBel & Peters (2011)

Critical thinking
Article: LeBel & Peters (2011)
Fearing the future of empirical psychology


The interpretation bias

Because empirical data underdetermine theory choice, alternative explanations of data are always possible, both when the data statistically support the researcher’s hypothesis and when they fail to do so.

The interpretation bias: a bias toward interpretations of data that favour a researcher’s theory, both when the null hypothesis is statistically rejected and when it is not.
This bias entails that, regardless of how data turn out, the theory whose predictions are being tested is artificially buffered from falsification.
The ultimate consequence is an increased risk of reporting false positives and disregarding true negatives, and so drawing incorrect conclusions about human psychology.

The research biases underlying the file-drawer problem in no way depend on unscrupulous motives.

Conservatism in theory choice

The knowledge system that constitutes a science such as psychology can be roughly divided into two types of belief:

  • Theory-relevant beliefs
    Concern the theoretical mechanisms that produce behaviour
  • Method-relevant beliefs
    Concern the procedures through which data are produced, measured and analysed

In any empirical test of a hypothesis, interpretation of the resulting data depends on both theory-relevant and method-relevant beliefs, as both types of belief are required to bring the hypothesis to empirical test.
Consequently, the resulting data can always be interpreted as theory relevant or as method relevant.

Weaknesses in the current knowledge system of empirical psychology bias the resulting choice of interpretation in favour of the researcher’s theory.
Deficiencies in methodological research practice systematically bias

  • The interpretation of confirmatory data as theory relevant
  • The interpretation of disconfirmatory data as method relevant

This has the result that the researcher’s hypothesis is artificially buffered from falsification.

The interpretation of data should hinge not on what the pertinent beliefs are about, but rather on the centrality of those beliefs.
The centrality of a belief reflects its position within the knowledge system: central beliefs are those on which many other beliefs depend; peripheral beliefs are those with few dependent beliefs.
The rejection of central beliefs to account for observed data entails a major restructuring of the overall knowledge system.

Conservatism: choosing the theoretical explanation consistent with the data that requires the least amount of restructuring of the existing knowledge system.
Generally, the conservatism in theory choice is a virtue, as it reduces ambiguity in the interpretation of data.
The value of methodological rigour is precisely that, by leveraging conservatism, it becomes more

.....read more

The 10 commandments of helping students distinguish science from pseudoscience in psychology - summary of an article by Scott O. Lilienfeld (2005)

Critical thinking
Article: Scott O. Lilienfeld (2005)
The 10 commandments of helping students distinguish science from pseudoscience in psychology


The ten commandments of helping students distinguish science from pseudoscience in psychology

The first commandment

It is important to communicate to students that the differences between science and pseudoscience, although not absolute or clear-cut, are neither arbitrary nor subjective.

Warning signs that characterize most pseudoscientific disciplines:

  • A tendency to invoke ad hoc hypotheses, which can be thought of as ‘escape hatches’ or loopholes, as a means of immunizing claims from falsification.
  • An absence of self-correction and an accompanying intellectual stagnation
  • An emphasis on confirmation rather than refutation
  • A tendency to place the burden of proof on sceptics, not proponents, of claims
  • Excessive reliance on anecdotal and testimonial evidence to substantiate claims
  • Evasion of the scrutiny afforded by peer review
  • Absence of ‘connectivity’, a failure to build on existing scientific knowledge
  • Use of impressive-sounding jargon whose primary purpose is to lend claims a facade of scientific respectability
  • An absence of boundary conditions: a failure to specify the settings under which claims do not hold.

None of these warning signs is by itself sufficient to indicate that a discipline is pseudoscientific.
But the more of these warning signs a discipline exhibits, the more suspect it should become.

The second commandment

Learning to distinguish scepticism from cynicism.
One danger of teaching students to distinguish science from pseudoscience is that we can inadvertently produce students who are reflexively dismissive of any claim that appears implausible.

Scepticism, which is the proper mental set of the scientist, implies two seemingly contradictory attitudes:

  • An openness to claims
  • A willingness to subject these claims to incisive scrutiny.

Cynicism implies closed-mindedness.

The third commandment

Distinguish methodological scepticism from philosophical scepticism.

  • Methodological (scientific) scepticism: an approach that subjects all knowledge claims to scrutiny with the goal of sorting out true from false claims
  • Philosophical scepticism: an approach that denies the possibility of knowledge.

There is a continuum of confidence in scientific claims.

The fourth commandment

Distinguish pseudoscientific claims from claims that are merely false.
The key difference between science and pseudoscience lies not in their content but in their approach to evidence.

  • Science seeks out contradictory information and eventually incorporates such information into its corpus of knowledge
  • Pseudoscience tends to avoid contradictory information and thereby fails to foster the self-correction that is essential to scientific progress.

The fifth commandment

Distinguish science from scientists.

The scientific method is a toolbox of skills that scientists have developed to prevent themselves from confirming their own biases.

The sixth commandment

Explain

.....read more

WSRt, critical thinking, a list of terms used in the articles of block 2

This is a list of the important terms used in the articles of block 2 of WSRt at the UvA.


Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability

Accuracy motives: to learn and publish true things about human nature

Professional motives: to succeed and thrive professionally.

Neyman, Pearson and hypothesis testing

Statistical inference: the logic underlying all the statistics you see in the professional journals of psychology and most other disciplines that regularly use statistics.

The subjective interpretation of probability: a probability is a degree of conviction of a belief

The objective interpretation of probability: probabilities are located in the world.

Alpha: the long-term error rate for one type of error: saying the null is false when it is true.

Type I error: when the null is true and we reject it.

Type II error: accepting the null when it is false.
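
These definitions can be illustrated with a small simulation (setup invented): run many experiments in which the null hypothesis is true, apply a two-sided z-test to each, and count how often the null is wrongly rejected. The long-run rate of Type I errors approximates alpha.

```python
import random

random.seed(11)
N_EXPERIMENTS = 20_000
N_PER_SAMPLE = 25
CUTOFF = 1.96            # two-sided critical z value for alpha = .05

rejections = 0
for _ in range(N_EXPERIMENTS):
    # The null is TRUE: the population mean really is 0 (known sd = 1).
    sample = [random.gauss(0, 1) for _ in range(N_PER_SAMPLE)]
    z = (sum(sample) / N_PER_SAMPLE) / (1 / N_PER_SAMPLE ** 0.5)
    if abs(z) > CUTOFF:
        rejections += 1  # saying the null is false when it is true: a Type I error

type1_rate = rejections / N_EXPERIMENTS
print(f"long-run Type I error rate: {type1_rate:.3f}  (nominal alpha = .05)")
```

A Type II error could be simulated the same way by drawing the samples from a population where the null is false and counting non-rejections.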

Meta-analysis: the process of combining groups of studies together to obtain overall tests of significance.

Evaluating Theories

Descriptive adequacy: does the theory accord with the available data?

Precision and interpretability: Is the theory described in a sufficiently precise fashion that other theorists can interpret it easily and unambiguously?

Coherence and consistency: Are there logical flaws in the theory? Does each component of the theory seem to fit with the others into a coherent whole? Is it consistent with theory in other domains?

Prediction and falsifiability: Is the theory formulated in such a way that critical tests can be conducted that could reasonably lead to the rejection of the theory?

Postdiction and explanation: Does the theory provide a genuine explanation of existing results?

Parsimony: Is the theory as simple as possible?

Originality: Is the theory new or is it essentially a restatement of an existing theory?

Breadth: does the theory apply to a broad range of phenomena or is it restricted to a limited domain?

Usability: does the theory have applied implications?

Rationality: does the theory make claims about the architecture of mind that seem reasonable in the light of the environmental contingencies that have

.....read more