Beyond the Null Ritual: Formal Modeling of Psychological Processes - Marewski & Olsson - 2009 - Article


One of the most entrenched rituals in science is null hypothesis significance testing: testing a hypothesis against chance. Although it is known to be problematic, it remains common in practice. One way to resist the temptation of the null ritual is to make theories more precise by turning them into formal models. These can be tested against each other instead of against chance, which enables the researcher to decide between competing theories on the basis of quantitative measures.

The arbitrariness of the .05 alpha level gives the writer flexibility in interpreting a p-value as evidence against the null hypothesis. This article is about overcoming a ritual involved in hypothesis testing in psychology: the null ritual, also known as null hypothesis significance testing.

In this ritual, a non-specific hypothesis is tested against "chance": the null hypothesis states, for example, that "there is no difference between two population means." Forty years ago, editors of major psychological journals required this ritual to be carried out before a paper could be published. Although methodological evidence now speaks against it, the .05 alpha level is still widely used.

What lies beyond the null ritual?

Rituals have a number of attributes that all apply to the null ritual: repetition of the same action, a fixation on the 5% (or 1%) level, fear of sanctions from journal editors, and wishful thinking about the results. In its most extreme form, the null ritual reads as follows:

  1. Set up a statistical null hypothesis with "no mean difference" or "zero correlation". Do not specify the predictions of the research hypothesis or of alternative hypotheses.

  2. Use 5% as a convention for rejecting the null hypothesis. If the result is significant, accept the research hypothesis.

  3. Always perform this procedure.

Since this ritual became institutionalized in psychology, several alternatives have been proposed to replace or supplement it. Most of these suggestions focus on the way data are analyzed: think of effect size measures, confidence intervals, meta-analysis, and resampling methods.

How is it possible that, despite attempts to introduce alternatives, the null ritual is still the most widely used approach? This may be because most psychological theories are simply too weak to do more than predict the direction of an effect. This article therefore does not offer yet another alternative to the null hypothesis test, but a way to make theories more precise by casting them as formal models.

What is a model?

In a broad sense, a model is a simplified representation of the world that is used to explain observed data; countless verbal, informal explanations of psychological phenomena are models in this sense. In a narrower sense, a model is a formal instantiation of a theory that specifies the theory's predictions.

What is the scope of modeling?

Modeling is not meant to be applied in the same way everywhere; it should be seen as a tailor-made tool for specific problems. Modeling helps researchers understand complex phenomena, and each modeling method has its own advantages and disadvantages, just as null hypothesis testing does. Although modeling is also used in other areas, in psychology it is most common in research on cognitive systems. It is a complex undertaking that requires considerable skill and knowledge.

What are the advantages of formally specifying theories?

There are four benefits of increasing precision of theories by casting them as models. 

1. A model provides a basis for strong theory tests

Models provide the bridge between theories and empirical evidence. They enable scientists to derive competing quantitative predictions, which allow strong comparative tests of theories. The more precise the predictions, the more systematically the theories can be compared. Once the quantitative predictions of different models are compared directly, null hypothesis testing becomes unnecessary.
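As an illustration of such comparative testing, here is a minimal sketch (all numbers and model labels hypothetical, not from the article) in which two toy models make point predictions for the same observed values and are compared directly by prediction error, with no null hypothesis involved:

```python
import math

# Illustrative only: two toy "models" making point predictions for the
# same five hypothetical observed reaction times, in milliseconds.
observed = [510, 620, 705, 830, 940]
model_a  = [500, 610, 720, 820, 930]   # e.g., predictions of model A
model_b  = [480, 640, 690, 870, 900]   # e.g., predictions of model B

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Instead of asking "is either model better than chance?", we ask which
# model's quantitative predictions lie closer to the data.
err_a, err_b = rmse(model_a, observed), rmse(model_b, observed)
print(f"RMSE model A: {err_a:.1f} ms, model B: {err_b:.1f} ms")
print("Preferred (by fit alone):", "A" if err_a < err_b else "B")
```

Note that this comparison uses fit alone; as discussed later in the article, fit must be weighed against model complexity before drawing conclusions.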

2. A model can sharpen a research question

Null hypothesis tests are often used to test verbal, informal theories. But if such theories are not precisely specified, they can be used post hoc to "explain" almost any observed empirical pattern. The predictions of a formal model, by contrast, often cannot be derived by intuition alone; sometimes they can only be understood by running computer simulations. Often it is only by modeling a theory oneself that one understands what it actually predicts and what it cannot account for. The goal of modeling is thus not only to find out which of several competing explanations of the data is preferable, but also to sharpen the questions being asked.

3. A model can lead beyond theories derived from the general linear model

Many null hypothesis significance tests only apply to simple hypotheses, for instance about linear, additive effects. Scientists sometimes take an available tool such as ANOVA and turn it into a psychological explanation of the data. A prominent example is attribution theory, which assumes that, just as experimenters use ANOVAs to infer causal relations between variables, people outside the lab infer causal relations by unconsciously performing the same kind of calculation. But a statistical tool is not necessarily the best starting point for building a theory: although the general linear model (of which ANOVA is a special case) is a precise methodological tool, it is not always the best foundation for theoretical claims.

4. A model helps to approach real-world problems

Just as the general linear model and null hypothesis tests are often inadequate for conceptualizing and evaluating a theory, factorial designs can lead to theories being tested under conditions that have little to do with the real world, where the explanatory power of theories should ultimately prove itself. This lack of external validity may be one reason why psychological findings contribute little outside the lab: in the real world, no person randomly chooses whom they come into contact with, and no organism can "uncorrelate" life-sustaining information. Modeling, on the other hand, gives researchers the freedom to deal with natural confounds without destroying them: they can be built into the models. Modeling thus provides ways to increase the precision of theories, helps researchers quantify explanatory power, and frees them from dependence on the null hypothesis. Formal statements can be linear or nonlinear. By looking beyond factorial designs, modeling makes it possible to approach real-world problems.

What are further benefits of formal modeling? The example of a modeling framework

ACT-R is a broad, quantitative theory of human behavior that covers almost the entire field of human cognition.

Meta-analyses have been used to show that relying on significance tests slows the growth of cumulative knowledge. ACT-R, by contrast, is a good example of how knowledge can accumulate systematically over time. ACT-R has its roots in older psychological theories, but gradually evolved into its current form. Along the way it became clear how cognitive processes can be adaptive by being tuned to the statistical structure of the environment.

ACT-R models are specific enough to allow computer simulation of both outcomes and processes. For example, in a two-alternative situation, such as reading this article versus reading another one, an ACT-R model would predict which alternative is chosen and which considerations the model weighs before making the choice. With ACT-R, scientists can predict: (1) overt behavior, (2) temporal aspects of behavior, and (3) the associated patterns of brain activity as measured by fMRI.
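To give a flavor of this specificity, the following sketch implements ACT-R's base-level learning equation, B_i = ln(Σ_j t_j^(-d)), and the standard exponential mapping from activation to retrieval latency. The parameter values (decay d = 0.5, latency factor F = 1.0) are conventional defaults or illustrative choices, not taken from the article:

```python
import math

# ACT-R's base-level learning equation: a chunk's activation grows with
# frequency and recency of use. t_j are the times (in seconds) since each
# past use; d is the decay parameter (conventionally 0.5).
def base_level_activation(times_since_use, d=0.5):
    return math.log(sum(t ** -d for t in times_since_use))

# Retrieval latency falls exponentially with activation: T = F * exp(-A).
# F is a latency scaling parameter (value here illustrative).
def retrieval_latency(activation, F=1.0):
    return F * math.exp(-activation)

# A chunk used recently and often is more active, hence retrieved faster,
# than one last used long ago.
recent = base_level_activation([1, 5, 10])
old    = base_level_activation([100, 200])
print(retrieval_latency(recent) < retrieval_latency(old))  # prints True
```

Such equations are what make ACT-R's predictions about both outcomes (which chunk is retrieved) and timing (how long retrieval takes) quantitatively testable.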

In summary, modeling can promote the growth of cumulative knowledge, reveal how different behaviors come about, and help to integrate psychological disciplines.

How do you select between competing formal models?

The comparison between alternative models is called model selection. There are a number of criteria for model selection: (1) psychological plausibility, (2) falsifiability, (3) the number of assumptions a model makes, (4) whether a model is consistent with overarching theories, and (5) practical contribution. In practice, the criterion of descriptive adequacy is used most often: when two or more models are compared, the model that deviates least from existing data, i.e., shows the best fit, is chosen.

A null hypothesis test is not a good way to choose between two models: given enough power, the test will always yield a significant result. But the biggest limitation of model selection based on significance or goodness of fit (e.g., R²) is that, on their own, these procedures do not address a fundamental problem in choosing between competing theories: overfitting.

What is the problem of overfitting?

Concluding that one model is better than another on the basis of goodness of fit would be reasonable if psychological measurements were noise-free. Unfortunately, noise-free data are practically impossible to obtain. As a result, a model can overfit the data: it then captures not only the variance produced by the cognitive process, but also random error. Increased complexity makes a model prone to overfitting and thereby reduces its generalizability: the degree to which the model can predict all potential samples generated by the same cognitive process, rather than merely fit one particular sample of existing data. How susceptible a model is to overfitting is related to its complexity, that is, the flexibility that enables it to fit diverse patterns of data.

Up to a point, increasing a model's complexity can increase its generalizability, namely until the model is complex enough to capture the systematic variation in the data. Beyond that point, additional complexity reduces generalizability, because the model starts to absorb random variation in the data. A good fit therefore does not guarantee good generalization to new data.
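This trade-off can be demonstrated with a small simulation (illustrative, not from the article): a 2-parameter straight line and a 5-parameter interpolating polynomial are both fit to five noisy points from a linear process. The polynomial fits the sample perfectly, yet its predictions for a fresh sample from the same process are typically worse than the simpler line's:

```python
import random
random.seed(3)

# Hypothetical data-generating process: y = 2x plus Gaussian noise.
def sample_y(x):
    return 2.0 * x + random.gauss(0, 1.0)

xs = [0, 1, 2, 3, 4]
train = [sample_y(x) for x in xs]

# Simple model: least-squares straight line (2 free parameters).
n = len(xs)
mx, my = sum(xs) / n, sum(train) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, train)) / sum((x - mx) ** 2 for x in xs)
line = lambda x: slope * x + (my - slope * mx)

# Complex model: degree-4 Lagrange polynomial through every training point
# (5 free parameters) -- flexible enough to absorb the random noise as well.
def poly(x):
    total = 0.0
    for j, yj in enumerate(train):
        term = yj
        for m, xm in enumerate(xs):
            if m != j:
                term *= (x - xm) / (xs[j] - xm)
        total += term
    return total

def mse(model, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / n

fresh = [sample_y(x) for x in xs]   # a new sample from the same process
print("fit error:  line", round(mse(line, train), 2), "| poly", round(mse(poly, train), 2))
print("new sample: line", round(mse(line, fresh), 2), "| poly", round(mse(poly, fresh), 2))
```

The polynomial's fit error on the training sample is essentially zero, yet that perfect fit reflects noise, not the underlying process: exactly the point about generalizability made above.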

How do you select between models?

Practical

This approach relies on the intuition that, when comparing models, one should choose the model that best predicts unobserved data. This can be done by estimating a model's predictive accuracy, often via cross-validation: the parameters are estimated on one part of the data and the model is evaluated on the remaining part. A limitation of this approach is that it is not always consistent. Another way to deal with overfitting is to limit the number of free parameters as much as possible, by fixing parameters or by building simple models with few or no free parameters.
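A minimal cross-validation sketch (data and model hypothetical): the model's free parameters are estimated on a calibration half of the data, and its predictive error is then scored on the held-out validation half:

```python
import random
random.seed(1)

# Hypothetical data: 20 (x, y) pairs from a noisy linear process.
data = [(x, 2.0 * x + random.gauss(0, 1.0)) for x in range(20)]
random.shuffle(data)
calib, valid = data[:10], data[10:]       # calibration / validation split

def fit_line(pairs):
    """Least-squares slope and intercept (the model's free parameters)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    return b, my - b * mx

def mse(params, pairs):
    b, a = params
    return sum((b * x + a - y) ** 2 for x, y in pairs) / len(pairs)

params = fit_line(calib)                  # parameters estimated on one half...
print("cross-validated error:", round(mse(params, valid), 2))  # ...scored on the other
```

Because the validation data played no role in parameter estimation, the reported error penalizes overfitting automatically; flexible models that absorb calibration noise score worse here.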

Simulation

By simulating the predictions of competing models, one can gain insight into the specific behavior of each model. The results can be used to design tasks that maximize the discriminative power between the models.

Theoretical

In this approach, a goodness-of-fit measure is combined with a theoretical estimate of model complexity, resulting in an estimate of generalizability (generalizability = goodness of fit + complexity), with the maximized log-likelihood as the goodness-of-fit index. The complexity term takes different forms in different generalizability measures. The most commonly used measures are AIC and BIC; both are sensitive to only one kind of complexity, the number of free parameters.
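Both criteria are simple to compute once the maximized log-likelihood lnL, the number of free parameters k, and the sample size n are known. The numbers below are invented for illustration:

```python
import math

# AIC and BIC trade goodness of fit (maximized log-likelihood, lnL) against
# complexity, measured only by the number of free parameters k:
#   AIC = -2*lnL + 2*k
#   BIC = -2*lnL + k*ln(n), where n is the number of observations.
def aic(log_lik, k):
    return -2 * log_lik + 2 * k

def bic(log_lik, k, n):
    return -2 * log_lik + k * math.log(n)

# Illustrative numbers: model B fits slightly better (higher lnL) but uses
# two extra parameters; lower scores are preferred.
n = 100
a_ll, a_k = -150.0, 2
b_ll, b_k = -148.5, 4
print("AIC:", aic(a_ll, a_k), aic(b_ll, b_k))    # 304.0 vs 305.0 -> prefer A
print("BIC:", bic(a_ll, a_k, n), bic(b_ll, b_k, n))
```

In this invented example, model B's small fit advantage does not justify its extra parameters under either criterion; note that BIC penalizes the extra parameters more heavily because ln(100) > 2.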

How do you choose between model selection approaches?

We cannot answer the question of which approach is better, but we recommend reporting the results of as many selection criteria as possible and discussing the suitability of each criterion.

What are other pitfalls of model selection?

There are other complications that can arise when designing and testing models. If specification is the greatest virtue of modeling, it can also be its greatest curse: one must choose how to bridge the gap between an informal verbal description and a formal implementation, which can lead to unintended discrepancies between a theory and its various formal counterparts. This is known as the irrelevant specification problem.

A second problem that can arise with complex models is Bonini's paradox: as models become more complete and realistic, they also become less understandable and more opaque.

Third, there is the identification problem: for any observed behavior there exists a universe of different models that are all capable of explaining and reproducing that behavior. Similarly, there are infinitely many vague, informal theories in circulation between which nobody will ever be able to decide.

Conclusion

Although modeling is preferable from a scientific point of view, few researchers use this approach, because it requires considerable effort, time, and knowledge. The acceptance of null hypothesis testing in laboratory settings reduces the incentive for scientists to design models that explain phenomena in the real world. Yet there is often no better alternative than building and testing formal models, because informal theories are not specific enough and are merely tested against chance. With some knowledge and training, modeling can be carried out with manageable effort.
