Surrogate Science: The Idol of a Universal Method for Scientific Inference - Gigerenzer - 2015 - Article

The application of statistics to science is not a neutral act. Textbook writers in the social sciences have transformed rival statistical systems into an apparently monolithic method that could be used mechanically. Yet, as Fisher put it, "No scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas."

If statisticians agree on one thing, it is that scientific inference should not be made mechanically. Good science requires both statistical tools and informed judgment about what model to construct, what hypotheses to test, and what tools to use. Yet many social scientists vote with their feet against an informed use of inferential statistics: the majority still routinely compute p values and confidence intervals, and a few calculate Bayes factors. Determining significance has become a surrogate for good research. This article is about the idol of a universal method of statistical inference.

Mindless statistical inference

In one internet study, participants were asked whether they felt there was a difference between heroism and altruism. The vast majority said yes, and the authors computed a chi-squared test to find out whether the two numbers differed significantly. This is an illustration of the automatic use of statistical procedures, even when a procedure does not fit the research question. The idol of an automatic, universal method of inference, however, is not unique to p values or confidence intervals. It can also invade Bayesian statistics.
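To make the example concrete, here is a minimal sketch of the kind of test that was computed: a chi-squared goodness-of-fit test of two answer counts against an equal split. The counts (80 versus 20) are hypothetical, not taken from the study, and the function name is illustrative. With one degree of freedom the p value follows from the complementary error function, so only the standard library is needed.

```python
import math


def chi_squared_equal_split(count_a: int, count_b: int) -> tuple[float, float]:
    """Chi-squared goodness-of-fit test of two counts against a 50/50 split.

    Returns the test statistic and its p value (1 degree of freedom).
    """
    total = count_a + count_b
    expected = total / 2  # null hypothesis: both answers equally frequent
    stat = sum((obs - expected) ** 2 / expected for obs in (count_a, count_b))
    # With 1 degree of freedom, chi2 = Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 80 participants answer "yes", 20 answer "no".
stat, p = chi_squared_equal_split(80, 20)
print(f"chi2 = {stat:.1f}, p = {p:.2g}")
```

The test only restates what the raw counts already show, namely that most people answered yes; it says nothing about heroism or altruism themselves.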

The Idol of a Universal Method of Inference

In this article the authors make three points:

  1. There is no universal method of scientific inference, but rather a toolbox of useful statistical methods. In the absence of a universal method, its followers worship surrogate idols, such as significant p values. The gap between the ideal and its surrogate is bridged by misconceptions about statistical inference, for instance the belief that a p value of 1% means there is a 99% chance that the result will replicate.
  2. If the proclaimed 'Bayesian revolution' were to take place, the danger is that the idol of a universal method might survive in a new guise, proclaiming that all uncertainty can be reduced to subjective probabilities.
  3. Statistical methods are not simply applied to a discipline; they change the discipline itself and vice versa.
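The replication belief in point 1 can be checked with a short calculation. The sketch below assumes a two-sided z-test, an exact replication with the same sample size, and, generously, that the true effect equals the one observed in the first study. Under those assumptions a result with p = .01 is followed by another significant result (at alpha = .05) only about 73% of the time, not 99%. The function name is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def replication_probability(p_observed: float, alpha: float = 0.05) -> float:
    """Chance that an exact replication is again significant (two-sided z-test).

    Assumes the true effect equals the observed effect, so the replication's
    z statistic is normal with mean z_obs and standard deviation 1.
    """
    z_obs = Z.inv_cdf(1 - p_observed / 2)  # observed |z| for a two-sided p value
    z_crit = Z.inv_cdf(1 - alpha / 2)      # critical value, about 1.96 for alpha = .05
    # Significant in either direction: z > z_crit or z < -z_crit.
    return (1 - Z.cdf(z_crit - z_obs)) + Z.cdf(-z_crit - z_obs)


print(f"{replication_probability(0.01):.2f}")
```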

In science and everyday life, statistical methods have changed whatever they touched. The most dramatic change brought about by statistics was the 'probabilistic revolution'. In the natural sciences, the term statistical began to refer to the nature of theories, not the evaluation of data.

How statistics changed theories: the probabilistic revolution

The probabilistic revolution upset the ideal of determinism shared by most European thinkers. It differs from other scientific revolutions in that it did not replace an existing system within its own field, mathematics; instead, it upset theories in fields outside mathematics. The social sciences inspired the probabilistic revolution in physics, yet the social and medical sciences themselves were reluctant to abandon the ideal of simple, deterministic causes. Social theorists hesitated to think of probability as more than an error term in the equation observation = true value + error.

The term inference revolution refers to a change in scientific method that was institutionalized in psychology and in other social sciences. The qualifier inference indicates that inference from sample to population came to be considered the most crucial part of research.

To understand how deeply the inference revolution changed the social sciences, it is helpful to realize that routine statistical tests, such as calculations of p values or other inferential statistics, are not common in the natural sciences.

The first known test of a null hypothesis was by Arbuthnott (1710), and it is strikingly similar to the 'null ritual' that was later institutionalized in the social sciences. Arbuthnott found that more boys than girls were christened in London in each of 82 consecutive years, concluded that this could not be due to chance, and attributed it to divine providence: the external accidents to which males are subject "do make a great havock of them", and this loss far exceeds that of the other sex, so more males must be born to compensate. This first null hypothesis test impressed no one, but that does not mean statistical methods played no role in the social sciences. To summarize, statistical inference played little role, and Bayesian inference virtually none, in research before roughly 1940. Automatic inference was unknown before the inference revolution, with the exception of the critical ratio (the ratio of the obtained difference to its standard deviation).
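Arbuthnott's argument amounts to a sign test: if a male or a female majority were equally likely each year, the chance of a male majority in all 82 years of his London records is (1/2)^82. The minimal sketch below computes that probability alongside the pre-1940 critical ratio; the function names and the numbers passed to `critical_ratio` are hypothetical.

```python
def arbuthnott_p(years_with_male_excess: int) -> float:
    """Probability that males outnumber females in every one of n years,
    under the null hypothesis that each year is a fair coin toss."""
    return 0.5 ** years_with_male_excess


def critical_ratio(difference: float, sd_of_difference: float) -> float:
    """Critical ratio: obtained difference divided by its standard deviation."""
    return difference / sd_of_difference


print(arbuthnott_p(82))          # about 2.07e-25, so "not chance", Arbuthnott concluded
print(critical_ratio(6.0, 2.0))  # hypothetical difference and standard deviation
```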

The Null Ritual

The most prominent creation of a seemingly universal inference method is the null ritual:

  1. Set up a null hypothesis of 'no mean differences' or 'zero correlation'. Do not specify the predictions of your own research hypothesis.
  2. Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p < .05, p < .01, or p < .001, whichever comes next to the obtained p value.
  3. Always perform this procedure.
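Because the ritual is purely mechanical, it can be written down in a few lines of code. This deliberately mindless sketch (the function name is hypothetical) implements the reporting rule of step 2: it rounds the obtained p value up to the nearest conventional threshold. Note what the code never needs: a research hypothesis, an effect size, or any knowledge of the study.

```python
def null_ritual(p_value: float) -> str:
    """Deliberately mindless rendering of the null ritual's reporting rule.

    Rounds the obtained p value up to the nearest conventional threshold
    (.001, .01, .05); anything else is simply 'not significant'.
    """
    for threshold, label in ((0.001, "p < .001"), (0.01, "p < .01"), (0.05, "p < .05")):
        if p_value < threshold:
            return label
    return "n.s."


for p in (0.0004, 0.003, 0.03, 0.2):
    print(p, null_ritual(p))
```

Applying the same function to every study, whatever its question, is step 3.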

In psychology, this ritual became institutionalized in curricula, editorials, and professional associations. But the null ritual does not exist in statistics proper. The null ritual is also often confused with Fisher's theory of null hypothesis testing. For example, it has become common to use the term NHST (null hypothesis significance testing) without distinguishing between the two. Contrary to what this misleading term suggests, the level of significance has three different meanings: (a) a mere convention, (b) the alpha level, or (c) the exact level of significance.

The three meanings of significance

The second meaning, the alpha level, comes from Neyman-Pearson decision theory. The alpha level is the long-term relative frequency of mistakenly rejecting hypothesis H1 if it is true, also known as the Type 1 error rate. The beta level is the long-term relative frequency of mistakenly rejecting hypothesis H2 if it is true, also known as the Type 2 error rate, or 1 - power. The Neyman-Pearson procedure is:

  1. Set up two statistical hypotheses, H1 and H2, and decide on alpha, beta, and sample size before the experiment.
  2. If the data fall into the rejection region of H1, accept H2; otherwise accept H1.
  3. The usefulness of this procedure is limited, among other things, to situations where there is a disjunction of hypotheses (either H1 or H2 is true) and where there is repeated sampling.
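A minimal sketch of this procedure, assuming a one-sided, one-sample z-test with known standard deviation: H1 states a mean of 0 and H2 a mean of 0.5 standard deviations (hypothetical values), with alpha = .05 and beta = .20 fixed before any data are collected. The planning step uses the usual normal-approximation formula n = ((z_alpha + z_beta) / effect size)^2; the function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def plan_sample_size(effect_size: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """Sample size for a one-sided z-test of H1: mu = 0 against H2: mu = effect_size.

    effect_size is in standard-deviation units; alpha and beta are fixed in advance.
    """
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)


def decide(z_statistic: float, alpha: float = 0.05) -> str:
    """Decision rule: accept H2 if z falls in the rejection region of H1."""
    return "accept H2" if z_statistic > Z.inv_cdf(1 - alpha) else "accept H1"


n = plan_sample_size(0.5)
power = 1 - Z.cdf(Z.inv_cdf(0.95) - 0.5 * math.sqrt(n))  # 1 - beta actually achieved
print(n, round(power, 3))
print(decide(2.1), decide(1.2))
```

With these inputs the plan calls for 25 observations, giving power of about .80. The outcome is a rule of behavior, "accept H1" or "accept H2", not a graded measure of evidence.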

Fisher eventually refined his earlier position. The result was a third definition of the level of significance, alongside convention and alpha level: the exact level of significance. Fisher's later procedure is:

  1. Set up a statistical null hypothesis. The null need not be a nil hypothesis.
  2. Report the exact level of significance; do not use a conventional 5% level all the time.
  3. Use this procedure only if you know little about the problem at hand.
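Fisher's later procedure can be sketched as follows, assuming a one-sample z-test with known standard deviation. The null hypothesis here is a substantive value (a mean of 100) rather than a nil hypothesis, and the result is reported as an exact p value instead of being compared with a fixed 5% cut-off. All numbers and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def exact_p_value(sample_mean: float, null_mean: float, sigma: float, n: int) -> float:
    """Two-sided exact p value of a one-sample z-test against H0: mu = null_mean."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: the null is mu = 100 (a substantive value, not 'nil').
p = exact_p_value(sample_mean=104.0, null_mean=100.0, sigma=15.0, n=36)
print(f"p = {p:.3f}")  # report the exact value rather than a verdict like 'n.s.'
```

Here p is about .11; the researcher reports that value and weighs it against what else is known about the problem.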

Fisher's procedure differs fundamentally from the null ritual. First, one should not automatically use the same level of significance; second, one should not use this procedure for all problems. Moreover, step 1 of the ritual contains the misinterpretation that 'null' means 'nil', such as zero difference.

The problem of conflicting methods

When textbook writers learned about Neyman-Pearson theory, they faced a problem: how should they deal with conflicting methods? The solution would have been to present a toolbox of different approaches, but writers such as Guilford and Nunnally mixed the concepts and presented the muddle as a single, universal method. The idol of this universal method also left no place for Bayesian statistics.

Bayesianism and the new quest for a universal method

Fisher, Neyman, and Pearson have also been victims of social scientists' desire for a single tool, a desire that produced a surrogate number for inferring what is good research. The potential danger of Bayesian statistics lies in the subjective interpretation of probability, which sanctions its universal application to all situations of uncertainty.

The 'Bayesian revolution' had a slow start. Bayes' essay was published posthumously, in 1763, but it was largely ignored by scientists. Just as the null ritual replaced the three interpretations of the level of significance with one, the currently dominant version of Bayesianism does the same with the pluralism of Bayes' own time, promoting a universal subjective interpretation of probability instead. Probability was understood as:

  1. a relative frequency in the long run, such as in mortality tables used for calculating insurance premiums
  2. a propensity, that is, the physical design of an object, such as that of a die or a billiard table
  3. a reasonable degree of subjective belief, such as in the attempts of courts to quantify the reliability of witness testimony.

In Bayes' essay, his notion of probability is ambiguous and can be read in all three ways. This ambiguity, however, is typical of his time, in which the classical theory of probability reigned.

If probability is thought of as a long-run relative frequency or as a propensity, it immediately becomes clear that Bayes' rule has a limited range of applications: it can be applied only where the relevant probabilities can be reliably measured. The economist Frank Knight used the term risk for these two situations (i.e., probabilities that can be reliably measured in terms of frequency or propensity), as opposed to uncertainty. Subjective probability, by contrast, can be applied to situations of uncertainty and to singular events, such as the probability that Michael Jackson is still alive. A new generation of Bayesians now believes that Bayesianism is the only game in town. The authors use the term Universal Bayes for this view that all uncertainties can or should be represented by subjective probabilities, a view that explicitly rejects Knight's distinction between risk and uncertainty.
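In a world of risk, where the prior and the likelihoods can be estimated from long-run frequencies, Bayes' rule is straightforward arithmetic. The sketch below assumes a hypothetical base rate of 1%, a hit rate of 90%, and a false-alarm rate of 9%; the function name is illustrative.

```python
def posterior(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Bayes' rule for a binary hypothesis: P(H | positive evidence)."""
    true_pos = prior * hit_rate                  # P(H) * P(positive | H)
    false_pos = (1 - prior) * false_alarm_rate   # P(not H) * P(positive | not H)
    return true_pos / (true_pos + false_pos)


# Hypothetical, frequency-based inputs: base rate 1%, hit rate 90%, false alarms 9%.
print(round(posterior(0.01, 0.90, 0.09), 3))
```

In frequency terms: of 1,000 cases, 10 have the condition and 9 of them test positive, while about 89 of the other 990 also test positive, so roughly 9 of 98 positives are true positives. The result is only as trustworthy as the frequencies that go into it.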

Risk versus uncertainty

What the universal Bayesians do not seem to realize is that, in theory, Bayesianism can be optimal in a world of risk, but it is of uncertain value when not all information is known or can be known, or when probabilities have to be estimated from small, unreliable samples. One can also use plain common sense to see that complex optimization algorithms are unreliable in an uncertain world.
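The small-sample problem is easy to demonstrate by simulation. This minimal sketch assumes a hypothetical event with true probability 0.3 and measures how widely the relative-frequency estimate varies across repeated samples of size 5 versus 500; the function name and parameters are illustrative.

```python
import random
import statistics


def spread_of_estimates(true_p: float, n: int, reps: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the relative-frequency estimate of true_p
    across `reps` simulated samples of size n."""
    rng = random.Random(seed)  # seeded for reproducibility
    estimates = [sum(rng.random() < true_p for _ in range(n)) / n for _ in range(reps)]
    return statistics.pstdev(estimates)


small = spread_of_estimates(0.3, n=5)
large = spread_of_estimates(0.3, n=500)
print(f"n=5: {small:.3f}, n=500: {large:.3f}")
```

With samples of 5, the estimate's standard deviation is around 0.2, so a Bayesian calculation that treats such an estimate as a known probability inherits that error.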

The automatic Bayes

As with the null ritual, the universal claim for Bayes' rule tends to go together with its automatic use. One version of Automatic Bayes is the interpretation of Bayes factors using Jeffreys' scale. A second version can be found in the heuristics-and-biases research program, which is widely taught in business education courses. In short, the automatic use of Bayes' rule is a dangerously beautiful idol. Yet Bayesianism does not exist in the singular.
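Interpreting Bayes factors by Jeffreys' scale can become as mechanical as the null ritual. The sketch below, with a hypothetical function name, turns a Bayes factor BF10 into a verbal label. The cut-offs (powers of the square root of 10) follow a commonly cited version of Jeffreys' scale; other published versions use different cut-offs and wording, so treat them as an assumption.

```python
def jeffreys_label(bayes_factor: float) -> str:
    """Map a Bayes factor BF10 (evidence for H1 over H0) to a verbal label.

    Cut-offs follow one commonly cited version of Jeffreys' scale
    (powers of sqrt(10)); other versions differ.
    """
    if bayes_factor < 1:
        return "supports H0"
    for upper, label in ((10 ** 0.5, "barely worth mentioning"),
                         (10, "substantial"),
                         (10 ** 1.5, "strong"),
                         (100, "very strong")):
        if bayes_factor < upper:
            return label
    return "decisive"


for bf in (2, 9.9, 10.1, 150):
    print(bf, jeffreys_label(bf))
```

As with p < .05, the label substitutes for judgment about the particular case: Bayes factors of 9.9 and 10.1 receive different verbal verdicts.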

The statistical toolbox

The authors propose an alternative to Universal and Automatic Bayes: to think of Bayes' rule as one part of a larger statistical toolbox. In this toolbox, Bayes' rule has its value but, like any other tool, does not work for all problems.

How to change statistics?

Leibniz had a dream: to discover a calculus that could map all ideas into symbols. Such a universal calculus would also put an end to all scholarly bickering. This dream is still alive in the social sciences today. Surrogate science, from the mindless calculation of p values or Bayes factors to citation counts, is not entirely worthless: it fuels a steady stream of work of average quality and keeps researchers busy producing more of the same. But it also makes it harder for scientists to be innovative, risk-taking, and imaginative, and surrogates also encourage cheating and incomplete or dishonest reporting. Would a Bayesian revolution lead to a better world? The answer depends on what form the revolution takes. The real challenge is to prevent surrogates from taking over once again, for instance by replacing routine significance tests with routine interpretations of Bayes factors. Otherwise, Leibniz's beautiful dream of a universal calculus could easily turn into Bayes' nightmare.
