WSRt, critical thinking - a summary of all articles needed in the second block of second year psychology at the UvA
Critical thinking
Article: Dienes (2003)
Neyman, Pearson and hypothesis testing
In this article, we will consider the standard logic of statistical inference.
Statistical inference: the logic underlying all the statistics you see in the professional journals of psychology and most other disciplines that regularly use statistics.
The underlying logic of statistics (Neyman-Pearson) is highly controversial, frequently attacked (and defended) by statisticians and philosophers, and even more frequently misunderstood.
The meaning of probability we choose determines what we can do with statistics.
The proper way of interpreting probability remains controversial, so there is still debate over what can be achieved with statistics.
The Neyman-Pearson approach follows from one particular interpretation of probability. The Bayesian approach considered follows from another.
Interpretations often start with a set of axioms that probabilities must follow.
Two interpretations of probability:
The most influential objective interpretation of probability is the long-run relative frequency interpretation. Here, probability is a relative frequency.
Because the long-run relative frequency is a property of all the events in the collective, it follows that a probability applies to a collective, not to any single event.
A single event could be a member of different collectives. So a singular event does not have a probability, only collectives do.
Objective probabilities do not apply to single cases. They also do not apply to the truth of hypotheses.
A hypothesis is simply true or false, just as a single event either occurs or does not.
A hypothesis is not a collective, it therefore does not have an objective probability.
Data = D
Hypothesis = H
P(H|D) is the inverse of the conditional probability P(D|H). Inverting conditional probabilities makes a big difference.
P(A|B) can have a very different value from P(B|A).
Knowing P(D|H) does not mean you know P(H|D).
There are two reasons for this:
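How large the gap between a conditional probability and its inverse can be is easy to show with a small numerical sketch (the disease-testing numbers below are hypothetical, chosen only for illustration):

```python
# Hypothetical numbers illustrating that P(A|B) and P(B|A) can differ widely.
# Suppose 1% of people have a disease (A), and a test (B = positive result)
# detects it 95% of the time but also gives 5% false positives.
p_disease = 0.01
p_pos_given_disease = 0.95       # P(B|A)
p_pos_given_healthy = 0.05       # false-positive rate

# Total probability of a positive test (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(positive | disease) = {p_pos_given_disease:.2f}")   # 0.95
print(f"P(disease | positive) = {p_disease_given_pos:.2f}")   # about 0.16
```

Here P(positive | disease) is 0.95, yet P(disease | positive) is only about 0.16, because the inversion also depends on how common the disease is.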
Statistics cannot tell us how much to believe a certain hypothesis. What we can do, according to Neyman and Pearson, is set up decision rules for certain behaviours such that in following those rules in the long run we will not often be wrong. We can work out what the error rates are for certain decision procedures and we can choose procedures that control the long-run error rates at acceptable levels.
Decision rules work by setting up two contrasting hypotheses.
For a given experiment we can calculate p = P(obtaining a t as extreme as or more extreme than the one obtained | H0).
If p is less than alpha, the level of significance we have decided on in advance, we reject H0. By following this rule, we know that in the long run, when H0 is actually true, we will conclude it is false only alpha (e.g. 5%) of the time.
In this procedure, the p-value has no meaning in itself. It is just part of a convenient mechanical procedure for accepting or rejecting a hypothesis.
Alpha is an objective probability, a relative long-run frequency.
It is the proportion of errors of a certain type we will make in the long run, if we follow the above procedure and the null hypothesis is in fact true.
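This long-run error-rate claim can be checked by simulation. The sketch below (not from the article; sample size, seed and critical value are illustrative choices) repeatedly runs a one-sample t-test when H0 is actually true and counts how often it is rejected:

```python
import random
import statistics

# Simulate many experiments in which H0 is true (population mean = 0),
# run a one-sample t-test each time, and count rejections at alpha = .05.
random.seed(1)
n, n_experiments = 30, 2000
t_crit = 2.045          # two-tailed .05 critical t for df = 29 (t tables)
rejections = 0

for _ in range(n_experiments):
    sample = [random.gauss(0, 1) for _ in range(n)]
    se = statistics.stdev(sample) / n ** 0.5
    t = statistics.mean(sample) / se
    if abs(t) > t_crit:
        rejections += 1

print(f"Type I error rate: {rejections / n_experiments:.3f}")  # close to 0.05
```

The observed rejection rate hovers around 0.05, as the procedure guarantees for the long run, even though no single experiment's p-value tells us anything about H0's probability.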
Neither alpha nor our calculated p tells us how probable the null hypothesis is.
Alpha: the long-term error rate for one type of error: saying the null is false when it is true.
There are two ways of making an error with the decision procedure.
Both alpha and beta should be controlled at acceptable levels.
Sometimes significance or alpha is defined simply as ‘the probability of a Type I error’. This is wrong.
Alpha is specifically the probability (long-run frequency) of a Type I error when the null hypothesis is true.
Strictly, using a significance level of 5% does not guarantee that only 5% of all published significant results are in error.
Controlling for alpha does not mean you have controlled for beta.
Power = 1 − β
Power is the probability of detecting an effect, given an effect really exists in the population.
In order to control β, you need to estimate the minimal effect size you would care about and choose a sample size that gives adequate power to detect it.
The more participants you run, the greater the power.
Studies should systematically use power calculations to determine the number of participants.
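The relation between sample size and power can be made concrete with a rough simulation (a sketch, not the article's method; the effect size d = 0.5 and the approximate critical t values are illustrative assumptions):

```python
import random
import statistics

# Sketch: estimate power by simulation for a one-sample t-test,
# assuming a true effect of d = 0.5 (population mean 0.5, SD 1).
random.seed(2)

def power(n, n_sims=1000, effect=0.5):
    t_crit = 2.045 if n <= 30 else 2.02   # approx. two-tailed .05 critical t
    hits = 0
    for _ in range(n_sims):
        sample = [random.gauss(effect, 1) for _ in range(n)]
        t = statistics.mean(sample) / (statistics.stdev(sample) / n ** 0.5)
        if abs(t) > t_crit:
            hits += 1
    return hits / n_sims          # proportion of simulated studies significant

for n in (10, 20, 40):
    print(f"n = {n:2d}: power ~ {power(n):.2f}")
```

Running this shows power climbing steeply with n, which is why power calculations should drive the choice of participant numbers rather than the other way around.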
Significance of 5% means that, if the null hypothesis were true, one would expect 5% of studies to be significant.
Meta-analysis: the process of combining groups of studies together to obtain overall tests of significance.
A set of null results does not mean you should accept the null. They may indicate that you should reject the null.
If your study has low power, getting a null result tells you nothing in itself.
You would expect a null result whether or not the null hypothesis was true.
In the Neyman-Pearson approach, you set power at a high level in designing the experiment, before you run it. Then you are entitled to accept the null hypothesis when you obtain a null result. Doing this procedure you will make errors at a small controlled rate, a rate you have decided in advance is acceptable for you.
Statistics never allows absolute proof or disproof.
Sensitivity can be determined in three ways:
Whenever you find a null result and it is interesting to you that the result is null, you should always indicate the sensitivity of your analysis.
The conditions under which you will stop collecting data for a study define the stopping rule you use.
In the Neyman-Pearson approach it is essential to know the collective or reference class for which we are calculating our objective probabilities alpha and beta.
The relevant collective is defined by a testing procedure applied an indefinite number of times.
In the Neyman-Pearson approach, in order to control overall Type I error, if we perform a number of tests we need to test each one at a stricter level of significance in order to keep overall alpha at 0.05. There are numerous corrections.
A researcher might mainly want to look at one particular comparison, but threw in some other conditions out of curiosity while already going to the effort of recruiting, running and paying participants. It might then feel unfair that the required significance threshold becomes stricter just because you collected other conditions you did not strictly need.
The solution is that if you planned one particular comparison in advance then you can test at the 0.05 level, because that one was picked out in advance of seeing the data.
But, the other tests must involve a correction.
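One common such correction is Bonferroni, sketched minimally below (the p-values are hypothetical; the article only says numerous corrections exist, so this is one illustrative choice):

```python
# Bonferroni correction: to keep the overall (family-wise) Type I error
# at .05 across k unplanned tests, test each one at alpha / k.
alpha = 0.05
p_values = [0.003, 0.020, 0.049]       # hypothetical p-values from 3 tests
k = len(p_values)
corrected_alpha = alpha / k            # 0.05 / 3 ~ 0.0167

for p in p_values:
    verdict = "reject H0" if p < corrected_alpha else "do not reject H0"
    print(f"p = {p:.3f} vs corrected alpha = {corrected_alpha:.4f}: {verdict}")
```

Note that two of the three tests that would have been significant at the uncorrected .05 level fail to reach the corrected threshold, which is exactly how the overall alpha is held at .05.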
Alpha is an objective probability and hence a property of a collective and not any individual event, not a particular sample.
In the Neyman-Pearson approach, the relevant probabilities alpha and beta are the long-run error rates you decide are acceptable and so must be set in advance.
If alpha is set at 0.05, the only meaningful claim to make about the p-value of a particular experiment is either it is less than 0.05 or not.
The statistics tell you nothing about how confident you should be in a hypothesis nor what strength of evidence there is for different hypotheses.
It is hard to construct an argument for why p-values should be taken as strength of evidence per se. Conceptually, the strength of evidence for or against a hypothesis is distinct from the probability of obtaining such evidence.
There is no need to force p-values into the role of measuring strength of evidence, a role for which they may often give a reasonable answer, but not always.
Significance is not a property of populations.
Hypotheses are about population properties. Significance is not a property of population means or differences.
Decision rules are laid down before data are collected; we simply make black and white decisions with known risks of error.
A more significant result does not mean a more important result, or a larger effect size.
The Neyman-Pearson approach is not just about null hypothesis testing.
Neyman also developed the concept of confidence interval, a set of possible population values the data are consistent with.
Instead of saying merely we reject one value, one reports the set of values rejected, and the set of possible values remaining.
To calculate the 95% confidence interval, find the set of all values of the dependent variable that are non-significantly different from your sample value at the 5% level.
Use of confidence intervals overcomes some of the problems people otherwise have when using Neyman-Pearson statistics:
Confidence intervals are a very useful way of summarizing what a set of studies as a whole is telling us. You can calculate the confidence interval on the parameter of interest by combining the information provided in all the studies.
The 95% confidence interval is interpreted in terms of an objective probability.
The procedure of calculating 95% confidence intervals will produce intervals that include the true population value 95% of the time.
There is no probability attached to any one calculated interval. That interval either includes the population value or it does not.
There is not a 95% probability that the 95% confidence limits for a particular sample include the true population mean. But if you acted as if the true population value were included in your interval each time you calculated a 95% confidence interval, you would be right 95% of the time.
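This coverage property can also be checked by simulation. The sketch below (illustrative numbers, not from the article) repeatedly draws samples from a known population and counts how often the computed 95% interval contains the true mean:

```python
import random
import statistics

# Sketch: check that the 95% CI procedure covers the true mean about
# 95% of the time in the long run (true population mean = 10, SD = 2).
random.seed(3)
true_mean, n, n_experiments = 10, 30, 2000
t_crit = 2.045            # two-tailed .05 critical t for df = 29
covered = 0

for _ in range(n_experiments):
    sample = [random.gauss(true_mean, 2) for _ in range(n)]
    m = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / n ** 0.5
    if m - half_width <= true_mean <= m + half_width:
        covered += 1

print(f"Coverage: {covered / n_experiments:.3f}")  # close to 0.95
```

The 95% refers to this long-run behaviour of the procedure, not to any one interval, which either contains the true value or does not.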
Inference consists of simple acceptance or rejection
Null hypothesis testing encourages weak theorizing
In the Neyman-Pearson approach it is important to know the reference class: we must know what endless series of trials might have happened but never did.
If an article uses significance or hypothesis tests, then two hypotheses need to be specified for each test.
Most papers fall down at the first hurdle because the alternative is not well specified.
The stopping rule should be specified in advance (e.g. a fixed number of participants), with significance testing carried out once, at the end of data collection.
Even if minimally interesting effect sizes and power were not stated in advance, a crucial point is how the authors dealt with interesting null results.
Given a null result was obtained, did the authors give some measure of sensitivity of the test?