Summary | What is reliability? - Chapter 5 | Samenvatting WorldSupporter

What is reliability? - Chapter 5

What is reliability?
What are the four types of reliability?
How is the size of the measurement error expressed?
What is the 'Domain Sampling Theory'?

What is reliability?

Chapter 5 is about the reliability of a test. Reliability is the extent to which differences in respondents' observed scores correspond with differences in their true scores. The closer this correspondence, the more reliable the test.

According to Classical Test Theory (CTT), reliability can be determined on the basis of observed scores (Xo), true scores (Xt) and error scores (Xe). Error scores are also called measurement errors.

Factors other than the true score that cause differences between the observed and the true scores are called sources of error. They produce measurement errors, which create a discrepancy between the observed and the true scores.

In addition to "sources of error", there are also temporary or transient factors that can influence the observed scores. Examples are the number of hours of sleep, emotional state, physical condition, guessing, or mismarked answers. The latter means that you know the correct answer but still mark the wrong one. These temporary factors lower or raise the observed scores relative to the true scores.

To find out whether the observed scores are a function of measurement errors or a function of reliable scores, two questions must be asked:

Which part of the observed scores is a function of reliable inter-individual or intra-individual differences?
Which part of the observed scores is a function of measurement errors?

In other words: Xo = Xt + Xe. The observed scores are determined by the true scores and the measurement errors. The smaller the value of Xe, the better. CTT assumes that measurement errors are random, which means that they are independent of the true scores Xt. In other words, a measurement error is just as likely to raise or lower the score of someone with a high true score as that of someone with a low true score. This randomness has two consequences:

The average of all measurement errors within a test is zero.
Measurement errors do not correlate with true scores, rte = 0.

Instead of saying that reliability depends on the consistency between differences in observed scores and differences in true scores, you can also say: reliability depends on the relationships between the variability of the observed score, variability of the true score, and variability of the measurement error score.

Error score variance: Se² = ∑ (Xe - mean of Xe)² / N. The higher Se², the worse the measurement.
True score variance: St² = ∑ (Xt - mean of Xt)² / N.
Observed score variance: So² = ∑ (Xo - mean of Xo)² / N. Or, So² = St² + Se².

Strictly speaking, the full formula is: So² = St² + Se² + 2 * rte * St * Se.

However, the true scores and the measurement errors are not correlated (rte = 0), and therefore 2 * rte * St * Se = 0. What remains is: So² = St² + Se².
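
The variance decomposition above can be checked numerically. The following is a minimal sketch, assuming a tiny set of hypothetical scores (not taken from the chapter) that were chosen so that the errors average zero and do not correlate with the true scores. The helper names are illustrative only.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance: sum of squared deviations divided by N."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation based on divide-by-N statistics."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(variance(xs) * variance(ys))

# Hypothetical scores for four respondents (assumed for illustration).
true_scores = [10.0, 10.0, 20.0, 20.0]    # Xt
error_scores = [-2.0, 2.0, -2.0, 2.0]     # Xe: mean 0, uncorrelated with Xt
observed_scores = [t + e for t, e in zip(true_scores, error_scores)]  # Xo

st2 = variance(true_scores)       # St²
se2 = variance(error_scores)      # Se²
so2 = variance(observed_scores)   # So²
rte = correlation(true_scores, error_scores)

# Full formula: So² = St² + Se² + 2 * rte * St * Se
full = st2 + se2 + 2 * rte * sqrt(st2) * sqrt(se2)
print("St²:", st2, "Se²:", se2, "So²:", so2, "rte:", rte, "full formula:", full)
```

With real data the true and error scores are unknown, so this only illustrates the logic of the formula, not a way to compute it in practice.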

What are the four types of reliability?

1. Reliability in terms of "proportions of variances"

Rxx (reliability coefficient) = St² / So²

Rxx = 0 means that everyone has the same true score. (St² = 0)

Rxx = 1 means that the variance of the true scores is equal to the variance of the observed scores. In other words: there are no measurement errors!

Here is an example of interpretation of Rxx:

Rxx = 0.48 means that 48% of the differences in the observed scores can be attributed to differences in the true scores. Conversely, 1 - 0.48 = 0.52, so 52% of the differences can be attributed to measurement errors.

2. Reliability in terms of "lack of measurement error"

Rxx (reliability coefficient) = St² / So²

So² = St² + Se² (and therefore also: St² = So² - Se²)

Rxx = (So² - Se²) / So² = (So² / So²) - (Se² / So²)

In other words: Rxx = 1 - (Se² / So²): when (Se² / So²) is small, the reliability is high.
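
The first two definitions give the same number, which can be sketched as follows. The variance values are assumptions, picked so that the result matches the Rxx = 0.48 interpretation example; the function names are hypothetical.

```python
def reliability_from_true_variance(st2, so2):
    """Definition 1: proportion of observed variance that is true variance."""
    return st2 / so2

def reliability_from_error_variance(se2, so2):
    """Definition 2: one minus the proportion of error variance."""
    return 1 - se2 / so2

# Assumed variance components (illustrative values only).
st2 = 48.0           # St², true score variance
se2 = 52.0           # Se², error score variance
so2 = st2 + se2      # So² = St² + Se²

rxx_1 = reliability_from_true_variance(st2, so2)
rxx_2 = reliability_from_error_variance(se2, so2)
print(round(rxx_1, 2), round(rxx_2, 2))
```

Both calls return approximately 0.48, because St² / So² and 1 - (Se² / So²) are algebraically the same quantity when So² = St² + Se².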

3. Reliability in terms of "correlations"

Rxx = Rot², where Rot² is the squared correlation between the observed scores and the true scores.

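Rot = St² / (So * St) = St / So (the covariance between observed and true scores equals St², because the errors do not correlate with the true scores).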

Rot² = St² / So².

A reliability of 1.0 indicates that the differences between the observed test scores perfectly match the differences between the true scores. A reliability of 0.0 indicates that the differences between the observed scores are completely unrelated to the differences between the true scores.

4. Reliability in terms of "lack of correlation"

Rxx = 1 - Roe², where Roe² is the squared correlation between the observed scores and the error scores.

Roe = Se² / (So * Se) = Se / So (the covariance between observed and error scores equals Se²).

Roe² = Se² / So² so:

Rxx = 1 - Roe² = 1 - (Se² / So²).

If Roe = 0, then Rxx = 1.0

The greater the correlation between the observed scores and the error scores, the smaller Rxx. So reliability will be relatively high if the observed scores have a low correlation with the error scores.
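
The two correlational definitions can be illustrated with a short sketch. It assumes hypothetical true and error scores (in practice these are unobservable) in which the errors are uncorrelated with the true scores; the helper function is illustrative.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores (assumed for illustration).
true_scores = [10.0, 10.0, 20.0, 20.0]    # Xt
error_scores = [-2.0, 2.0, -2.0, 2.0]     # Xe
observed_scores = [t + e for t, e in zip(true_scores, error_scores)]  # Xo

rot = correlation(observed_scores, true_scores)    # Rot
roe = correlation(observed_scores, error_scores)   # Roe

rxx_from_rot = rot ** 2        # definition 3: Rxx = Rot²
rxx_from_roe = 1 - roe ** 2    # definition 4: Rxx = 1 - Roe²
print(round(rxx_from_rot, 3), round(rxx_from_roe, 3))
```

For these scores St² = 25 and So² = 29, so both expressions agree with St² / So² = 25 / 29.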

How is the size of the measurement error expressed?

Although reliability is an important psychometric concept, it does not directly reflect the magnitude of the measurement error of a test. An additional coefficient is therefore needed. The standard error of measurement reflects the average size of the error scores. The greater the standard error of measurement, the greater the average difference between observed scores and true scores, and therefore the lower the reliability of the test.

Standard error of measurement = sem

sem = So * √ (1 - Rxx)
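
The formula above can be sketched as a small function. The function name and the example values (So = 10 with three reliabilities) are assumptions for illustration.

```python
from math import sqrt

def standard_error_of_measurement(so, rxx):
    """sem = So * sqrt(1 - Rxx), with So the observed-score standard deviation."""
    if so < 0:
        raise ValueError("So must be non-negative")
    if not 0.0 <= rxx <= 1.0:
        raise ValueError("Rxx must lie between 0 and 1")
    return so * sqrt(1.0 - rxx)

so = 10.0  # assumed standard deviation of the observed scores
sem_unreliable = standard_error_of_measurement(so, 0.0)    # equals So
sem_moderate = standard_error_of_measurement(so, 0.75)     # half of So
sem_perfect = standard_error_of_measurement(so, 1.0)       # zero
print(sem_unreliable, sem_moderate, sem_perfect)
```

Because Rxx = 1 - (Se² / So²), the value So * √(1 - Rxx) is simply Se, the standard deviation of the error scores.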

If Rxx = 1, then sem = 0. In general, the greater Rxx, the smaller sem.
sem is never greater than So (the two are equal when Rxx = 0). For a given Rxx, a greater So means a greater sem.

How is the theory of reliability translated into practice?

The theory of reliability is based on three terms: true scores, observed scores, and error scores. But in practice we do not know whether a score is actually the true score of an individual. We also do not know to what extent measurement errors influence the response of an individual. How then do we translate the theory of reliability into practice?

Although we cannot determine with certainty what the reliability or standard measurement error of a test is, advanced methods have been developed to estimate it. Examples of such techniques are giving two versions of the test, doing the same test twice and so on. In this section, four methods are discussed to estimate the reliability and standard measurement error of a test:

the parallel test model;
the tau-equivalent test model;
the essentially tau-equivalent test model;
the congeneric test model.

Each model offers a perspective on how two or more tests are the same.

1. Parallel tests

We speak of parallel tests when two (or more) tests, in addition to the basic assumptions of classical test theory, meet the following three assumptions:

The two tests have the same error variance (Se1² = Se2²).
The intercept between the true scores on both tests is 0 (so a = 0, in Xt2 = a + b (Xt1)).
The slope between the true scores on both tests is 1 (so b = 1, in Xt2 = a + b (Xt1)).

These assumptions have six implications:

The true scores of test 1 are identical to the true scores of test 2 (Xt1 = Xt2).
Because each participant's true score on test 1 equals his or her true score on test 2, the two sets of true scores correlate perfectly with each other (rt1t2 = 1).
The variances of the true scores of tests 1 and 2 are identical (St1² = St2²).
The average of the true scores of test 1 is equal to the average of the true scores of test 2.
The variance of the observed scores of test 1 is equal to the variance of the observed scores of test 2 (So1² = So2²).
The reliability of the two tests is the same (R11 = R22).

When the scores of two tests meet all these assumptions and implications, we speak of parallel tests.

Finally, according to CTT, there is one further implication that follows from the above: the correlation between parallel tests equals the reliability. In formula form: ro1o2 = R11 = R22. In other words, when two tests are truly parallel, the correlation between their observed scores is equal to the reliability of both tests.

The correlation between parallel tests can also be calculated from the variances of the true and observed scores: ro1o2 = St² / So².
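
The claim that the correlation between parallel tests equals their reliability can be illustrated with a small sketch. All scores are hypothetical and constructed so that both tests share the same true scores and have equal error variances, with errors that correlate neither with the true scores nor with each other.

```python
from math import sqrt

def pvar(xs):
    """Population variance (divide by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(pvar(xs) * pvar(ys))

# Hypothetical data for eight respondents (assumed for illustration).
true_scores = [10.0] * 4 + [20.0] * 4      # Xt1 = Xt2
errors_1 = [-2.0, -2.0, 2.0, 2.0] * 2      # Xe for test 1
errors_2 = [-2.0, 2.0, -2.0, 2.0] * 2      # Xe for test 2, same variance

test_1 = [t + e for t, e in zip(true_scores, errors_1)]
test_2 = [t + e for t, e in zip(true_scores, errors_2)]

r11 = pvar(true_scores) / pvar(test_1)   # reliability of test 1 (St² / So²)
r22 = pvar(true_scores) / pvar(test_2)   # reliability of test 2
r12 = correlation(test_1, test_2)        # correlation between the two tests
print(round(r11, 3), round(r22, 3), round(r12, 3))
```

All three printed values coincide here, which is exactly why the correlation between two parallel forms can serve as an estimate of reliability when the true scores themselves are unknown.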

2. The tau-equivalent test model

In addition to the standard assumptions of classical test theory, the tau-equivalent test model is based on the following two assumptions:

The intercept between the true scores of both tests is 0 (so a = 0, in Xt2 = a + b (Xt1)).
The slope between the true scores on both tests is 1 (so b = 1, in Xt2 = a + b (Xt1)).

These two assumptions are the same as the last two assumptions of parallel tests. The difference lies in the first assumption of parallel tests: the tau-equivalent test model does not assume equal error variances. This leads to four implications (the first four that we discussed under parallel tests).

The less strict assumptions mean that the correlation between tau-equivalent tests is not a valid estimate of the reliability. This is in contrast to parallel tests, where the correlation between the tests is a valid estimate of the reliability.
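
A short sketch shows why the correlation stops being a valid reliability estimate once the error variances differ. The scores are hypothetical: the two tests share their true scores, but the errors of test 2 are assumed to have twice the spread of those of test 1.

```python
from math import sqrt

def pvar(xs):
    """Population variance (divide by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(pvar(xs) * pvar(ys))

# Hypothetical tau-equivalent tests (assumed for illustration).
true_scores = [10.0] * 4 + [20.0] * 4      # identical true scores
errors_1 = [-2.0, -2.0, 2.0, 2.0] * 2      # error variance 4
errors_2 = [-4.0, 4.0, -4.0, 4.0] * 2      # error variance 16

test_1 = [t + e for t, e in zip(true_scores, errors_1)]
test_2 = [t + e for t, e in zip(true_scores, errors_2)]

r11 = pvar(true_scores) / pvar(test_1)   # reliability of test 1
r22 = pvar(true_scores) / pvar(test_2)   # reliability of test 2
r12 = correlation(test_1, test_2)        # correlation between the tests
print(round(r11, 3), round(r22, 3), round(r12, 3))
```

In this example the correlation falls between the two reliabilities and matches neither of them, so it cannot be read as the reliability of either test.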

3. The essentially tau-equivalent test model

In addition to the standard assumptions of classical test theory, the essentially tau-equivalent test model is based on one additional assumption:

The slope between the true scores on both tests is 1 (so b = 1, in Xt2 = a + b (Xt1)).

This leads to two implications, both also discussed under parallel tests: rt1t2 = 1 (the correlation between the true scores of the tests is perfect) and St1² = St2² (the variances of the true scores of both tests are equal).

4. Congeneric test model

The last model is the congeneric test model. This model makes only the basic assumptions of classical test theory. This results in a single implication, namely that the true scores of the tests correlate perfectly: rt1t2 = 1. This model is therefore the least strict and the most general model. Although this model is applicable more often (its assumptions are a precondition for the more restrictive models), it offers limited possibilities for estimating reliability.
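
The way the four models nest can be summarised in a small sketch. The function below is hypothetical and takes the intercept a and slope b from Xt2 = a + b (Xt1), plus a flag for equal error variances, and returns the most restrictive model whose assumptions hold. It assumes a positive slope, since a negative slope would give a true-score correlation of -1 rather than 1.

```python
def classify_test_model(a, b, equal_error_variances):
    """Return the strictest model that fits Xt2 = a + b * Xt1.

    a: intercept, b: slope, equal_error_variances: whether Se1² = Se2².
    Illustrative helper only; exact equality checks are used for simplicity.
    """
    if b <= 0:
        raise ValueError("this sketch assumes a positive slope b")
    if b == 1 and a == 0:
        # Identical true scores; error variances decide between the two.
        return "parallel" if equal_error_variances else "tau-equivalent"
    if b == 1:
        # True scores differ only by a constant.
        return "essentially tau-equivalent"
    # Only the basic CTT assumptions remain.
    return "congeneric"

print(classify_test_model(0, 1, True))
print(classify_test_model(0, 1, False))
print(classify_test_model(3, 1, False))
print(classify_test_model(3, 0.5, False))
```

The four calls walk from the strictest model (parallel) to the most general one (congeneric), dropping one assumption at a time.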

What is the 'Domain Sampling Theory'?

According to this theory, reliability is the average size of the correlations between all possible pairs of tests with N items selected from a domain of test items. The logic of this theory is the foundation of generalizability theory, which is discussed extensively in chapter thirteen.
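
The idea of averaging correlations over pairs of tests can be sketched as follows. This is only an illustration under strong assumptions: the three "test forms" and their scores are hypothetical, whereas the theory refers to all possible tests that could be drawn from the domain, which cannot be enumerated in practice.

```python
from itertools import combinations
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def mean_pairwise_correlation(tests):
    """Average correlation over every pair of tests in the list."""
    pairs = list(combinations(tests, 2))
    return sum(correlation(a, b) for a, b in pairs) / len(pairs)

# Hypothetical observed scores of four respondents on three test forms.
forms = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 2.0, 4.0],
]
estimate = mean_pairwise_correlation(forms)
print(round(estimate, 3))
```

With only three forms there are three pairs, so the result is the mean of three correlations. A larger sample of forms from the same domain would bring the average closer to the quantity the theory describes.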

Summary of Psychometrics: An Introduction by Furr - 3rd edition

Follow the author: Psychology Supporter
