What is reliability? - Chapter 5

What is reliability?

Chapter 5 is about the reliability of a test. Reliability is the extent to which differences in respondents' observed scores correspond with differences in their true scores. The closer this correspondence, the more reliable the test.

According to Classical Test Theory (CTT), reliability can be determined on the basis of observed scores (Xo), true scores (Xt) and random error scores (Xe). Random error scores are also called measurement errors.

Factors that cause differences between the observed and the true scores are called sources of error. These cause measurement errors, which create a discrepancy between the observed and the true scores.

In addition to "sources of error", there are also temporary or transient factors that can influence the observed scores. Examples of this are the number of hours of sleep, emotional state, physical condition, guessing, or mismarked answers. The latter means that you know the correct answer but still mark the wrong one. These temporary/transient factors lower or raise the observed scores relative to the true scores.

To find out whether the observed scores are a function of measurement errors or a function of true scores, two questions must be asked:

  1. Which part of the observed scores is a function of true inter-individual or intra-individual differences?
  2. Which part of the observed scores is a function of measurement errors?

In other words: Xo = Xt + Xe. The observed scores are determined by the true scores and the measurement errors. The smaller the value of Xe, the better. CTT assumes that measurement errors are random, which means that they are independent of the true scores Xt. In other words, a measurement error is just as likely to affect someone with a high true score as someone with a low true score, and by the same expected amount. This gives measurement errors two characteristics:

  • The average of all measurement errors within a test is zero.
  • Measurement errors do not correlate with true scores, rte = 0.

Instead of saying that reliability depends on the consistency between differences in observed scores and differences in true scores, you can also say: reliability depends on the relationships between the variability of the observed score, variability of the true score, and variability of the measurement error score.

  • Error score variance: Se² = ∑ (Xe minus average Xe) ² / N. The higher Se², the worse the measurement.
  • True score variance: St² = ∑ (Xt minus average Xt) ² / N
  • Observed score variance: So² = ∑ (Xo minus average Xo) ² / N. Or, So² = St² + Se².

This formula should actually be: So² = St² + Se² + 2rte * St * Se.

However, the true scores and the measurement errors are not correlated (rte = 0), so the term 2rte * St * Se equals 0. What remains is: So² = St² + Se².
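The variance decomposition above can be checked with a small simulation. The following is a minimal sketch, not part of the original summary: the score distributions (true scores around 100 with standard deviation 15, errors around 0 with standard deviation 5), the sample size and the helper names pvar and pcov are all assumptions chosen for illustration.

```python
import random
from statistics import fmean


def pvar(xs):
    """Population variance: sum of squared deviations divided by N."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def pcov(xs, ys):
    """Population covariance, which equals r * Sx * Sy."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Assumed values, for illustration only.
true_scores = [rng.gauss(100, 15) for _ in range(n)]     # Xt
errors = [rng.gauss(0, 5) for _ in range(n)]             # Xe, drawn independently of Xt
observed = [t + e for t, e in zip(true_scores, errors)]  # Xo = Xt + Xe

var_t = pvar(true_scores)           # St²
var_e = pvar(errors)                # Se²
var_o = pvar(observed)              # So²
cov_te = pcov(true_scores, errors)  # rte * St * Se

full = var_t + var_e + 2 * cov_te   # So² = St² + Se² + 2rte * St * Se
simplified = var_t + var_e          # So² = St² + Se², valid when rte = 0

print(f"mean of errors     : {fmean(errors):.3f}")
print(f"So²                : {var_o:.3f}")
print(f"St² + Se² + 2cov   : {full:.3f}")
print(f"St² + Se²          : {simplified:.3f}")
```

In a finite sample the covariance term is small but not exactly zero, so the full formula matches So² exactly while the simplified formula matches it only approximately.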

What are the four ways to conceptualize reliability?

1. Reliability in terms of "proportions of variances"

Rxx (reliability coefficient) = St² / So²

Rxx = 0 means that everyone has the same true score. (St² = 0)

Rxx = 1 means that the variance of the true scores is equal to the variance of the observed scores. In other words: there are no measurement errors!

Here is an example of interpretation of Rxx:

Rxx = 0.48 means that 48% of the differences in the observed scores can be attributed to differences in the true scores. Conversely, 1 - 0.48 = 0.52, so 52% of the differences can be attributed to measurement errors.

2. Reliability in terms of "lack of measurement error"

Rxx (reliability coefficient) = St² / So²

So² = St² + Se² (and therefore also: St² = So² - Se²)

Rxx = (So² - Se²) / So² = (So² / So²) - (Se² / So²)

In other words: Rxx = 1 - (Se² / So²): when (Se² / So²) is small, the reliability is high.

3. Reliability in terms of "correlations"

Rxx = Rot², where Rot² is the squared correlation between the observed scores and the true scores.

Rot = St² / (So * St) = St / So

Rot² = St² / So².

A reliability of 1.0 indicates that the differences between the observed test scores perfectly match the differences between the true scores. A reliability of 0.0 indicates that the differences between the observed scores are completely unrelated to the differences between the true scores.

4. Reliability in terms of "lack of correlation"

Rxx = 1 - Roe², where Roe² is the squared correlation between the observed scores and the error scores.

Roe = Se² / (So * Se) = Se / So

Roe² = Se² / So² so:

Rxx = 1 - Roe² = 1 - (Se² / So²).

If Roe = 0, then Rxx = 1.0

The greater the correlation between the observed scores and the error scores, the smaller Rxx. So reliability will be relatively high if the observed scores have a low correlation with the error scores.
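The four formulations describe the same quantity, which can be illustrated numerically. The sketch below is an illustration under assumed values (true-score standard deviation 15, error standard deviation 5, so the theoretical Rxx is 225 / 250 = 0.90); the helper names pvar and pcorr are hypothetical and not taken from the original text.

```python
import math
import random
from statistics import fmean


def pvar(xs):
    """Population variance: sum of squared deviations divided by N."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def pcorr(xs, ys):
    """Pearson correlation computed from population (co)variances."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / math.sqrt(pvar(xs) * pvar(ys))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 50000

# Assumed values, for illustration only.
true_scores = [rng.gauss(100, 15) for _ in range(n)]     # Xt
errors = [rng.gauss(0, 5) for _ in range(n)]             # Xe
observed = [t + e for t, e in zip(true_scores, errors)]  # Xo = Xt + Xe

var_t, var_e, var_o = pvar(true_scores), pvar(errors), pvar(observed)

estimates = {
    "1. St² / So²": var_t / var_o,
    "2. 1 - Se² / So²": 1 - var_e / var_o,
    "3. Rot²": pcorr(observed, true_scores) ** 2,
    "4. 1 - Roe²": 1 - pcorr(observed, errors) ** 2,
}

for name, value in estimates.items():
    print(f"{name:<18} {value:.3f}")
```

The four values agree only up to sampling noise, because in a finite sample the correlation between true scores and errors is not exactly zero.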

How is the size of the measurement error expressed?

Although reliability is an important psychometric construct, it does not directly reflect the magnitude of the measurement error of a test. An additional coefficient is therefore needed. The standard measurement error expresses the average size of the error scores. The greater the standard measurement error, the greater the average difference between observed scores and true scores, and therefore the lower the reliability of the test.

Standard measurement error = sem

sem = So * √ (1 - Rxx)

If Rxx = 1 then sem = 0, so: a greater Rxx means a smaller sem.
sem is never greater than So (sem = So when Rxx = 0), so for a given Rxx: a greater So means a greater sem.

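The formula sem = So * √ (1 - Rxx) can be tried out with a few values. This is a minimal sketch; the function name standard_error_of_measurement and the example values (So = 10, Rxx of 0, 0.75 and 1) are assumptions made for illustration.

```python
import math


def standard_error_of_measurement(sd_observed, reliability):
    """Return sem = So * sqrt(1 - Rxx).

    sd_observed is So (standard deviation of the observed scores) and
    reliability is Rxx, which must lie between 0 and 1.
    """
    if sd_observed < 0:
        raise ValueError("sd_observed must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


so = 10.0  # assumed standard deviation of the observed scores
for rxx in (0.0, 0.75, 1.0):
    sem = standard_error_of_measurement(so, rxx)
    print(f"Rxx = {rxx:.2f} -> sem = {sem:.2f}")
```

With Rxx = 0 the sem equals So, with Rxx = 1 it is 0, and in between it shrinks as the reliability grows.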
How is the theory of reliability translated into practice?

The theory of reliability is based on three terms: true scores, observed scores, and error scores. But in practice we do not know whether a score is actually the true score of an individual. We also do not know to what extent measurement errors influence the response of an individual. How then do we translate the theory of reliability into practice?

Although we cannot determine with certainty what the reliability or standard measurement error of a test is, advanced methods have been developed to estimate it. Examples of such techniques are giving two versions of the test, doing the same test twice and so on. In this section, four methods are discussed to estimate the reliability and standard measurement error of a test:

  1. parallel tests;
  2. the tau-equivalent test model;
  3. the essentially tau-equivalent test model;
  4. the congeneric test model.

Each model offers a perspective on how two or more tests are the same.

1. Parallel tests

We speak of parallel tests when two (or more) tests, in addition to the basic assumptions of classical test theory, meet the following three assumptions:

  1. The two tests have the same error variance (Se1² = Se2²).
  2. The intercept of the relation between the true scores on both tests is 0 (so a = 0, in Xt2 = a + b (Xt1)).
  3. The slope of the relation between the true scores on both tests is 1 (so b = 1, in Xt2 = a + b (Xt1)).

These assumptions have six implications:

  1. The true scores of test 1 are identical to the true scores of test 2 (Xt1 = Xt2).
  2. Because each participant's true score on test 1 equals his or her true score on test 2, the two sets of true scores correlate perfectly with each other (rt1t2 = 1).
  3. The variances of the true scores of tests 1 and 2 are identical (St1² = St2²).
  4. The average of the true scores of test 1 is equal to the average of the true scores of test 2.
  5. The variance of the observed scores of test 1 is equal to the variance of the observed scores of test 2.
  6. The reliability of the tests is the same (R11 = R22).

When the scores of two tests meet all these assumptions and implications, we speak of parallel tests.

Finally, according to CTT, there is one further implication that follows from the above: the correlation between parallel tests equals the reliability. In formula form: ro1o2 = R11 = R22. In other words, when two tests are perfectly parallel, the correlation between the two tests is equal to the reliability of both tests.

The correlation between parallel tests can also be calculated from the variances of the true and observed scores: ro1o2 = St² / So².
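The claim that the correlation between parallel tests equals the reliability can be illustrated with simulated data. In the sketch below, two tests share the same true scores and have independent errors with the same variance; all numbers (true-score standard deviation 15, error standard deviation 5) and the helper names are assumptions for illustration. In practice the true scores are unknown, which is exactly why the observable correlation ro1o2 is useful.

```python
import math
import random
from statistics import fmean


def pvar(xs):
    """Population variance: sum of squared deviations divided by N."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def pcorr(xs, ys):
    """Pearson correlation computed from population (co)variances."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / math.sqrt(pvar(xs) * pvar(ys))


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 50000

# Same true scores for both tests (a = 0, b = 1); assumed distribution.
true_scores = [rng.gauss(100, 15) for _ in range(n)]

# Independent errors with equal variance (Se1² = Se2²).
test1 = [t + rng.gauss(0, 5) for t in true_scores]
test2 = [t + rng.gauss(0, 5) for t in true_scores]

r_o1o2 = pcorr(test1, test2)             # observable in practice
r11 = pvar(true_scores) / pvar(test1)    # St² / So1², known only in a simulation
r22 = pvar(true_scores) / pvar(test2)    # St² / So2², known only in a simulation

print(f"ro1o2 = {r_o1o2:.3f}")
print(f"R11   = {r11:.3f}")
print(f"R22   = {r22:.3f}")
```

All three values land close to the theoretical reliability of 0.90 that follows from the assumed variances.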

2. The tau-equivalent test model

In addition to the standard assumptions of classical test theory, the tau-equivalent test model is based on the following two assumptions:

  1. The intercept between the true scores of both tests is 0 (so a = 0, in Xt2 = a + b (Xt1)).
  2. The slope of the relation between the true scores on both tests is 1 (so b = 1, in Xt2 = a + b (Xt1)).

These two assumptions are the same as the second and third assumptions of parallel tests. The difference lies in the first assumption of parallel tests: the tau-equivalent test model does not assume equal error variances. This leads to four implications (the first four discussed under parallel tests).

The less strict assumptions mean that the correlation between tau-equivalent tests is not a valid estimate of the reliability: because the error variances may differ, the two tests can have different reliabilities. This is in contrast to parallel tests, where the correlation between the tests is a valid estimate of the reliability.
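A simulation can show why the correlation stops being a valid reliability estimate once the error variances differ. The sketch below keeps the true scores identical across tests but gives test 2 a larger error variance; the specific values (error standard deviations 5 and 10) and the helper names are assumptions for illustration.

```python
import math
import random
from statistics import fmean


def pvar(xs):
    """Population variance: sum of squared deviations divided by N."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def pcorr(xs, ys):
    """Pearson correlation computed from population (co)variances."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / math.sqrt(pvar(xs) * pvar(ys))


rng = random.Random(3)  # fixed seed so the run is repeatable
n = 50000

# Tau-equivalent tests: identical true scores, unequal error variances.
true_scores = [rng.gauss(100, 15) for _ in range(n)]
test1 = [t + rng.gauss(0, 5) for t in true_scores]    # smaller error variance
test2 = [t + rng.gauss(0, 10) for t in true_scores]   # larger error variance

r_o1o2 = pcorr(test1, test2)
r11 = pvar(true_scores) / pvar(test1)   # St² / So1²
r22 = pvar(true_scores) / pvar(test2)   # St² / So2²

print(f"ro1o2 = {r_o1o2:.3f}")
print(f"R11   = {r11:.3f}")
print(f"R22   = {r22:.3f}")
```

Under these assumed values the correlation falls between the two reliabilities and matches neither. This follows from ro1o2 = St² / (So1 * So2), which is the geometric mean of R11 and R22 when the true scores are identical.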

3. The essentially tau-equivalent test model

In addition to the standard assumptions of classical test theory, the essentially tau-equivalent test model is based on one additional assumption:

The slope of the relation between the true scores on both tests is 1 (so b = 1, in Xt2 = a + b (Xt1)).

This leads to two of the implications discussed under parallel tests: rt1t2 = 1 (the correlation between the true scores of the tests is perfect), and St1² = St2² (the variances of the true scores of both tests are equal).

4. Congeneric test model

The last model is the congeneric test model. This model accepts only the assumptions of classical test theory. This results in a single implication, namely that the true scores of the tests correlate perfectly: rt1t2 = 1. This model is therefore the least strict and the most general model. Although it is applicable more often (its assumptions are a precondition for the more restricted models), it offers limited possibilities for estimating reliability.

What is the 'Domain Sampling Theory'?

According to this theory, reliability is the average size of the correlations between all possible pairs of tests with N items selected from a domain of test items. The logic of this theory is the foundation of generalizability theory, which will be discussed extensively in chapter thirteen.
