Summary | What is the importance of reliability? - Chapter 7


What is the importance of reliability? - Chapter 7

This chapter explains how reliability and measurement errors affect the results of behavioral research. Awareness of these effects is crucial for behavioral research.

Which two sources of information can help evaluate an individual test score?
On which two factors does the correlation of observed scores from two measurements depend?
In addition to reliability, what else should you pay attention to when looking at the results of a study?
What should you pay attention to in the construction and improvements of tests?

Which two sources of information can help evaluate an individual test score?

Two sources of information help us evaluate an individual test score. The first is a point estimate: a single value that is interpreted as the best estimate of someone's standing on a psychological trait. The second is a confidence interval: a range of values within which the person's true score is likely to lie. A wide confidence interval tells us that the observed score is an imprecise point estimate of the true score.

Point estimates

Two types of point estimates can be derived from an individual's observed score. The first is based on the observed test score only: when a person takes the test at a certain moment, the observed score obtained is itself used as the estimate of the true score. The second also takes measurement error into account and is called the adjusted true score estimate. It rests on the prediction that, if the person took the test a second time, the second score would probably lie closer to the group average than the first. This is called regression to the mean, and the prediction follows from the logic of classical test theory and the assumption that measurement error is random. The adjusted true score estimate reflects the difference that is expected between someone's observed score on the first test and the observed score on the second test. The magnitude and direction of this difference depend on three factors:

The reliability of the test scores.
The magnitude of the difference between the original observed test score and the average of the test scores.
The direction of the difference between the original score and the average of the test scores.

The following formula is used to estimate the adjusted true score:

X_est = X_avg + R_xx (X_o - X_avg)

X_est is the estimate of the adjusted true score, X_avg is the average of the test scores, R_xx is the reliability of the test and X_o is the observed score. The reliability of the test influences the difference between the estimated true score and the observed score: the lower the reliability, the larger that difference becomes. The observed score itself also matters: the more extreme the observed score (the further it lies from the average), the larger the difference.
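The adjusted true score formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original summary: the function name `adjusted_true_score` and the example values (a test average of 100 and an observed score of 130) are assumptions chosen for the demonstration.

```python
def adjusted_true_score(observed, group_mean, reliability):
    """Return X_est = X_avg + R_xx * (X_o - X_avg)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return group_mean + reliability * (observed - group_mean)


# Hypothetical values: test average 100, observed score 130.
# The lower the reliability, the further the estimate is pulled
# back towards the average (regression to the mean).
for r_xx in (0.9, 0.5):
    estimate = adjusted_true_score(130, 100, r_xx)
    print(f"R_xx = {r_xx}: X_est = {estimate:.1f}, "
          f"shift from observed = {130 - estimate:.1f}")
```

With perfect reliability (R_xx = 1) the estimate equals the observed score, and with zero reliability it equals the group average.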

There are two reasons for not calculating the adjusted true score estimate. One is that the observed score is already a good estimate of the psychological characteristic, so there may be little reason to correct it. The other is that regression to the mean does not always occur in practice.

Confidence intervals

Confidence intervals express the precision of the point estimate of an individual's true score. The width of the confidence interval is linked to reliability through the standard error of measurement (se_m):

se_m = s_o √(1 - R_xx)

Here s_o is the standard deviation of the observed scores and R_xx is the reliability. The greater the standard error of measurement, the greater the average difference between observed scores and true scores. A 95% confidence interval around the estimated true score is calculated as follows:

95% confidence interval = X_est ± (1.96)(se_m)

X_est is the adjusted true score estimate (a point estimate of the individual's true score), se_m is the standard error of measurement of the test scores and 1.96 is the z-score that corresponds to a 95% confidence interval. The interval is interpreted as follows: we can be 95% confident that the true score lies somewhere within it. Tests with high reliability yield narrower confidence intervals than tests with lower reliability. Reliability therefore affects the confidence, accuracy and precision with which a person's true score is estimated.
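As a rough sketch of the two formulas above, the following Python functions compute the standard error of measurement and the resulting interval. The function names and the example numbers (an observed standard deviation of 15 and a point estimate of 127) are hypothetical and serve only as an illustration.

```python
import math


def standard_error_of_measurement(sd_observed, reliability):
    """Return se_m = s_o * sqrt(1 - R_xx)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


def confidence_interval(point_estimate, sd_observed, reliability, z=1.96):
    """Return (lower, upper) = point_estimate -/+ z * se_m."""
    se_m = standard_error_of_measurement(sd_observed, reliability)
    return point_estimate - z * se_m, point_estimate + z * se_m


# Hypothetical values: s_o = 15 and a point estimate of 127.
for r_xx in (0.91, 0.64):
    lower, upper = confidence_interval(127, 15, r_xx)
    print(f"R_xx = {r_xx}: 95% interval from {lower:.1f} to {upper:.1f}")
```

Passing a different `z` (for example 1.645 for a 90% interval) changes the size of the interval, while a lower reliability widens it.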

Confidence intervals can be calculated in different ways and with different sizes (95%, 90%, etc.).

The intervals can be based on the standard error of measurement or on the standard error of estimate (which is also influenced by reliability). The estimates of true scores, and the confidence intervals that go with them, are important in making decisions, and reliability plays a major role in both.

On which two factors does the correlation of observed scores from two measurements depend?

According to classical test theory, the correlation between the observed scores of two measurements (r_xoyo) depends on two factors: the correlation between the true scores of the two psychological constructs (r_xtyt) and the reliabilities of the two measurements (R_xx and R_yy).

r_xoyo = r_xtyt * √(R_xx * R_yy)

This follows from the definition of the correlation. Because measurement error is assumed to be random, the covariance between the observed scores equals the covariance between the true scores (c_xtyt), so the correlation between two sets of observed scores is:

r_xoyo = c_xtyt / (s_xo * s_yo)

The observed standard deviations can be expressed in terms of the reliability and the standard deviation of the true scores:

s_xo = s_xt / √R_xx and s_yo = s_yt / √R_yy

Substituting these expressions, and using r_xtyt = c_xtyt / (s_xt * s_yt), gives the first formula. Classical test theory thus shows that the correlation between two measurements is determined by the correlation between the psychological constructs and the reliability of the measurements.
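A small Python sketch can make the relation r_xoyo = r_xtyt * √(R_xx * R_yy) concrete. The function name and the reliability values below are assumptions picked for illustration; they do not come from the chapter.

```python
import math


def expected_observed_correlation(r_true, r_xx, r_yy):
    """Return r_xoyo = r_xtyt * sqrt(R_xx * R_yy)."""
    for reliability in (r_xx, r_yy):
        if not 0.0 <= reliability <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return r_true * math.sqrt(r_xx * r_yy)


# Hypothetical true-score correlation of 0.50 under different reliabilities.
scenarios = [(1.0, 1.0), (0.81, 0.64), (0.9, 0.4)]
for r_xx, r_yy in scenarios:
    r_obs = expected_observed_correlation(0.5, r_xx, r_yy)
    print(f"R_xx = {r_xx}, R_yy = {r_yy}: expected r_xoyo = {r_obs:.2f}")
```

The third scenario pairs a reliable test with an unreliable one, which is enough to pull the expected observed correlation well below the true-score correlation.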

The measurement error suppresses the correlation between measurements

There is a difference between the correlation of the observed scores and the correlation of the true scores. This has four important consequences:

The observed correlations (between measurements) will always be weaker than the correlations between the true scores (between psychological constructs). Measurements are never perfect, and imperfect measurement weakens the observed correlations.
The degree of weakening depends on the reliability of the measurements. Even if only one of the two tests has low reliability, the correlation between the observed scores becomes considerably weaker than the correlation between the true scores.
Measurement error limits the maximum correlation that can be found. As a result, the observed correlation between two measurements can be lower than expected.
It is possible to estimate the true correlation between two constructs. Researchers can estimate every part of the formula except the correlation between the true scores. Rearranging the formula gives:
r_xtyt = r_xoyo / √(R_xx * R_yy)
This formula is called the correction for attenuation, because it shows researchers what the correlation would be if it were not weakened by measurement error. The corrected value estimates the correlation that would be observed with perfectly reliable measurements, because with perfect reliability the observed correlation equals the true correlation.
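The correction for attenuation can be written as a short Python function. This is a minimal sketch under assumed inputs: the function name and the example values (an observed correlation of 0.36 with reliabilities of 0.81 and 0.64) are hypothetical.

```python
import math


def correct_for_attenuation(r_observed, r_xx, r_yy):
    """Return r_xtyt = r_xoyo / sqrt(R_xx * R_yy)."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must be greater than 0 and at most 1")
    return r_observed / math.sqrt(r_xx * r_yy)


# Hypothetical study: observed correlation 0.36, reliabilities 0.81 and 0.64.
estimate = correct_for_attenuation(0.36, 0.81, 0.64)
print(f"Estimated true-score correlation: {estimate:.2f}")
```

Because the reliabilities are themselves sample estimates, the corrected value is an estimate as well. With imprecise reliability estimates it can even exceed 1, which the function does not guard against.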

In addition to reliability, what else should you pay attention to when looking at the results of a study?

Because measurement error reduces the observed correlation, it has consequences for interpreting and conducting research. Results must always be interpreted in light of the reliability of the measurements. An important result of a study is the effect size. Some effect sizes show the extent to which variables are related to each other, and others show the magnitude of the differences between groups.

An example of an effect size that shows the extent to which two variables are related is the correlation coefficient. High reliability results in larger observed effect sizes, and lower reliability reduces the observed effect sizes. Three effect sizes are commonly used in studies, each in a different analytical situation:

1. The correlation is usually used to represent the relationship between two continuous variables.
2. Cohen's d is usually used for the relationship between a dichotomous variable and a continuous variable.
3. η² (eta squared) is usually used for the relationship between a categorical variable with more than two levels and a continuous variable.

A second important result of a study is statistical significance. Statistical significance concerns how confident we can be in a result: a statistically significant result is treated as a real finding and not just a fluke. The observed effect size has a major influence on statistical significance. As the effect size becomes larger, the test is more likely to be statistically significant. Because low reliability reduces observed effect sizes, it also makes statistically significant results less likely.

The effect of reliability on effect size and statistical significance is very important when looking at the results of a study.
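The chain from reliability to effect size to significance can be illustrated with a short Python sketch. It uses the standard t statistic for testing a Pearson correlation, t = r√(n - 2) / √(1 - r²), which is not given in the summary itself and is brought in here only for the demonstration. The sample size of 30 and the true-score correlation of 0.50 are hypothetical.

```python
import math


def attenuated_correlation(r_true, r_xx, r_yy):
    """Expected observed correlation: r_true * sqrt(R_xx * R_yy)."""
    return r_true * math.sqrt(r_xx * r_yy)


def t_statistic_for_correlation(r, n):
    """Standard t statistic for a Pearson correlation (n - 2 df)."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical study: n = 30, true-score correlation 0.50,
# both measures sharing the same reliability.
N = 30
R_TRUE = 0.5
for reliability in (0.9, 0.8, 0.6):
    r_obs = attenuated_correlation(R_TRUE, reliability, reliability)
    t = t_statistic_for_correlation(r_obs, N)
    print(f"reliability = {reliability}: r_obs = {r_obs:.2f}, t = {t:.2f}")
```

As the reliability drops, the expected observed correlation shrinks and the t statistic shrinks with it, so the same underlying relationship becomes harder to detect at a fixed sample size.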

Taking reliability into account when drawing psychological conclusions from a study has three important implications.

The first is that researchers should always consider the effects of reliability on their results when they interpret effect sizes and statistical significance.

The second is that researchers should use measurements with high reliability, so that the problem of attenuation is kept to a minimum. Yet there are two reasons why a researcher sometimes uses measurements with low reliability. The first is that the research interest may lie in an area where it is very difficult to obtain high reliability. The second is that the researcher has not invested enough in finding or developing a more reliable measurement method. It can take a lot of time, money and effort to find a good method with high reliability, so researchers weigh the effort they are willing to invest against the reliability they want to achieve.

The third implication is that researchers should report reliability estimates for their measurements, so that readers are able to interpret the results.

What should you pay attention to in the construction and improvements of tests?

In test construction and improvement, attention goes to the consistency among the parts of the test, which usually means examining the items. Test developers examine the items of a test to see which can be removed and which need to be strengthened in order to improve the psychometric quality of the test.

To see whether an item contributes to internal consistency, the item mean, the item variance and the item discrimination are examined. The procedures and concepts described below must be applied to each dimension measured by the test. In a unidimensional test, the analyses are performed on all test items together as one group. In a multidimensional test, the analyses are performed separately for each dimension.

Item discrimination and other information concerning internal consistency

An important factor for internal consistency reliability is the extent to which the test items are consistent with each other. Internal consistency is intrinsically linked to the correlations between the items. An item that correlates weakly with the other items has little consistency with them and lowers the internal consistency.

To inspect the correlations between items, the "inter-item correlation matrix" in SPSS can be used, but because many tests consist of many items, this is not the most convenient method. Item discrimination is the extent to which an item distinguishes between people who score high on a test and people who score low on a test. High discrimination values are required for good reliability. There are several ways to calculate item discrimination. One is the item-total correlation: we calculate a total score and then the correlation between an item and that total score. The item-total correlation shows the degree to which differences in responses to the item are consistent with differences in total test scores. A high item-total correlation indicates that the item is consistent with the test as a whole.

SPSS reports these values as "corrected item-total correlations", one for each item. The correlation is "corrected" because the item itself does not count towards the total score. Another index of item discrimination, for binary items, is the item discrimination index (D). It compares the proportion of people who scored high on the test and answered the item correctly (p_high) with the proportion of people who scored low on the test and answered the item correctly (p_low). The proportion of correct answers to the item is calculated within each of the two groups, and the proportion of the low-scoring group is then subtracted from the proportion of the high-scoring group:

D = p_high - p_low

Items with high D values are better for internal consistency. SPSS offers two other ways to look at the internal consistency of a test, namely the squared multiple correlation and "Cronbach's Alpha if Item Deleted". The latter gives the alpha reliability estimate of the test if the item in question is removed.
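The two discrimination measures described above can be sketched in plain Python. This is an illustration under stated assumptions rather than a reproduction of the SPSS procedure: the helper names, the tiny data set of six respondents and four binary items, and the choice to compare the two highest-scoring with the two lowest-scoring respondents are all hypothetical. The summary does not say how large the high and low groups should be, so the group size is left as a parameter.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined without variability")
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(responses, item):
    """Correlate one item with the total of all OTHER items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


def discrimination_index(responses, item, group_size):
    """Return D = p_high - p_low for a binary (0/1) item."""
    if not 0 < group_size <= len(responses) // 2:
        raise ValueError("group_size must be between 1 and half the sample")
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    high, low = ranked[:group_size], ranked[-group_size:]
    p_high = sum(row[item] for row in high) / group_size
    p_low = sum(row[item] for row in low) / group_size
    return p_high - p_low


# Hypothetical data: rows are respondents, columns are binary items.
DATA = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

for item in range(4):
    r = corrected_item_total(DATA, item)
    d = discrimination_index(DATA, item, group_size=2)
    print(f"item {item}: corrected item-total r = {r:.2f}, D = {d:.2f}")
```

In this made-up data set the first item separates high and low scorers cleanly, while the last item is answered correctly about as often by low scorers as by high scorers and would be a candidate for revision or removal.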

Item variance and Item difficulty (mean)

An item's mean and variance are important factors that can influence the quality of a psychometric test. They affect how consistent an item can be with the rest of the items, which is important for the reliability of the test. A variable needs variability to be able to correlate with another variable.

If all test takers give the same answer, there is no variability. Variability is required for good reliability.

The item mean provides a link between item variability and psychometric quality, because it says something about the item's variability. An item with limited variability contributes little to psychometric quality. For binary items, the mean can also be read as the item's difficulty: if more people answer one item correctly than another, the items differ in difficulty. For example, a mean of 0.70 means that 70% of people answered the item correctly. Classical test theory suggests that binary test items should have a mean of 0.50, because that is where an item's variance is at its maximum.
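The link between the mean of a binary item and its variability can be checked with a few lines of Python. For an item scored 0 or 1 with mean p, the (population) variance equals p(1 - p). The function name and the list of example means are assumptions made for this sketch.

```python
def binary_item_variance(p):
    """Population variance of a 0/1 item with mean (proportion correct) p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return p * (1.0 - p)


# Hypothetical item means (proportions of correct answers).
MEANS = [0.1, 0.3, 0.5, 0.7, 0.9]
for p in MEANS:
    print(f"mean = {p:.1f}: variance = {binary_item_variance(p):.2f}")

# The mean with the largest variance among the examples.
print("largest variance at mean =", max(MEANS, key=binary_item_variance))
```

Very easy and very difficult items (means close to 1 or 0) have little variance, which limits how strongly they can correlate with the other items.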

Summary of Psychometrics: An Introduction by Furr - 3rd edition

Author: Psychology Supporter
