What types of test bias are there? - Chapter 11

In the previous chapter we discussed response bias, a common threat to the psychometric quality of tests. In this chapter we discuss the second major threat: test bias. Test bias arises when the relationship between true scores and observed scores differs between two groups, for example men and women. The emphasis of test bias is therefore on systematic differences between groups of respondents. Note that finding a difference between groups does not necessarily mean that there is test bias; the difference may really exist.

What types of test bias are there?

There are generally two types of test bias: construct bias and predictive bias. Construct bias is bias regarding the meaning of a test. Predictive bias is bias regarding the usability of a test. These two types of test bias are independent of each other. In other words, one type of bias can exist in a test without the other.

1. Construct bias

Construct bias concerns the relationship between true and observed scores. With construct bias, the test has a different meaning for the two groups: the items are interpreted differently per group. This can lead to situations in which two groups have the same average true scores but different average observed scores on a test.

2. Predictive bias

Predictive bias concerns the relationship between scores on two different tests, where the scores on one test (the so-called predictor test) are used to predict the scores on the other test (the so-called outcome test). A predictive bias exists when the relationship between the predictor test (true scores) and the outcome test (observed scores) differs between two groups. In other words, the predictor test predicts the outcome well for one group but poorly for the other group.

What are the ways to identify test bias?

There are roughly two categories of procedures to identify test bias: (1) internal methods that identify construct bias; (2) external methods that identify predictive bias.

A difference in test scores between two groups does not necessarily mean that there is test bias; the difference may be real. For example, if a measurement shows that men weigh more on average than women, this reflects reality. With math skills, however, there is reason for doubt: there is no obvious reason why men would have better math skills than women, so such a difference in test scores warrants a check for test bias.

How can you discover construct bias?

Since we never know the true scores of a person, we use procedures that provide an estimate of the existence and extent of a construct bias.

We examine the internal structure of a test to find out whether there is construct bias. The internal structure is the pattern of correlations between items and/or between each item and the total score. The evaluation works as follows: we determine the internal structure of the test separately for two groups and compare them. If the two groups show the same internal structure in their test responses, we can conclude that the test does not suffer from construct bias. Conversely, if the internal structures differ between the two groups, there is construct bias.

There are five methods to discover construct bias:

  1. Reliability.
  2. Ranking (rank order).
  3. Item discrimination index.
  4. Factor analysis.
  5. Differential item functioning analysis.

These five methods are discussed in more detail below.

1. Reliability

An intuitive way to evaluate construct bias is to estimate the reliability for each group separately. One way to estimate reliability is through internal consistency (coefficient alpha). In Chapter 6 we discussed that internal consistency refers to the degree to which the parts of a test are interrelated. In this context, alpha provides insight into the internal structure of a test: are the test items consistent with each other or not? Group differences in reliability indicate that the test does not "work" equally well for different groups.
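The group-wise reliability check can be sketched in a few lines of Python. This is only an illustration: the response tables `group_a` and `group_b` are invented, and the function name `cronbach_alpha` is a hypothetical helper, not something taken from the book.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: rows are respondents, columns are items (1 = correct).
group_a = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
group_b = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 0], [1, 1, 0, 1]]

alpha_a = cronbach_alpha(group_a)
alpha_b = cronbach_alpha(group_b)
print(f"alpha group A: {alpha_a:.2f}, alpha group B: {alpha_b:.2f}")
```

In this made-up example the items hang together well in group A and poorly in group B, which is the kind of group difference in reliability that would prompt a closer look at construct bias.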

2. Ranking (rank order)

If the items can be ordered by difficulty, we can use this ranking to make a relatively quick and easy estimate of construct bias. The ranking is made separately for each group, and the rankings are then compared. If the ranking of the items differs between the groups, this is an indication of construct bias.
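As a rough sketch of this comparison, the Python snippet below ranks items by the proportion of correct answers in each group. The data and the helper names (`proportion_correct`, `difficulty_order`) are assumptions made for the example; ties between items are not handled specially.

```python
def proportion_correct(rows):
    """Proportion of respondents answering each item correctly."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def difficulty_order(rows):
    """Item indices sorted from easiest (highest proportion correct) to hardest."""
    p = proportion_correct(rows)
    return sorted(range(len(p)), key=lambda i: -p[i])

# Hypothetical data: rows are respondents, columns are items (1 = correct).
group_a = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
group_b = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]

order_a = difficulty_order(group_a)
order_b = difficulty_order(group_b)
same_order = order_a == order_b
print(order_a, order_b, same_order)
```

A differing order, as in this invented data set, is only an indication of construct bias; with real data one would typically also look at how large the differences in difficulty are.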

3. Item discrimination index

The item discrimination index represents the extent to which an item is related to the total test score. It indicates how well the item distinguishes between people with different levels of the construct being measured.

An item discriminates well between people with varying levels of the construct when people with a high level have a high probability of answering the item correctly, while people with a low level have a low probability of answering it correctly. The item discrimination index is then high (e.g. 0.90), which means that the item is a good reflection of the construct measured by the test.

An item discriminates poorly when people with a low level of the construct give almost as many correct answers as people with a high level. An example of a low item discrimination index value is 0.10.

The item discrimination index can be used to estimate construct bias. For a selected item, we calculate the item discrimination index separately for each group and then compare the group indexes. Equal indexes = no test bias. Unequal indexes = probably test bias. It is important to know that the item discrimination index is independent of the number of people in a group.
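One common way to express how strongly an item is related to the total score is the correlation between the item and the sum of the remaining items. The sketch below assumes that definition (the chapter does not prescribe a specific formula) and uses invented data; `pearson` and `item_discrimination` are hypothetical helper names.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_discrimination(rows, item):
    """Correlation between one item and the total score of the other items."""
    item_scores = [r[item] for r in rows]
    rest_scores = [sum(r) - r[item] for r in rows]
    return pearson(item_scores, rest_scores)

# Hypothetical data: rows are respondents, columns are items (1 = correct).
group_a = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
group_b = [[0, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

d_a = item_discrimination(group_a, item=0)
d_b = item_discrimination(group_b, item=0)
print(f"item 0 discrimination: group A {d_a:.2f}, group B {d_b:.2f}")
```

Here the first item relates positively to the rest of the test in group A but negatively in group B, so the two indexes are clearly unequal.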

4. Factor analysis

A fourth way to investigate construct bias is to carry out a factor analysis of the items separately for two or more groups. As we discussed earlier, factor analysis is a statistical technique to divide the variance or covariance between test items into clusters or factors. A factor is a set of items that correlate highly with each other and are therefore interrelated. If all items correlate about equally with each other, then there is one factor and we speak of a unidimensional structure. When there are several factors, we speak of a multidimensional structure. If the factor structure differs between the groups, this indicates construct bias.

There are two types of factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). CFA in particular is suitable for mapping construct bias. CFA is discussed further in the next chapter (chapter 12).

5. Differential item functioning analysis

Differential item functioning analysis makes it possible to estimate respondents' trait levels directly from the test data. These are also called the true scores. We then compare the trait levels (true scores) with the item responses (observed scores) for all people in the two groups and check whether they correspond in the same way in both groups. If not, the item suffers from bias.

  • Construct bias: two people (a man and a woman) have the same trait level, but the item characteristic curve (ICC) is not the same for their groups. In other words, the two people do not have the same probability of giving a correct answer.
  • No construct bias: two people (a man and a woman) have the same trait level and the item characteristic curve is the same. In other words, the two people have the same probability of giving a correct answer.
  • Uniform bias: the groups differ in the location of the curve. The two curves do not overlap or cross each other. At every trait level, people from one group are less likely to answer the item correctly than people from the other group with the same trait level.
  • Non-uniform bias: the groups differ in the location and shape of the curve. The two curves cross each other. At some trait levels the item is easier for men and at other levels it is easier for women.

With uniform and non-uniform bias, the test measures different characteristics for men and women.
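The difference between uniform and non-uniform bias can be illustrated with a small Python sketch. It assumes a two-parameter logistic item characteristic curve, which is one common choice but is not specified in this chapter; the parameters `a` (steepness) and `b` (location), the grid of trait (characteristic) levels and the function names are all assumptions for illustration. Classifying by the sign of the gap on a coarse grid is a simplification, not a formal test.

```python
from math import exp

def icc(theta, a, b):
    """Probability of a correct answer at trait level theta (logistic curve)."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

def bias_type(params_1, params_2, levels):
    """Compare the curves of two groups at the given trait levels."""
    gaps = [icc(t, *params_1) - icc(t, *params_2) for t in levels]
    if all(abs(g) < 1e-9 for g in gaps):
        return "no bias"            # curves coincide
    if all(g >= 0 for g in gaps) or all(g <= 0 for g in gaps):
        return "uniform bias"       # one group is favoured at every level
    return "non-uniform bias"       # the favoured group changes: curves cross

levels = [-2, -1, 0, 1, 2]          # hypothetical grid of trait levels
same = bias_type((1.0, 0.0), (1.0, 0.0), levels)
uniform = bias_type((1.0, 0.0), (1.0, 0.5), levels)    # shifted location only
crossing = bias_type((0.5, 0.0), (2.0, 0.0), levels)   # different steepness
print(same, uniform, crossing)
```

With these invented parameters, shifting only the location of the curve makes the item harder for one group at every level, while changing the steepness makes the curves cross.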

What are the ways to identify predictive bias? 

Predictive bias refers to the extent to which test scores are equally predictive for different groups. Ideally, when a test is used to make predictions about people, it is equally predictive for all groups of people. If this is not the case, and the test is therefore not equally predictive for different groups of people, then there is predictive bias.

Scores on two variables or measurements are obtained. Next, it is examined to what extent the scores on the first test can be used to predict the scores on the second test (which is related to the first). An external evaluation of the test is required to discover predictive bias. Two questions are considered: (1) Does the test really help to predict the outcome? (2) Does the test predict the outcome equally well for different groups? We can investigate this with regression analysis.

Regression analysis

Regression analysis models the linear relationship between test scores (true scores) and outcome scores (observed scores).

Ŷ = a + bX, where X is the score on the predictor test (the ability score)

  • a = intercept (the predicted value when X = 0);
  • b = slope (the direction coefficient);
  • Ŷ = predicted outcome score for an individual.

Observed scores rarely fall exactly on the regression line: the line consists of predicted scores, and the observed scores deviate from it to some degree.

'One size fits all': the regression equation applies to all groups. Different groups share the same regression line, regardless of gender, ethnicity, culture, or other group differences.

You can investigate whether the test contains bias by estimating a regression equation on the data of all groups together (for example, men and women combined). This is called the common regression line. We then estimate a regression line for each group separately (i.e., for men and for women) and compare it with the common regression line. If these are not the same, there is predictive bias. If they are the same, there is no bias.
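The comparison of the common regression line with the group lines can be sketched as follows. The scores are invented and noise-free so that the arithmetic is easy to follow, and `fit_line` is a hypothetical helper that applies the ordinary least-squares formulas. With real data the estimated lines are never exactly equal, so differences would be judged statistically rather than by eye.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y-hat = a + b * X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical predictor scores (X) and outcome scores (Y) per group.
x_men, y_men = [1, 2, 3, 4, 5], [3, 5, 7, 9, 11]
x_women, y_women = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]

a_men, b_men = fit_line(x_men, y_men)
a_women, b_women = fit_line(x_women, y_women)
# Common regression line: both groups pooled into one data set.
a_common, b_common = fit_line(x_men + x_women, y_men + y_women)

print(f"men:    Y = {a_men:.1f} + {b_men:.1f}X")
print(f"women:  Y = {a_women:.1f} + {b_women:.1f}X")
print(f"common: Y = {a_common:.1f} + {b_common:.1f}X")
```

In this example the slopes of all three lines agree while the intercepts do not, so the common line overpredicts for one group and underpredicts for the other.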

Within this method there are different types of bias: intercept bias, slope bias, and intercept + slope bias.

1. Intercept bias

With intercept bias, the slope of the two group regression lines corresponds to the common slope, but the intercepts of the group regression lines do not correspond to the common intercept.

'One size does not fit all', so there is predictive bias. In other words, the outcome scores of men and women differ at the same value of X. The difference is consistent: it stays the same as X rises or falls. The two regression lines (of the men and of the women) are parallel to each other.

2. Slope bias

The intercept of the two group regression lines is the same as the common intercept, but the slopes of the group regression lines are not the same as the common slope.

'One size does not fit all', so there is predictive bias. Here too, the outcome scores of men and women differ. The difference is not consistent, because it changes as X rises or falls. The regression lines (of men and women) are not parallel: they start at the same intercept and diverge as X increases.

3. Intercept and slope bias

The intercepts of the groups are not equal to the common intercept and the slopes of the groups are not equal to the common slope. This combination is much more common than bias in only the intercept or only the slope. Here, of course, 'one size does not fit all' applies as well. The regression lines (of the men and of the women) are not parallel and therefore intersect at some value of X.
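The three cases can be summarised in a small decision rule. The Python sketch below takes already estimated (intercept, slope) pairs and labels the type of bias; the function name `classify_bias`, the tolerance `tol` and the example coefficients are assumptions for illustration. A fixed tolerance is a simplification: with real data, differences between coefficients would be evaluated with a statistical test.

```python
def classify_bias(common, groups, tol=1e-6):
    """Label the bias type from (intercept, slope) pairs.

    common: (a, b) of the common regression line
    groups: list of (a, b) pairs, one per group
    tol:    hypothetical tolerance below which values count as equal
    """
    a_c, b_c = common
    intercept_differs = any(abs(a - a_c) > tol for a, _ in groups)
    slope_differs = any(abs(b - b_c) > tol for _, b in groups)
    if intercept_differs and slope_differs:
        return "intercept and slope bias"
    if intercept_differs:
        return "intercept bias"
    if slope_differs:
        return "slope bias"
    return "no predictive bias"

# Hypothetical coefficients: common line first, then men and women.
case_1 = classify_bias((0.5, 2.0), [(1.0, 2.0), (0.0, 2.0)])
case_2 = classify_bias((1.0, 2.0), [(1.0, 2.5), (1.0, 1.5)])
case_3 = classify_bias((0.5, 2.0), [(1.0, 2.5), (0.0, 1.5)])
case_4 = classify_bias((1.0, 2.0), [(1.0, 2.0), (1.0, 2.0)])
print(case_1, case_2, case_3, case_4, sep="\n")
```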

What other statistical procedures are there to detect test bias?

In addition to the methods we discussed in this chapter, there are a number of other statistical procedures to discover test bias. Structural equation modeling, for example, can be used to discover test bias. More complex regression models can also be used. But these methods (and their complexity) go beyond the scope of this book and are therefore not discussed further.         

What is the difference between test bias and test fairness?

Test bias is not the same as test fairness. Test fairness concerns the appropriate use of test scores in light of social and/or legal rules and the like. Test fairness is not a psychometric aspect of a test. Test bias, on the other hand, is a psychometric concept, embedded in theory about test score validity. Test bias is defined by specific statistical and research methods, which enable the researcher to draw conclusions about test bias. Both are important for psychological testing.

Follow the author: Psychology Supporter