Typical performance tests - a summary of chapter 3 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 3
Typical performance tests

Construct of interest
Measurement mode
The objectives
Population
The conceptual framework
Item response mode
Administration mode
Item writing guidelines
Item rating guidelines
Pilot studies on item quality
Response tendencies
Compiling the first draft of the test

Typical performance tests assess behavior that is typical for the person.
These tests are used to measure attitudes, interests, values, opinions, and personality characteristics.

Construct of interest

The test developer has to specify the latent variable of interest that is assumed to affect test takers’ item responses and test scores.
The usual constructs of interest of typical performance tests are:

Attitudes
Interests
Values
Opinions
Personality characteristics

The responses to typical performance tests are not evaluated on their correctness, but are considered to typify a person.

At the start of a test development project, the researcher needs information on the construct of interest. This information can be obtained from different sources:

A study of the literature on the construct and existing measurement instruments is nearly always needed at the start of a test development project.
Different types of research can be done on the construct.

Focus group method
Uses small groups of persons who have experiential knowledge about the construct.
A focus group meets with the test developer to talk about their experiences with the construct.
Key informant method
Uses persons who have expert knowledge about the construct of interest. The test developer interviews these key informants about the construct.
Observation method

The test developer can use information from different sources to define the construct and, later in the test development process, he or she can use this information for item writing.

Measurement mode

Self-report mode
The test taker answers questions on a typical performance construct
Other-report mode
A person answers questions about another person’s construct
Somatic indicator mode
Uses somatic signs to measure constructs
Physical trace mode
Uses traces that are left behind to measure constructs

Each of these four modes can occur in two different varieties:

Reactive measurement mode
When test takers can deliberately distort their construct value
Nonreactive measurement mode
When test takers cannot distort their construct value

The reactive/nonreactive distinction is only used for typical performance measurements, and not for maximum performance measurements.
A maximum performance test asks test takers to do the best they can to perform the task.

Each of the four measurement modes can occur in both varieties:

Self-report mode
Test takers are asked to respond to questions or stimuli to assess their attitudes, values, interests, opinions, or personality.

Reactive: questionnaires or inventories
Test takers are fully aware of what questionnaires and inventories are measuring, and they can easily distort their construct values
Nonreactive: test takers respond to stimuli, but it is not clear to them what construct is measured

Other-report mode
Uses other people to report on a given person’s typical performance construct.

Reactive: the person is aware that another person reports his or her construct and he or she can try to affect the other’s report
Nonreactive: persons cannot adapt their behavior to affect the other’s report

Somatic indicator mode
Uses somatic signs to assess typical performance constructs.

Reactive: the test taker can suspect what is being measured, and may deliberately affect the measurement
Nonreactive: the person is unaware of being measured by his or her somatic signs

Physical trace mode
Uses traces that persons have left behind to assess their typical performance constructs.

Reactive: the persons are aware that their traces can be noticed by others and may be used to assess their characteristics
Nonreactive: the person is unaware that his or her traces may be used for measurement purposes.

The objectives

Scientific vs practical
Individual level vs group level
Used for description vs diagnosis vs decision making

Population

The test developer must define the target population and the inclusion and exclusion criteria of persons.
If subpopulations need to be distinguished, the test developer must define these subpopulations and must provide criteria to include persons in these subpopulations.

The conceptual framework

There are three broad classes of strategies to construct typical performance tests. In each of these classes, two specific test development methods are distinguished.

Intuitive class
The relation between the construct and the items is of an intuitive nature.
- Rational method
  Uses a loose description of the construct that is based on the knowledge of experts or members of the target population. The items are written from this intuitive knowledge of the construct.
- Prototypical method
  Asks members of the target population to think of persons having the construct, and to write down their behavior that is typical for the construct. This prototypical information is used to write test items.
Inductive class
The tests are derived from empirical data.
- Internal method
  Starts from a broad pool of personality or attitude items. The items are administered to a sample of the target population of persons, and the associations are computed between item scores. The test developer searches for clusters of items that are highly associated, and each of these clusters specifies a construct.
- External method
  Starts from a broad pool of items and a criterion that has to be predicted. The items and the criterion are measured in a sample of persons from the target population. The associations between the persons’ item and criterion scores are computed, and the items that have the highest item-criterion associations are included in the test.
Deductive class
The tests start from theoretical or conceptual notions of the construct.
- Construct method: starts from an explicit theory, and items are derived from this theory.
- Facet method: does not use an explicit theory. It starts from a conceptual analysis of the construct.
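
The two inductive methods above can be illustrated with a minimal sketch. It is not the book's procedure: the item names, the toy scores, the use of the Pearson correlation as the measure of association, the threshold of 0.5, and the simple "link items above the threshold" clustering rule are all assumptions made for illustration. In practice the internal method usually relies on techniques such as factor analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long score lists (assumes nonzero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def cluster_items_internal(item_scores, threshold=0.5):
    """Internal method (sketch): group items that are highly associated with each other."""
    clusters = []
    for name in item_scores:
        # Clusters that contain at least one item highly associated with this item
        linked = [c for c in clusters
                  if any(pearson(item_scores[name], item_scores[o]) >= threshold for o in c)]
        merged = [name] + [o for c in linked for o in c]
        clusters = [c for c in clusters if c not in linked] + [merged]
    return [sorted(c) for c in clusters]


def select_items_external(item_scores, criterion, n_items):
    """External method (sketch): keep the items with the highest item-criterion association."""
    assoc = {name: pearson(scores, criterion) for name, scores in item_scores.items()}
    return sorted(assoc, key=assoc.get, reverse=True)[:n_items]


# Made-up scores of five persons on four hypothetical items and one criterion
items = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 3, 5, 4],
    "c": [3, 5, 1, 4, 2],
    "d": [3, 4, 1, 5, 2],
}
criterion = [1, 2, 2, 4, 5]

clusters = cluster_items_internal(items)
selected = select_items_external(items, criterion, n_items=2)
print(clusters)
print(selected)
```

With these toy data, each cluster found by the internal sketch would be read as one candidate construct, while the external sketch simply keeps the items that predict the criterion best.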

Construct method

Uses a theoretical framework.
The construct is defined, and it is embedded in a network of other constructs. The theory and its network are used to write items.

The facet design method

Generates items from a conceptual analysis of the construct that has to be measured by the test.
Starts with an inventory of the observable behavior that applies to the construct.
This behavior is classified according to a number of aspects, which are called facets. Each of these facets contains a number of facet elements.
Important facets for the construction of typical performance tests are behavioral and situational facets.

Behavioral facets classify types of behavior
Situational facets classify the situations where the behaviors appear

The facets are crossed, and items are written for each of the combinations of the different facet elements.
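
Crossing the facets can be sketched as a Cartesian product of the facet elements. The construct and the facet elements below are hypothetical examples that do not come from the chapter; the point is only that every combination of elements yields one cell for which items are written.

```python
from itertools import product

# Hypothetical facets for an illustrative construct (aggression); not from the source
behavioral_facet = ["verbal", "physical"]
situational_facet = ["at home", "at school", "in traffic"]


def facet_design(*facets):
    """Cross all facets: one cell per combination of facet elements."""
    return list(product(*facets))


cells = facet_design(behavioral_facet, situational_facet)
for behavior, situation in cells:
    print(f"Write items on {behavior} behavior {situation}")
```

Two behavioral elements crossed with three situational elements give six cells. Adding a third facet multiplies the number of cells again, so the number of facets and elements has to stay manageable.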

Item response mode

A typical performance item consists of a question or statement, and the test taker is asked to answer the question or to react to the statement.
A number of distinctions are made, and these are used to classify the response modes of typical performance items.

Open-ended vs closed-ended
Open-ended: asks the test taker to complete a question or statement
Closed-ended: consists of a statement or question and a response scale. The test taker is asked to indicate his or her position on the response scale.

The response scales of closed-ended items are divided into:

Frequency response scale
Asks the test taker to indicate the frequency of occurrence.
Endorsement response scale
Asks the test taker to indicate his or her degree of endorsement of the statement

Endorsement scales are subdivided into:

All-or-none
Asks the test taker to indicate whether he or she endorses the statement or not
Intensity endorsement scale
Asks the test taker to indicate the degree of his or her endorsement of the statement

The intensity endorsement scales are subdivided into:

Discrete intensity endorsement scale
Asks the test taker to indicate his or her degree of endorsement by choosing one out of more than two ordered categories
Continuous intensity endorsement scale
Asks the test taker to indicate his or her degree of endorsement at a bounded-continuous scale, such as a line segment (visual analogue scale)

Unipolar scale: a response scale that runs from a zero point in one direction only
Bipolar scale: a response scale that runs from a negative pole to a positive pole
Dichotomous scale: a scale with only two categories
Ordinal-polytomous scale: a scale that has more than two ordered categories
Bounded-continuous scale: a continuous scale that is bounded, for example, by two endpoints

Administration mode

The main modes to administer typical performance tests are:

Oral
- Face-to-face administration
- Telephone administration
Paper-and-pencil
- Personal paper-and-pencil administration
- Mail paper-and-pencil administration
Computerized
Computerized adaptive

Item writing guidelines

Elicit different answers at different construct positions
Focus on one aspect per item
Avoid making assumptions about test takers
Use correct language
Use clear and comprehensible wording
Use non-sensitive language and content
Put the situational or conditional part of a statement at the beginning and the behavioral part at the end
Use positive statements
Use 5-7 categories in ordinal-polytomous response scales
Label each of the categories of a response scale and avoid the use of numbers alone
Format response categories vertically

Item rating guidelines

Rate answers anonymously
Rate the answers to one item at a time
Provide the rater with a frame of reference
Separate irrelevant from relevant aspects
Use more than one rater
Rerate the answers
Rate all answers to an item on the same occasion
Rearrange the order of answers
Read a sample of answers

Pilot studies on item quality

Pilot studies are necessary to assess the quality of concept items.
Usually, a large number of concept items has to be revised or has to be removed from the pool of concept items.
Three types of pilot studies:

Experts’ pilot studies
Test takers’ pilot studies
Raters’ pilot studies

Experts’ pilots

Concept items have to be reviewed by experts.
Three types of expertise are needed:

Substantive expertise on the content of the concept items
Technical expertise on the items
Expertise on the sensitivity of the items

Test takers’ pilots

The concept items are administered to a small group of test takers from the target population. Each test taker is interviewed about his or her thinking while answering the items.

Concurrent interview: asks the test taker to think aloud while answering the items
Retrospective interview: asks the test taker to recollect his or her thinking after completing the items.
The interviews are recorded and this information is used to revise or remove concept items.

Response tendencies

Responses to typical performance items may be affected by response tendencies.
Response tendency: the differential application of the response scale.

Response style: the differential use of the item response scale by different persons.
A response style varies between persons, but it is relatively constant across measurements of different constructs and across measurements of the same construct on different occasions.
It is a person-specific property.
Important response styles are:

Acquiescence
The tendency to agree with an endorsement statement, independently of the content of the statement
Dissentience
The tendency to disagree with an endorsement statement, independently of the content of the statement
Extremity
The tendency to choose extremes of the item response scale
Midpoint
The tendency to choose the middle of the response scale

Response set: the differential use of the item response scale by different persons and different constructs.
A response set may differ between persons and between constructs, and is only relatively stable across measurements of the same construct on different occasions. It is a person/construct-specific property.
Response sets:

Social desirability: a person’s tendency to deceive either oneself or others.
Self-deception: the tendency to deceive oneself.
Impression management: the tendency to deceive others by making good or bad impressions on others.

Social desirability is a person-specific property because it varies between persons.
It is also construct-specific because it may vary between constructs.
The best strategy is to assess social desirability with specific measurement instruments.

Self-deception is related to constructs that are mainly relevant for persons themselves
Impression management is related to constructs that are mainly relevant for the person’s social relations.

Acquiescence and dissentience can only occur with endorsement items and not with frequency items.
The extremity and midpoint response styles can occur with both types of items.
Acquiescence, dissentience, and the extremity and midpoint styles can occur in both the reactive self-report and the reactive other-report measurement modes.
Social desirability can only occur in the reactive self-report mode.

Acquiescence and dissentience can be detected by including both indicative and contra-indicative items in the questionnaire.

The extremity and midpoint response styles are hard to detect.
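
The detection of acquiescence and dissentience with indicative and contra-indicative items can be sketched as follows. This is a simple illustration under stated assumptions, not a method prescribed by the chapter: the items are assumed to come in content-matched pairs (one indicative, one contra-indicative), responses are assumed to be on a 5-point agreement scale (1 = strongly disagree, 5 = strongly agree), and the cut-offs, item names, and responses are made up.

```python
def style_counts(responses, pairs, agree_from=4, disagree_upto=2):
    """Count pairs in which a person agrees, or disagrees, with both items.

    Agreeing with an indicative item and with its contra-indicative
    counterpart is inconsistent in content and hints at acquiescence;
    disagreeing with both hints at dissentience.
    """
    agree_both = disagree_both = 0
    for indicative, contra in pairs:
        if responses[indicative] >= agree_from and responses[contra] >= agree_from:
            agree_both += 1
        elif responses[indicative] <= disagree_upto and responses[contra] <= disagree_upto:
            disagree_both += 1
    return {"acquiescence": agree_both, "dissentience": disagree_both}


# Hypothetical item pairs: (indicative item, contra-indicative item)
pairs = [("i1", "c1"), ("i2", "c2")]

# Made-up responses of three persons
persons = {
    "p1": {"i1": 5, "c1": 1, "i2": 4, "c2": 2},  # consistent in content
    "p2": {"i1": 5, "c1": 4, "i2": 4, "c2": 5},  # agrees with everything
    "p3": {"i1": 1, "c1": 2, "i2": 2, "c2": 1},  # disagrees with everything
}

results = {name: style_counts(resp, pairs) for name, resp in persons.items()}
for name, counts in results.items():
    print(name, counts)
```

A high count is only a signal for closer inspection. A questionnaire that contains only indicative items offers no such check, because agreement caused by the construct and agreement caused by the response style cannot be separated.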

Compiling the first draft of the test

The concept items that survived the pilot studies are used to compile the first draft of the test and instructions for test takers are added.
Usually, the instruction contains some example items to guarantee that test takers understand the test items.

Balanced test: consists of about 50% indicative and 50% contra-indicative items.
Social desirability items can also be added to the test.
Usually, indicative, contra-indicative, and social desirability items are arbitrarily mixed in the test.
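
Compiling such a draft can be sketched in a few lines. The item labels, the fixed random seed, and the tolerance used to decide what counts as "about 50%" are assumptions for illustration only.

```python
import random


def compile_draft(indicative, contra_indicative, social_desirability=(), seed=0):
    """Mix the three item types in arbitrary order; the seed keeps the order reproducible."""
    items = ([(label, "indicative") for label in indicative]
             + [(label, "contra-indicative") for label in contra_indicative]
             + [(label, "social desirability") for label in social_desirability])
    random.Random(seed).shuffle(items)
    return items


def is_balanced(items, tolerance=0.1):
    """Check that about half of the construct items are indicative."""
    n_ind = sum(kind == "indicative" for _, kind in items)
    n_contra = sum(kind == "contra-indicative" for _, kind in items)
    if n_ind + n_contra == 0:
        return False
    return abs(n_ind / (n_ind + n_contra) - 0.5) <= tolerance


# Placeholder labels stand in for the surviving concept items
draft = compile_draft(["I1", "I2"], ["C1", "C2"], ["S1"])
for position, (label, kind) in enumerate(draft, start=1):
    print(position, label, kind)
print("balanced:", is_balanced(draft))
```

Social desirability items are left out of the balance check because they do not measure the construct of interest.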

The concept test is submitted to a group of experts. This group can be the same as the group that was used in the experts’ pilot study on item quality.
The group needs to have expertise about both the construct and test construction. The experts evaluate whether the test instruction is sufficiently clear for the population of test takers.
They also study the content validity of the test (whether the test adequately covers all aspects of the construct).

The comments of the experts are used to compile the first draft of the test.
The first draft is administered in a try-out to at least 200 test takers from the target population.
The try-out data are analyzed using methods of classical and modern test theory.
