Typical performance tests - a summary of chapter 3 of A conceptual introduction to psychometrics by G. J. Mellenbergh

Typical performance tests assess behavior that is typical for the person.
These tests are used to measure attitudes, interests, values, opinions, and personality characteristics.

Construct of interest

The test developer has to specify the latent variable of interest that is assumed to affect test takers’ item responses and test scores.
The usual constructs of interest of typical performance tests are:

  • Attitudes
  • Interests
  • Values
  • Opinions
  • Personality characteristics

The responses to typical performance tests are not evaluated on their correctness, but are considered to typify a person.

At the start of a test development project, the researcher needs information on the construct of interest. This information can be obtained from different sources.

A study of the literature on the construct and existing measurement instruments is nearly always needed at the start of a test development project.
Different types of research can be done on the construct.

  • Focus group method
    Uses small groups of persons who have experiential knowledge about the construct.
    A focus group meets with the test developer to talk about their experiences with the construct.
  • Key informant method
    Uses persons who have expert knowledge about the construct of interest. The test developer interviews these key informants about the construct.
  • Observation method

The test developer can use information from different sources to define the construct and, later in the test development process, he or she can use this information for item writing.

Measurement mode

  • Self-report mode
    The test taker answers questions on a typical performance construct
  • Other-report mode
    A person answers questions about another person’s construct
  • Somatic indicator mode
    Uses somatic signs to measure constructs
  • Physical trace mode
    Uses traces that are left behind to measure constructs

Each of these four modes can occur in two different varieties:

  • Reactive measurement mode
    When test takers can deliberately distort their construct value
  • Nonreactive measurement mode
    When test takers cannot distort their construct value

The reactive/nonreactive distinction is only used for typical performance measurements, and not for maximum performance measurements.
A maximum performance test asks test takers to do the best they can to perform the task.

The reactive and nonreactive versions of each of the four measurement modes are described below.

Self-report mode
Test takers are asked to respond to questions or stimuli to assess their attitudes, values, interests, opinions, or personality.

  • Reactive: questionnaires or inventories
    Test takers are fully aware of what questionnaires and inventories are measuring, and they can easily distort their construct values
  • Nonreactive: test takers respond to stimuli, but it is not clear to them what construct is measured

Other-report mode
Uses other people to report on a given person’s typical performance construct.

  • Reactive: the person is aware that another person reports his or her construct and he or she can try to affect the other’s report
  • Nonreactive: persons cannot adapt their behavior to affect the other’s report

Somatic indicator mode
Uses somatic signs to assess typical performance constructs.

  • Reactive: the test taker can suspect what is being measured, and may deliberately affect the measurement
  • Nonreactive: the person is unaware of being measured by his or her somatic signs

Physical trace mode
Uses traces that persons leave behind to assess their typical performance constructs.

  • Reactive: the persons are aware that their traces can be noticed by others and may be used to assess their characteristics
  • Nonreactive: the person is unaware that his or her traces may be used for measurement purposes.

The objectives

  • Scientific vs practical
  • Individual level vs group level
  • Used for description vs diagnosis vs decision making

Population

The test developer must define the target population and the inclusion and exclusion criteria of persons.
If subpopulations need to be distinguished, the test developer must define these subpopulations and must provide criteria to include persons in these subpopulations.

The conceptual framework

Three broad classes of strategies are used to construct typical performance tests. Within each of these classes, two specific test development methods are distinguished.

  • Intuitive class
    The relation between the construct and the items is of an intuitive nature.
    • Rational method
      Uses a loose description of the construct that is based on the knowledge of experts or members of the target population. The items are written from this intuitive knowledge of the construct.
    • Prototypical method
      Asks members of the target population to think of persons having the construct, and to write down the behavior of these persons that is typical for the construct. This prototypical information is used to write test items.
  • Inductive class
    The tests are derived from empirical data.
    • Internal method
      Starts from a broad pool of personality or attitude items. The items are administered to a sample of the target population of persons, and the associations are computed between item scores. The test developer searches for clusters of items that are highly associated, and each of these clusters specifies a construct.
    • External method
      Starts from a broad pool of items and a criterion that has to be predicted. The items and the criterion are measured in a sample of persons from the target population. The associations between the persons’ item and criterion scores are computed, and the items that have the highest item-criterion associations are included in the test.
  • Deductive class
    Starts from theoretical or conceptual notions of the construct.
    • Construct method: starts from an explicit theory, and items are derived from this theory.
    • Facet method: does not use an explicit theory. It starts from a conceptual analysis of the construct.
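The item selection step of the external method can be sketched in a few lines. This is a minimal illustration, not the book's procedure: the Pearson correlation as the measure of association, the use of its absolute value, the function names `pearson` and `select_items`, and all data are assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def select_items(item_scores, criterion, n_keep):
    """Rank items by absolute item-criterion correlation and keep the top n_keep."""
    ranked = sorted(
        item_scores,
        key=lambda name: abs(pearson(item_scores[name], criterion)),
        reverse=True,
    )
    return ranked[:n_keep]

# Hypothetical data: 6 persons, 3 items scored 1-5, and one criterion score.
item_scores = {
    "item_a": [1, 2, 3, 4, 5, 5],
    "item_b": [3, 3, 2, 3, 3, 2],
    "item_c": [5, 4, 4, 2, 1, 1],
}
criterion = [10, 12, 15, 18, 22, 21]

selected = select_items(item_scores, criterion, n_keep=2)
print(selected)
```

Ranking on the absolute value keeps items that are strongly negatively related to the criterion; whether such items belong in the test is a choice for the test developer.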

Construct method

Uses a theoretical framework.
The construct is defined, and it is embedded in a network of other constructs. The theory and its network are used to write items.

The facet design method

Generates items from a conceptual analysis of the construct that has to be measured by the test.
Starts with an inventory of the observable behavior that applies to the construct.
This behavior is classified according to a number of aspects, which are called facets. Each of these facets contains a number of facet elements.
Important facets for the construction of typical performance tests are behavioral and situational facets.

  • Behavioral facets classify types of behavior
  • Situational facets classify the situations where the behaviors appear

The facets are crossed, and items are written for each of the combinations of the different facet elements.
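Crossing facets amounts to taking the Cartesian product of their elements. The sketch below shows this for one behavioral and one situational facet; the facet elements are hypothetical and only serve to illustrate the idea.

```python
from itertools import product

# Hypothetical facets and elements for an illustrative construct;
# none of these labels come from the source text.
behavioral_facet = ["avoids", "worries about", "prepares for"]
situational_facet = ["exams", "job interviews"]

# Crossing the facets yields one cell per combination of facet elements.
cells = list(product(behavioral_facet, situational_facet))

for behavior, situation in cells:
    print(f"Write item(s) for: {behavior} x {situation}")
```

With three behavioral and two situational elements the design has six cells, and one or more items are written for each cell.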

Item response mode

A typical performance item consists of a question or statement, and the test taker is asked to answer the question or to react to the statement.
A number of distinctions are made, and these are used to classify the response modes of typical performance items.

  • Open-ended vs closed-ended
    • Open-ended: asks the test taker to complete a question or statement
    • Closed-ended: consists of a statement or question and a response scale. The test taker is asked to indicate his or her position on the response scale.

The response scales of closed-ended items are divided into:

  • Frequency response scale:
    Asks the test taker to indicate the frequency of occurrence.
  • Endorsement response scale
    Asks the test taker to indicate his or her degree of endorsement of the statement

Endorsement scales are subdivided into:

  • All-or-none
    Asks the test taker to indicate whether he or she endorses the statement or not
  • Intensity endorsement scale
    Asks the test taker to indicate the degree of his or her endorsement of the statement

The intensity endorsement scales are subdivided into:

  • Discrete intensity endorsement scale
    Asks the test taker to indicate his or her degree of endorsement by choosing one out of more than two ordered categories
  • Continuous intensity endorsement scale
    Asks the test taker to indicate his or her degree of endorsement at a bounded-continuous scale, such as a line segment (visual analogue scale)

Unipolar scale: a response scale that runs from a zero point in one direction only.
Bipolar scale: a response scale that runs from a negative pole to a positive pole.
Dichotomous scale: a scale with only two categories.
Ordinal-polytomous scale: a scale that has more than two ordered categories.
Bounded-continuous scale: a continuous scale that is bounded, for example, by two endpoints.
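The last three definitions can be expressed as a small classification rule. The function name `scale_type` and the convention of passing `None` for a continuous scale are assumptions made for this sketch.

```python
def scale_type(n_categories=None):
    """Classify a response scale by its number of ordered categories.

    n_categories=None stands for a continuous scale between two endpoints,
    which is an encoding chosen for this sketch.
    """
    if n_categories is None:
        return "bounded-continuous"
    if n_categories == 2:
        return "dichotomous"
    if n_categories > 2:
        return "ordinal-polytomous"
    raise ValueError("a response scale needs at least two categories")

print(scale_type(2))  # dichotomous
print(scale_type(5))  # ordinal-polytomous
print(scale_type())   # bounded-continuous
```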

Administration mode

The main modes to administer typical performance tests are:

  • Oral
    • Face-to-face administration
    • Telephone administration
  • Paper-and-pencil
    • Personal paper-and-pencil administration
    • Mail paper-and-pencil administration
  • Computerized
  • Computerized adaptive

Item writing guidelines

  • Elicit different answers at different construct positions
  • Focus on one aspect per item
  • Avoid making assumptions about test takers
  • Use correct language
  • Use clear and comprehensible wording
  • Use non-sensitive language and content
  • Put the situational or conditional part of a statement at the beginning and the behavioral part at the end
  • Use positive statements
  • Use 5-7 categories in ordinal-polytomous response scales
  • Label each of the categories of a response scale and avoid the use of numbers alone
  • Format response categories vertically

Item rating guidelines

  • Rate answers anonymously
  • Rate the answers to one item at a time
  • Provide the rater with a frame of reference
  • Separate irrelevant from relevant aspects
  • Use more than one rater
  • Rerate the answers
  • Rate all answers to an item on the same occasion
  • Rearrange the order of answers
  • Read a sample of answers
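Three of these guidelines (rate anonymously, rate one item at a time, rearrange the order of answers) can be supported when the answers are prepared for the rater. The sketch below is one possible way to do this; the function name `rating_batches`, the code format `R001`, and the example answers are all hypothetical.

```python
import random

def rating_batches(answers, seed=0):
    """Group answers per item, replace names by codes, and shuffle each batch.

    answers: list of (person, item, text) tuples.
    Returns a dict mapping each item to a list of (code, text) tuples.
    """
    rng = random.Random(seed)
    persons = sorted({person for person, _, _ in answers})
    # Assign codes in random order so that a code does not reveal the person.
    shuffled = rng.sample(persons, len(persons))
    codes = {person: f"R{idx:03d}" for idx, person in enumerate(shuffled, 1)}

    batches = {}
    for person, item, text in answers:
        batches.setdefault(item, []).append((codes[person], text))
    for batch in batches.values():
        rng.shuffle(batch)  # rearrange the order of answers within an item
    return batches

# Hypothetical open-ended answers of three test takers to two items.
answers = [
    ("Ann", "q1", "I usually ask a friend."),
    ("Bob", "q1", "I wait and see."),
    ("Cas", "q1", "I make a plan."),
    ("Ann", "q2", "Rarely."),
    ("Bob", "q2", "Most days."),
    ("Cas", "q2", "Only at work."),
]
batches = rating_batches(answers)
for item, batch in batches.items():
    print(item, batch)
```

The rater then works through one batch at a time and sees only the codes, not the names.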

Pilot studies on item quality

Pilot studies are necessary to assess the quality of concept items.
Usually, a large number of concept items have to be revised or removed from the pool of concept items.
Three types of pilot studies:

  • Experts’ pilot studies
  • Test takers’ pilot studies
  • Raters’ pilot studies

Experts’ pilots

Concept items have to be reviewed by experts.
Three types of expertise are needed:

  • Substantive expertise on the content of concept items
  • Technical expertise on the technical aspects of the items
  • Sensitivity expertise

Test takers’ pilots

The concept items are administered to a small group of test takers from the target population. Each of the test takers is interviewed about his or her thinking while answering the items.

  • Concurrent interview: asks the test taker to think aloud while answering the items
  • Retrospective interview: asks the test taker to recollect his or her thinking after completing the items.

The interviews are recorded and this information is used to revise or remove concept items.

Response tendencies

Responses to typical performance items may be affected by response tendencies.
Response tendency: the differential application of the response scale.

Response style: the differential use of the item response scale by different persons.
A response style varies between persons, but it is relatively constant across measurements of different constructs and across measurements of the same construct on different occasions.
It is a person-specific property.
Important response styles are:

  • Acquiescence
    The tendency to agree with an endorsement statement, independently of the content of the statement
  • Dissentience
    The tendency to disagree with an endorsement statement, independently of the content of the statement
  • Extremity
    The tendency to choose extremes of the item response scale
  • Midpoint
    The tendency to choose the middle of the response scale

Response set: the differential use of the item response scale by different persons and different constructs.
A response set may differ between persons and between constructs, and is only relatively stable across measurements of the same construct on different occasions. It is a person/construct-specific property.
Response sets:

  • Social desirability: a person’s tendency to deceive either oneself or others.
  • Self-deception: the tendency to deceive oneself.
  • Impression management: the tendency to deceive others by making good or bad impressions on others.

Social desirability is a person-specific property because it varies between persons.
It is also construct-specific because it may vary between constructs.
The best strategy is to assess social desirability with specific measurement instruments.

  • Self-deception is related to constructs that are mainly relevant for persons themselves
  • Impression management is related to constructs that are mainly relevant for the person’s social relations.

Acquiescence and dissentience can only occur with endorsement items and not with frequency items.
The extremity and midpoint response styles can occur with both.
Acquiescence, dissentience, and the extremity and midpoint styles can occur in both the reactive self-report and the reactive other-report measurement modes.
Social desirability can only occur in the reactive self-report mode.

Acquiescence and dissentience can be detected by including both indicative and contra-indicative items in the questionnaire.
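The logic behind this detection can be illustrated with a simple index. A person who answers on content agrees with indicative items and disagrees with contra-indicative items, so the mean raw response over a balanced item set lies near the scale midpoint. The index below, the function name `agreement_bias`, the 1-5 agreement scale, and the data are assumptions made for this sketch and are not taken from the book.

```python
def agreement_bias(responses, midpoint=3):
    """Mean raw response minus the scale midpoint over a balanced item set.

    Positive values hint at acquiescence, negative values at dissentience.
    Responses are raw, that is, contra-indicative items are NOT reverse-scored.
    """
    raw = list(responses.values())
    return sum(raw) / len(raw) - midpoint

# Hypothetical 1-5 agreement responses to a balanced set:
# two indicative items (ind_*) and two contra-indicative items (con_*).
consistent = {"ind_1": 5, "ind_2": 4, "con_1": 1, "con_2": 2}
yea_sayer = {"ind_1": 5, "ind_2": 4, "con_1": 5, "con_2": 4}
nay_sayer = {"ind_1": 1, "ind_2": 2, "con_1": 1, "con_2": 2}

print(agreement_bias(consistent))  # 0.0
print(agreement_bias(yea_sayer))   # 1.5
print(agreement_bias(nay_sayer))   # -1.5
```

The index is only interpretable when the item set contains about as many indicative as contra-indicative items.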

The extremity and midpoint response styles are hard to detect.

Compiling the first draft of the test

The concept items that survived the pilot studies are used to compile the first draft of the test and instructions for test takers are added.
Usually, the instruction contains some example items to guarantee that test takers understand the test items.

Balanced test: consists of about 50% indicative and 50% contra-indicative items.
Social desirability items can also be added to the test.
Usually, indicative, contra-indicative, and social desirability items are arbitrarily mixed in the test.
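Compiling a balanced, arbitrarily mixed draft can be sketched as follows. The function names `assemble_test` and `indicative_share`, the fixed random seed, and the item texts are hypothetical and chosen for illustration only.

```python
import random

def assemble_test(indicative, contra_indicative, social_desirability, seed=0):
    """Label each item by type and mix all items in arbitrary order."""
    items = (
        [(text, "indicative") for text in indicative]
        + [(text, "contra-indicative") for text in contra_indicative]
        + [(text, "social desirability") for text in social_desirability]
    )
    random.Random(seed).shuffle(items)  # fixed seed keeps the order reproducible
    return items

def indicative_share(items):
    """Share of indicative items among the construct items.

    Social desirability items are left out of the count.
    """
    kinds = [kind for _, kind in items if kind != "social desirability"]
    return kinds.count("indicative") / len(kinds)

# Hypothetical item texts, not taken from the source.
draft = assemble_test(
    indicative=["I enjoy meeting new people.", "I like large parties."],
    contra_indicative=["I prefer to be alone.", "I avoid crowds."],
    social_desirability=["I have never told a lie."],
)
print(indicative_share(draft))  # 0.5
```

A share close to 0.5 indicates that the draft is balanced in the sense described above.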

The concept test is submitted to a group of experts. This group can be the same as the group that was used in the experts’ pilot study on item quality.
The group needs to have expertise about both the construct and test construction. The experts evaluate whether the test instruction is sufficiently clear for the population of test takers.
They also assess the content validity of the test (whether the test adequately covers all aspects of the construct).

The comments of the experts are used to compile the first draft of the test.
The first draft is administered in a try-out to at least 200 test takers from the target population.
The try-out data are analyzed using methods of classical and modern test theory.
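The summary does not say which analyses are run on the try-out data. As one common example from classical test theory, the sketch below computes Cronbach's alpha, a reliability coefficient based on the item variances and the variance of the total score. The function name `cronbach_alpha` and the data are hypothetical; a real try-out would use the scores of at least 200 test takers rather than five.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-person lists of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    item_vars = [variance([person[i] for person in scores]) for i in range(k)]
    total_var = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical scores of 5 persons on 3 items (contra-indicative items
# are assumed to be reverse-scored already).
tryout = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(tryout)
print(round(alpha, 2))
```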

Follow the author: SanneA