Developing maximum performance tests - a summary of chapter 2 of A conceptual introduction to psychometrics by G.J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 2
Developing maximum performance tests

Seven elements

  • Construct
  • Measurement mode
  • Objectives
  • Population and subpopulations
  • Conceptual framework
  • Response mode
  • Administration mode

Construct of interest

The test developer must specify the latent variable of interest that has to be measured by the test.
Latent variable is a general term. The term construct is used when a substantive interpretation is given of the latent variable.
The latent variable (construct) is assumed to affect test takers’ item responses and test scores.

Constructs can vary in many different ways.

  • Constructs vary in content, for example mental abilities, psychomotor skills, or physical abilities
  • Constructs vary in scope
    For example: from general intelligence to multiplication skill
  • Constructs vary from educational to psychological variables.

A good way to start a test development project is to define the construct that has to be measured by the test.
This definition describes the construct of interest and distinguishes it from other, related constructs.
Usually, the literature on the construct needs to be studied before the definition can be given. Frequently, the definition can only be given when other elements of the test development plan are specified.

Measurement mode

Different modes can be used to measure constructs.

  • Self-performance mode
    The test taker is asked to perform a mental or physical task
  • Self-evaluation mode
    The test taker is asked to evaluate his or her ability to perform the task
  • Other-evaluation mode
    Others are asked to evaluate the test taker’s ability to perform the task

The objectives

The test developer must specify the objectives of the test. Tests are used for many different purposes.

  • Scientific vs practical
  • Individual level vs group level
  • Description (describe performances) vs diagnosis (adds a conclusion to a description) vs decision-making (decisions are based on tests)

The population

Target population: the set of persons to whom the test has to be applied.
The test developer must define the target population, and must provide criteria for the inclusion and exclusion of persons.
A target population can be split into distinct subpopulations. The test developer must specify whether subpopulations need to be distinguished and, if so, define the subpopulations and provide criteria for including persons in them.

The conceptual framework

Test development starts with a definition or description of the construct that has to be measured by the test. But, the definition or description is usually not concrete enough to write test items.

A conceptual framework gives the item writer a handle to write items.
In the literature, examples of conceptual frameworks are available.
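
As an illustration of what a conceptual framework can look like, below is a small, hypothetical sketch of a test specification blueprint that crosses content areas with task types and fixes the number of items per cell. The content areas, task types, and item counts are invented for illustration; a real blueprint follows from the construct definition and the objectives of the test.

```python
# Hypothetical test specification blueprint for an arithmetic test.
# All content areas, task types, and item counts are invented examples.

blueprint = {
    # (content area, task type): number of items to write
    ("addition",       "routine procedure"): 6,
    ("addition",       "word problem"):      4,
    ("multiplication", "routine procedure"): 6,
    ("multiplication", "word problem"):      4,
}

planned_test_length = 20
assert sum(blueprint.values()) == planned_test_length

# Item writers get one cell at a time (e.g. "write 4 multiplication word
# problems"), which keeps the coverage of the construct explicit.
for (content, task_type), n_items in blueprint.items():
    print(f"{content:<15} {task_type:<20} {n_items} items")
```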

Item response mode

The item response mode needs to be specified before item writing starts.
Distinction:

  • Free- (constructed-) response mode
  • Choice (selected-) response mode

Free response items are further divided into:

  • Short-answer items
  • Essay items

Different types of choice modes are used in achievement and ability testing:

  • Conventional multiple-choice mode
    Consists of a stem and two or more options. The options are divided into one correct answer and one or more distractors.
    Usually, choosing the correct option of a multiple-choice item indicates that test takers’ ability or skill is sufficiently high to solve the item.
    Distractors can be constructed to contain specific information on the reasons why the test taker failed to solve the item correctly. The choice of a distractor indicates which deficiency the test taker has and as such can be used for diagnosing specific deficiencies.
  • A dichotomous item response scale has two ordered categories. An answer is correct or incorrect.
  • An ordinal-polytomous scale has more than two ordered categories.
    Partial ordinal-polytomous response scale: the correct option is ordered above the distractors, but the distractors are not ordered among themselves.
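
The difference between these response scales can be made concrete with a small scoring sketch. The sketch below is illustrative only; the item content, rubric, and function names are invented and not taken from the book, and the partial-credit rubric is just one way of obtaining more than two ordered categories.

```python
# Minimal sketch of the two response-scale types described above
# (invented item content and scoring rules).

def score_dichotomous(chosen_option: str, key: str) -> int:
    """Dichotomous scale: two ordered categories, correct (1) or incorrect (0)."""
    return 1 if chosen_option == key else 0

def score_ordinal_polytomous(response: str, rubric: dict) -> int:
    """Ordinal-polytomous scale: more than two ordered categories,
    e.g. 0 = wrong, 1 = partially correct, 2 = fully correct."""
    return rubric.get(response, 0)

# Dichotomous scoring of a multiple-choice item with key "b".
print(score_dichotomous("b", key="b"))   # 1
print(score_dichotomous("c", key="b"))   # 0

# Ordinal-polytomous scoring of an item with partial credit.
rubric = {"full solution": 2, "correct method, wrong answer": 1}
print(score_ordinal_polytomous("correct method, wrong answer", rubric))  # 1
```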

Administration mode

A test can be administered to test takers in different ways:

  • Oral
    The test is presented orally by a single test administrator to a single test taker
  • Paper-and-pencil
    The test is presented in the form of a booklet
  • Computerized
    The test is presented on a computer; the order of the items is the same for each test taker.
  • Computerized adaptive test administration
    The test is adaptive. The computer program searches for the items that best fit the test taker.
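
A minimal sketch of the adaptive idea, assuming a Rasch (one-parameter logistic) item response model: after each response the program picks, from the items not yet administered, the item that is most informative at the test taker's current ability estimate. The item pool, difficulties, and function names below are invented for illustration; a real computerized adaptive test also re-estimates ability after every response and applies content and exposure constraints.

```python
import math

# Invented item pool: item label -> Rasch difficulty parameter.
item_difficulties = {"item1": -1.0, "item2": -0.2, "item3": 0.5, "item4": 1.3}

def rasch_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_next_item(theta: float, administered: set) -> str:
    """Pick the not-yet-administered item that is most informative at theta."""
    candidates = {i: b for i, b in item_difficulties.items() if i not in administered}
    return max(candidates, key=lambda i: rasch_information(theta, candidates[i]))

# A test taker with current ability estimate 0.4 who already answered item1
# next gets the item whose difficulty is closest to 0.4.
print(select_next_item(theta=0.4, administered={"item1"}))  # item3
```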

Item-writing guidelines

  • Focus on one relevant aspect
    Each item should focus on a single relevant aspect of the specification in order to guarantee good coverage of the important aspects of the achievement or ability.
    Measuring only a single aspect per item also guarantees that test takers’ item responses are unambiguously interpretable.
  • Use independent item content
    The content of different items should be independent of one another.
    Testlet: a group of items that may be developed as a single unit that is meant to be administered together.
  • Avoid overly specific and overly general content
    The disadvantage of overly specific item content is that the content may be trivial, and the disadvantage of overly general content is that the content may be ambiguous
  • Avoid items that deliberately deceive test takers
  • Keep vocabulary simple for the population of test takers
  • Put item options vertically
  • Minimize reading time and avoid unnecessary information
  • Use correct language
  • Use non-sensitive language
  • Use a clear stem and include the central idea in the stem
  • Word the item positively, and avoid negatives
    Negatively phrased items are harder to understand and may confuse test takers.
  • Use three options, unless additional plausible distractors are easy to write
  • Use one option that is unambiguously the correct or best answer
  • Place the options in alphabetical, logical, or numerical order
  • Vary the location of the correct option across the test
  • Keep the options homogeneous in length, content, grammar, etc.
  • Avoid ‘all-of-the-above’ as the last option
  • Make distractors plausible
  • Avoid giving clues to the correct option

Item rating guidelines

The responses to free- (constructed-) response items have to be rated by raters.

Important guidelines:

  • Rate responses anonymously
  • Rate responses to one item at a time
  • Provide the rater with a frame of reference
  • Separate irrelevant aspects from the relevant performance
  • Use more than one rater
  • Re-rate the free responses
  • Rate all responses to an item on the same occasion
  • Rearrange the order of responses
  • Read a sample of responses
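
The "use more than one rater" and "re-rate the free responses" guidelines can be checked numerically by comparing the ratings. Below is a minimal sketch, with invented ratings, that computes the observed agreement and Cohen's kappa for two raters who independently scored the same free responses on a 0-2 scale.

```python
from collections import Counter

# Invented ratings of the same ten free responses by two independent raters.
rater_a = [2, 1, 0, 2, 1, 1, 0, 2, 2, 1]
rater_b = [2, 1, 0, 1, 1, 1, 0, 2, 2, 0]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance-expected agreement from the two raters' marginal distributions.
marg_a, marg_b = Counter(rater_a), Counter(rater_b)
expected = sum((marg_a[c] / n) * (marg_b[c] / n) for c in set(rater_a) | set(rater_b))

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```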

Pilot studies on item quality

Standard practice is that item writers produce a set of concept items and pilot studies are done to test the quality of these concept items.
Generally, at least half of the concept items do not survive the pilot studies, and items that survive are usually revised several times.

Experts’ and test takers’ pilot studies need to be done for both free-response and multiple-choice items.
For free-response items pilot studies need to be done on the ratings of test takers’ responses to the items.

Experts’ pilots

The concept items have to be reviewed before they are included in a test.
Items are reviewed on their content, technical aspects, and sensitivity.

  • The content and technical aspects are assessed by experts in both the field of the test and item writing.

Each of the concept items is discussed by a panel of experts.
A good start for the discussion of a multiple-choice item is to look for distractors that panel members could defend as (partly) correct answers.
The reviewing of the items yields qualitative information that is used to rewrite items or to remove concept items that cannot be repaired.
Revised items should be reviewed again by experts until further rewriting is not needed.

The sensitivity of items also needs to be reviewed.
Usually, the panel for the sensitivity review of the items consists of persons not on the panel reviewing the content and technical aspects of the items.
The sensitivity review panel is composed of members of different groups.
The panel has to be trained to detect aspects of the items and the test that may be sensitive for members of the subpopulations.
The sensitivity review provides qualitative information that also could lead to rewriting or removal of concept items.

Test takers’ pilots

The concept items are individually administered to a small group of test takers from the population of interest.
Each test taker is interviewed about his or her thinking while working on an item.

Two versions of the interview can be applied:

  • Concurrent interview: the test taker is asked to think aloud while working on the item
  • Retrospective interview: the test taker is asked to recollect his or her thinking after completing the item.

Protocols of the interviews are made and the information is used to rewrite or remove concept items.

Compiling the first draft of the test

The concept items that survived the pilot studies are used to compile a concept version of the test that includes instructions for the test takers.
Usually, the instruction contains some example items that test takers have to answer to ensure that they understand the test instructions.
The concept test may consist of a number of subtests that measure different aspects of the ability or achievement.

The conventional way of assembling a maximum performance test is to start with easy items and to end with difficult items.

The concept test is submitted to a group of experts.
The group can be the same as the group that was used in the experts’ pilot study on item quality. The group has expertise in:

  • The content of the ability or achievement
  • Test construction

The experts evaluate the following properties of the concept test:

  • Whether the test instruction is sufficiently clear for the population of test takers
  • Whether the test yields adequate coverage of all aspects of the ability or achievement being measured by the test (content validation)
  • Whether the test is balanced with respect to multicultural material and references to gender

The comments of the experts are used to compile the first draft of the test.
The first draft of the test is administered in a try-out to a sample of at least 200 test takers from the population of interest.
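
The chapter stops at administering the first draft. Purely as an illustration of what happens with such try-out responses, the sketch below computes two classical descriptive item statistics, the item difficulty (proportion correct) and the item-rest correlation, on an invented response matrix that is much smaller than a real try-out sample; this analysis step is not part of the development plan described in this chapter.

```python
import statistics  # statistics.correlation requires Python 3.10+

# Invented try-out data: rows are test takers, columns are items (1 = correct).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

n_items = len(responses[0])
for j in range(n_items):
    item_scores = [row[j] for row in responses]
    rest_scores = [sum(row) - row[j] for row in responses]   # total score minus this item
    difficulty = statistics.mean(item_scores)                # proportion correct
    discrimination = statistics.correlation(item_scores, rest_scores)
    print(f"item {j + 1}: p = {difficulty:.2f}, item-rest r = {discrimination:.2f}")
```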
