Developing maximum performance tests - a summary of chapter 2 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 2
Developing maximum performance tests

Construct of interest
Measurement mode
The objectives
The population
The conceptual framework
Item response mode
Administration mode
Item-writing guidelines
Item rating guidelines
Pilot studies on item quality
Compiling the first draft of the test

Seven elements of the test development plan

Construct
Measurement mode
Objectives
Population and subpopulations
Conceptual framework
Item response mode
Administration mode

Construct of interest

The test developer must specify the latent variable of interest that has to be measured by the test.
Latent variable is a general term. The term construct is used when the latent variable is given a substantive interpretation.
The latent variable (construct) is assumed to affect test takers’ item responses and test scores.

Constructs can vary in many different ways.

Content: constructs may be mental abilities, psychomotor skills, or physical abilities
Scope: constructs vary in breadth, for example from general intelligence to multiplication skill
Field: constructs range from educational to psychological variables

A good way to start a test development project is to define the construct that has to be measured by the test.
This definition describes the construct of interest and distinguishes it from other, related constructs.
Usually, the literature on the construct needs to be studied before the definition can be given. Frequently, the definition can only be completed once other elements of the test development plan have been specified.

Measurement mode

Different modes can be used to measure constructs.

Self-performance mode
The test taker is asked to perform a mental or physical task
Self-evaluation mode
The test taker is asked to evaluate his or her ability to perform the task
Other-evaluation mode
Others are asked to evaluate a person’s ability to perform the task

The objectives

The test developer must specify the objectives of the test. Tests are used for many different purposes.

Scientific vs practical
Individual level vs group level
Description (describing performance) vs diagnosis (adding a conclusion to a description) vs decision-making (basing decisions on test results)

The population

Target population: the set of persons to whom the test has to be applied.
The test developer must define the target population, and must provide criteria for the inclusion and exclusion of persons.
A target population can be split into distinct subpopulations. The test developer must specify whether subpopulations need to be distinguished. If so, the developer must define the subpopulations and provide criteria for assigning persons to them.

The conceptual framework

Test development starts with a definition or description of the construct that has to be measured by the test. However, this definition or description is usually not concrete enough for writing test items.

A conceptual framework gives the item writer a handle for writing items.
Examples of conceptual frameworks are available in the literature.

Item response mode

The item response mode needs to be specified before item writing starts.
A distinction is made between:

Free-response (also called constructed-response) items
Choice (also called selected-response) items

Free response items are further divided into:

Short-answer items
Essay items

Different types of choice modes are used in achievement and ability testing:

Conventional multiple-choice mode
Consists of a stem and two or more options. The options are divided into one correct answer and one or more distractors.
Usually, choosing the correct option of a multiple-choice item indicates that test takers’ ability or skill is sufficiently high to solve the item.
Distractors can be constructed so that each one reflects a specific reason why the test taker failed to solve the item. The choice of a distractor then indicates which deficiency the test taker has, and can therefore be used to diagnose specific deficiencies.
A dichotomous item response scale has two ordered categories: an answer is either correct or incorrect.
An ordinal-polytomous scale has more than two ordered categories.
Partial ordinal-polytomous response scale: the correct option is ordered above the distractors, but the distractors are not ordered among themselves.
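
The three response scales can be illustrated with a minimal Python sketch that scores the option a test taker chose on one multiple-choice item. The item, its option labels, and the ordering of options on the ordinal-polytomous scale are hypothetical and chosen only for illustration; the chapter does not prescribe any scoring code.

```python
from typing import Optional, Tuple

# Hypothetical four-option item: "B" is the correct option (the key),
# "A", "C" and "D" are distractors.
KEY = "B"
OPTIONS = ("A", "B", "C", "D")

# Hypothetical ordering, used only for the ordinal-polytomous scale:
# distractor "C" is assumed to be partly correct, so it ranks between
# the other distractors (0) and the key (2).
ORDINAL_CATEGORY = {"A": 0, "D": 0, "C": 1, "B": 2}


def check_choice(choice: str) -> None:
    if choice not in OPTIONS:
        raise ValueError(f"unknown option: {choice!r}")


def score_dichotomous(choice: str) -> int:
    """Two ordered categories: 1 = correct, 0 = incorrect."""
    check_choice(choice)
    return 1 if choice == KEY else 0


def score_ordinal_polytomous(choice: str) -> int:
    """More than two ordered categories (here 0 < 1 < 2)."""
    check_choice(choice)
    return ORDINAL_CATEGORY[choice]


def score_partial_ordinal(choice: str) -> Tuple[int, Optional[str]]:
    """The key ranks above every distractor (1 > 0). Distractors share the
    lower rank but keep their label: they are not ordered among themselves,
    yet the chosen distractor can still be used for diagnosis."""
    check_choice(choice)
    if choice == KEY:
        return (1, None)
    return (0, choice)


if __name__ == "__main__":
    for option in OPTIONS:
        print(option,
              score_dichotomous(option),
              score_ordinal_polytomous(option),
              score_partial_ordinal(option))
```

The partial ordinal-polytomous score is returned as a pair on purpose: the first element carries the order (key above distractors), while the second keeps the unordered distractor label available for diagnosis.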

Administration mode

A test can be administered to test takers in different ways:

Oral
The test is presented orally by a single test administrator to a single test taker
Paper-and-pencil
The test is presented in the form of a booklet
Computerized
The test is presented on a computer, and the order of the items is the same for every test taker
Computerized adaptive test administration
The test is adaptive: the computer program searches for the items that best fit each test taker, so different test takers may receive different items
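
The summary does not say how the program decides which item "best fits" a test taker. One common choice, assumed here and not taken from the chapter, is a Rasch (one-parameter logistic) item response model: each item has a difficulty b, the test taker has a current ability estimate theta, and the next item is the not-yet-administered item with the largest Fisher information p(1 - p) at theta. Under this model that is the item whose difficulty lies closest to theta. The item bank and its difficulty values are hypothetical, and updating theta after each response is left out of this sketch.

```python
import math
from typing import Dict, Set

# Hypothetical item bank: item label -> Rasch difficulty b (logit scale).
ITEM_BANK: Dict[str, float] = {
    "item1": -2.0, "item2": -1.0, "item3": 0.0, "item4": 1.0, "item5": 2.0,
}


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


def select_next_item(theta: float, bank: Dict[str, float],
                     administered: Set[str]) -> str:
    """Return the not-yet-administered item with maximum information at theta."""
    remaining = [item for item in bank if item not in administered]
    if not remaining:
        raise ValueError("the item bank is exhausted")
    return max(remaining, key=lambda item: item_information(theta, bank[item]))


if __name__ == "__main__":
    theta_hat = 0.4  # hypothetical current ability estimate
    print(select_next_item(theta_hat, ITEM_BANK, set()))      # item3
    print(select_next_item(theta_hat, ITEM_BANK, {"item3"}))  # item4
```

In a complete adaptive test this selection step would alternate with re-estimating theta from the responses given so far, until a stopping rule is met.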

Item-writing guidelines

Focus on one relevant aspect
Each item should focus on a single relevant aspect of the specification in order to guarantee good coverage of the important aspects of the achievement or ability.
Each item should measure only a single aspect of the specification, so that test takers’ item responses are unambiguously interpretable.
Use independent item content
The content of different items should be independent of one another.
Testlet: a group of items that is developed as a single unit and is meant to be administered together.
Avoid overly specific and overly general content
Overly specific item content may be trivial, whereas overly general content may be ambiguous.
Avoid items that deliberately deceive test takers
Keep vocabulary simple for the population of test takers
Put item options vertically
Minimize reading time and avoid unnecessary information
Use correct language
Use non-sensitive language
Use a clear stem and include the central idea in the stem
Word the item positively, and avoid negatives
Negatively phrased items are harder to understand and may confuse test takers.
Use three options, unless it is easy to write plausible distractors
Use one option that is unambiguously the correct or best answer
Place the options in alphabetical, logical, or numerical order
Vary the location of the correct option across the test
Keep the options homogeneous in length, content, grammar, etc.
Avoid ‘all-of-the-above’ as the last option
Make distractors plausible
Avoid giving clues to the correct option

Item rating guidelines

Responses to free-response (constructed-response) items have to be scored by raters.

Important guidelines:

Rate responses anonymously
Rate responses to one item at a time
Provide the rater with a frame of reference
Separate irrelevant aspects from the relevant performance
Use more than one rater
Re-rate the free responses
Rate all responses to an item on the same occasion
Rearrange the order of responses
Read a sample of responses
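
Several of these guidelines concern how responses are presented to raters. The following minimal Python sketch builds a rating schedule that hides test takers' names behind neutral codes, gives each rater all responses to one item as a single batch, uses more than one rater, and shuffles the order of responses independently for each rater and item. The function names, the code format (R001, R002, ...) and the use of Python's `random.Random` are illustrative assumptions, not part of the chapter.

```python
import random
from typing import Dict, List, Tuple


def anonymize(test_takers: List[str], seed: int = 0) -> Dict[str, str]:
    """Map each test taker to a neutral code such as 'R001'.
    The mapping stays with the test administrator, not with the raters."""
    rng = random.Random(seed)
    shuffled = list(test_takers)
    rng.shuffle(shuffled)  # codes do not reveal the original (e.g. alphabetical) order
    return {person: f"R{i:03d}" for i, person in enumerate(shuffled, start=1)}


def rating_schedule(test_takers: List[str], items: List[str],
                    raters: List[str], seed: int = 0
                    ) -> Dict[str, List[Tuple[str, List[str]]]]:
    """For every rater, one batch per item (all responses to that item,
    rated on one occasion), with anonymous codes in a shuffled order."""
    codes = list(anonymize(test_takers, seed).values())
    rng = random.Random(seed + 1)
    schedule: Dict[str, List[Tuple[str, List[str]]]] = {}
    for rater in raters:  # more than one rater
        batches = []
        for item in items:  # one item at a time
            order = list(codes)
            rng.shuffle(order)  # reshuffled independently per rater and item
            batches.append((item, order))
        schedule[rater] = batches
    return schedule


if __name__ == "__main__":
    # Hypothetical test takers, items and raters.
    plan = rating_schedule(["Ann", "Bob", "Cem"], ["item1", "item2"],
                           ["rater1", "rater2"])
    for rater, batches in plan.items():
        for item, order in batches:
            print(rater, item, order)
```

Re-rating could reuse the same function with a different seed, so that the second pass presents the responses in a new order.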

Pilot studies on item quality

In standard practice, item writers produce a set of concept items, and pilot studies are conducted to assess the quality of these concept items.
Generally, at least half of the concept items do not survive the pilot studies, and items that survive are usually revised several times.

Experts’ and test takers’ pilot studies need to be done for both free-response and multiple-choice items.
For free-response items, pilot studies also need to be done on the ratings of test takers’ responses to the items.

Experts’ pilots

The concept items have to be reviewed before they are included in a test.
Items are reviewed on their content, technical aspects, and sensitivity.

The content and technical aspects are assessed by experts in both the subject matter of the test and item writing.

Each of the concept items is discussed by a panel of experts.
A good start for the discussion of a multiple-choice item is to look for distractors that panel members could defend as (partly) correct answers.
The reviewing of the items yields qualitative information that is used to rewrite items or to remove concept items that cannot be repaired.
Revised items should be reviewed again by experts until no further rewriting is needed.

The sensitivity of items also needs to be reviewed.
Usually, the panel for the sensitivity review consists of persons who are not on the panel reviewing the content and technical aspects of the items.
The sensitivity review panel is composed of members of different groups.
The panel has to be trained to detect aspects of the items and the test that may be sensitive to subpopulations.
The sensitivity review provides qualitative information that also could lead to rewriting or removal of concept items.

Test takers’ pilots

The concept items are individually administered to a small group of test takers from the population of interest.
Each test taker is interviewed about his or her thinking while working on an item.

Two versions of the interview can be applied:

Concurrent interview: the test taker is asked to think aloud while working on the item
Retrospective interview: the test taker is asked to recollect his or her thinking after completing the item.

The interviews are recorded in protocols, and this information is used to rewrite or remove concept items.

Compiling the first draft of the test

The concept items that survived the pilot studies are used to compile a concept version of the test that includes instructions for the test takers.
Usually, the instructions contain some example items that test takers have to answer, to ensure that they understand the test instructions.
The concept test may consist of a number of subtests that measure different aspects of the ability or achievement.

The conventional way of assembling a maximum performance test is to start with easy items and to end with difficult items.
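
Ordering items from easy to difficult presupposes some estimate of each item's difficulty. The chapter does not say where that estimate comes from at this stage; the minimal Python sketch below simply assumes the classical index, the proportion of test takers who answered the item correctly, computed from hypothetical 0/1 scores. Item labels and scores are invented for illustration.

```python
from typing import Dict, List


def proportion_correct(scores: List[int]) -> float:
    """Classical difficulty index: share of 0/1 scores that equal 1."""
    if not scores:
        raise ValueError("no scores for this item")
    return sum(scores) / len(scores)


def order_easy_to_difficult(item_scores: Dict[str, List[int]]) -> List[str]:
    """Sort items from easiest (highest proportion correct) to most difficult.
    Python's sort is stable, so tied items keep their original order."""
    p = {item: proportion_correct(scores) for item, scores in item_scores.items()}
    return sorted(p, key=lambda item: p[item], reverse=True)


if __name__ == "__main__":
    # Hypothetical 0/1 scores of four test takers on three concept items.
    scores = {
        "item_a": [1, 0, 0, 0],  # p = 0.25
        "item_b": [1, 1, 1, 0],  # p = 0.75
        "item_c": [1, 1, 0, 0],  # p = 0.50
    }
    print(order_easy_to_difficult(scores))  # ['item_b', 'item_c', 'item_a']
```

If the concept test consists of subtests, the same ordering would be applied within each subtest separately.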

The concept test is submitted to a group of experts.
The group can be the same as the group that was used in the experts’ pilot study on item quality. The group has expertise in:

The content of the ability or achievement
Test construction

The experts evaluate three different properties of the concept test:

Whether the test instruction is sufficiently clear for the population of test takers
Whether the test adequately covers all aspects of the ability or achievement being measured (content validation)
Whether the test is balanced with respect to multicultural material and references to gender

The comments of the experts are used to compile the first draft of the test.
The first draft of the test is administered in a try-out to a sample of at least 200 test takers from the population of interest.
