Test theory and practice | WorldSupporter Summaries and Study Notes

Test theory and practice

This bundle collects the literature for the course Test theory and practice.

Check supporting content in teasers:

A conceptual introduction to psychometrics by G. J. Mellenbergh - a summary

This is a summary of the book A conceptual introduction to psychometrics by G. J. Mellenbergh. The summary covers chapters 1 to 6 and focuses on developing psychological tests.


Introduction - a summary of chapter 1 of A conceptual introduction to psychometrics by G. J. Mellenbergh

Developing maximum performance tests - a summary of chapter 2 of A conceptual introduction to psychometrics by G. J. Mellenbergh

Typical performance tests - a summary of chapter 3 of A conceptual introduction to psychometrics by G. J. Mellenbergh

Observed test scores - a summary of chapter 4 of A conceptual introduction to psychometrics by G. J. Mellenbergh

Classical analysis of observed test scores - a summary of chapter 5 of A conceptual introduction to psychometrics by G. J. Mellenbergh

Classical analysis of item scores - a summary of chapter 6 of A conceptual introduction to psychometrics by G. J. Mellenbergh

Test theory and practice

Year 1 of psychology at the uva

This bundle presents all summaries needed to successfully complete the first year of psychology at the uva.

Introduction to psychology

Introduction to developmental psychology

Introduction to organisational psychology

Introduction to social psychology

Introduction to cognitive psychology

Introduction to clinical psychology

Test theory and practice

Summary of Statistics: The Art and Science of Learning from Data (Agresti, 2009)


Introduction - a summary of chapter 1 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 1
Introduction

Test definitions
Test types

Test definitions

Psychometric terminology sometimes differs depending on the types of test applications.

A psychological or educational test: an instrument for the measurement of a person’s maximum or typical performance under standardized conditions, where the performance is assumed to reflect one or more latent attributes.

A test is defined to be a measurement instrument. It is for measurement in the first place.
A test is defined to measure performance. Two types of performance:
- Maximum performance tests ask the person to do his or her best to solve one or more problems. The answers to these problems can vary in correctness.
- Typical performance tests ask the person to respond to one or more tasks where the responses are typical for the person. The person’s responses cannot be evaluated on correctness, but they typify the person.
Performance is measured under standardized conditions.
Test performance must reflect one or more latent attributes. The test performance is observable, but the latent attributes cannot be observed.

Tests are distinguished from surveys. It is not assumed that survey questions reflect a latent attribute.

Subtest: an independent part of a test.
A (sub)test consists of one or more items.
Item: the smallest possible subtest of a test. The building blocks of a test.
A test that consists of n items is called an n-item test.

One or more latent attributes affect test performance.
The number of latent attributes is the dimensionality of the test.
Dimensionality: the number of latent attributes (variables) that affect test performance.

Unidimensional test: a test that predominantly measures one latent attribute.
Multidimensional test: a test that measures more than one latent attribute.
Two-dimensional test: a test that measures two latent attributes. And so on…

Test types

Psychological and educational measurement instruments are divided into:

Mental test: consists of cognitive tasks
Physical test: consists of instruments to make somatic or physiological measurements

Maximum performance tests

A performance can be considered maximum in two different respects: accuracy and speed.

Classified according to time:

Pure power test: consists of problems that the test taker tries to solve. The test taker has ample time to work on each of the test items, even the most difficult ones.
The emphasis is on measuring the accuracy with which the problems are solved.
Time-limited power test: the test is constructed so that the majority of test takers have enough time to solve the problems, and only a small minority needs more time.
Speed test: measures the speed taken to solve problems. Usually, the test consists


Developing maximum performance tests - a summary of chapter 2 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 2
Developing maximum performance tests

Construct of interest
Measurement mode
The objectives
The population
The conceptual framework
Item response mode
Administration mode
Item-writing guidelines
Item rating guidelines
Pilot studies on item quality
Compiling the first draft of the test

Seven elements

Construct
Measurement mode
Objectives
Population and subpopulations
Conceptual framework
Response mode
Administration mode

Construct of interest

The test developer must specify the latent variable of interest that has to be measured by the test.
Latent variable is a general term. The term construct is used when a substantive interpretation is given of the latent variable.
The latent variable (construct) is assumed to affect test takers’ item responses and test scores.

Constructs can vary in many different ways.

Vary in content of mental abilities, psychomotor skills or physical abilities
Construct may vary in scope
For example: from general intelligence to multiplication skill
Constructs vary from educational to psychological variables.

A good way to start a test development project is to define the construct that has to be measured by the test.
This definition describes the construct of interest and distinguishes it from other, related constructs.
Usually, the literature on the construct needs to be studied before the definition can be given. Frequently, the definition can only be given once other elements of the test development plan have been specified.

Measurement mode

Different modes can be used to measure constructs.

Self-performance mode
The test taker is asked to perform a mental or physical task
Self-evaluation mode
The test taker is asked to evaluate his or her ability to perform the task
Other-evaluation mode
Ask others to evaluate a person’s ability to perform a task

The objectives

The test developer must specify the objectives of the test. Tests are used for many different purposes.

Scientific vs practical
Individual level vs group level
Description (describe performances) vs diagnosis (adds a conclusion to a description) vs decision-making (decisions are based on tests)

The population

Target population: the set of persons to whom the test has to be applied.
The test developer must define the target population, and must provide criteria for the inclusion and exclusion of persons.
A target population can be split into distinct subpopulations. The test developer must specify whether subpopulations need to be distinguished. And, if so, they need to define the subpopulations, and to provide criteria


Typical performance tests - a summary of chapter 3 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 3
Typical performance tests

Construct of interest
Measurement mode
The objectives
Population
The conceptual framework
Item response mode
Administration mode
Item writing guidelines
Item rating guidelines
Pilot studies on item quality
Response tendencies
Compiling the first draft of the test

Typical performance tests assess behavior that is typical for the person.
These tests are used to measure attitudes, interests, values, opinions, and personality characteristics.

Construct of interest

The test developer has to specify the latent variable of interest that is assumed to affect test takers’ item responses and test scores.
The usual constructs of interest of typical performance tests are:

Attitudes
Interests
Values
Opinions
Personality characteristics

The responses to typical performance tests are not evaluated on their correctness, but are considered to typify a person.

At the start of a test development project, the researcher needs information on the construct of interest. This information can be obtained from different sources

A study of the literature on the construct and existing measurement instruments is nearly always needed at the start of a test development project
Different types of research can be done on the construct.

Focus group method
Uses small groups of persons who have experiential knowledge about the construct.
A focus group meets with the test developer to talk about their experiences with the construct.
Key informant method
Uses persons who have expert knowledge about the construct of interest. The test developer interviews these key informants about the construct.
Observation method

The test developer can use information from different sources to define the construct and, later in the test development process, he or she can use this information for item writing.

Measurement mode

Self-report mode
The test taker answers questions on a typical performance construct
Other-report mode
A person answers questions about another person’s construct
Somatic indicator mode
Uses somatic signs to measure constructs
Physical trace mode
Uses traces that are left behind to measure constructs

Each of these four modes can occur in two different varieties

Reactive measurement mode
When test takers can deliberately distort their construct value
Nonreactive measurement mode
When test takers cannot distort their construct value

The reactive/nonreactive distinction is only used for typical performance measurements, and not for maximum performance measurements.
A maximum performance test asks test takers to do the best they can to perform the task.

Each of the four response modes can occur in two versions

Self-report mode
Test takers are asked to respond to questions or stimuli to assess their


Observed test scores - a summary of chapter 4 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 4
Observed test scores

Item scoring by fiat
The sum score
The observed test score distribution

The aim of testing is to yield scores of test takers’ maximum or typical performance.
Two main types of test scores are distinguished

Observed test scores
Computed after the separate test items are scored.
Derived from the item scores by taking the unweighted or weighted sum of the item scores.
The latent variable is unobserved and, in general, the latent variable is not a simple sum of item scores.
Latent variable (construct) scores
To compute the latent variable score, a model is needed that specifies the relation between the latent variable and item responses.
The latent variable score is derived from the item responses under the assumption of a latent variable item response model.

Item scoring by fiat

Conventionally, items are scored by assigning ordinal numbers to the responses.
The scoring differs slightly between maximum and typical performance tests.

Maximum performance items are scored by assigning 0 to the lowest category, and consecutive rank numbers to subsequent categories.
Typical performance items are indicative or contra-indicative of the latent variable that is measured by the test, and the scoring of contra-indicative items has to be reversed with respect to the scoring of indicative items.
Dichotomous indicative typical performance items are scored by assigning 0 to the ‘no’ (don’t agree) category and 1 to the ‘yes’ (agree) category,
whereas contra-indicative items are scored by assigning 0 to the ‘yes’ category and 1 to the ‘no’ category.
The categories of ordinal-polytomous items are scored by assigning rank numbers to the categories.
Bounded-continuous items are scored in measurement units, such as centimeters.

Measurement by fiat: the item scores are assigned to a test taker’s responses without any theoretical justification.
(For example, the scores 0 and 1 that are assigned to an incorrect and a correct answer, and the scores 1 to 5, are based on convention (by fiat) and not on psychometric theory.)
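
The scoring conventions above can be sketched in a few lines of Python. This is only an illustration: the function names `score_dichotomous` and `score_ordinal`, the yes/no response strings, and the example responses are assumptions made here. The reversal rule for ordinal-polytomous items (rank r becomes n + 1 - r) is a common convention that the summary itself does not spell out.

```python
def score_dichotomous(response: str, indicative: bool = True) -> int:
    """Score a yes/no typical performance item by fiat (0 or 1)."""
    answer = response.strip().lower()
    if answer not in ("yes", "no"):
        raise ValueError(f"unexpected response: {response!r}")
    agrees = answer == "yes"
    # Indicative items: yes -> 1, no -> 0. Contra-indicative items are reversed.
    return int(agrees) if indicative else int(not agrees)


def score_ordinal(category_rank: int, n_categories: int, indicative: bool = True) -> int:
    """Score an ordinal-polytomous item with rank numbers 1..n_categories.

    Assumption: a contra-indicative item is reversed by mapping
    rank r to n_categories + 1 - r.
    """
    if not 1 <= category_rank <= n_categories:
        raise ValueError("category_rank out of range")
    return category_rank if indicative else n_categories + 1 - category_rank


# Hypothetical responses to a three-item yes/no questionnaire.
# The second item is contra-indicative, so its scoring is reversed.
responses = ["yes", "yes", "no"]
is_indicative = [True, False, True]
item_scores = [score_dichotomous(r, k) for r, k in zip(responses, is_indicative)]
print(item_scores)
```

Note that nothing in this code justifies the numbers 0 and 1 themselves; they are assigned by convention, which is exactly what measurement by fiat means.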

The sum score

The score of the j-th test taker on the k-th item is denoted X_jk. The conventional test score of the j-th test taker on an n-item test is the unweighted sum of his or her item scores:

Us_j = X_j1 + X_j2 + … + X_jn

It may be argued that items differ in importance and should therefore be weighted differently.
The weighted sum score of the j-th test taker on an n-item test is:

Ws_j = w_1 X_j1 + w_2 X_j2 + … + w_n X_jn

where w_1 is the weight assigned to the first item, w_2 the weight assigned to the second item, and so on.
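
As a small worked example, the two sum scores can be computed as follows. The item scores and weights are made up for illustration, and the function names are hypothetical; the weights are not taken from the book.

```python
def unweighted_sum(item_scores):
    """Us_j: the plain sum of test taker j's item scores."""
    return sum(item_scores)


def weighted_sum(item_scores, weights):
    """Ws_j: each item score X_jk is multiplied by its weight w_k before summing."""
    if len(item_scores) != len(weights):
        raise ValueError("need exactly one weight per item")
    return sum(w * x for w, x in zip(weights, item_scores))


# Hypothetical 4-item test: scores of one test taker and illustrative weights.
x_j = [1, 0, 1, 1]
w = [1.0, 2.0, 1.0, 0.5]

us_j = unweighted_sum(x_j)   # 1 + 0 + 1 + 1
ws_j = weighted_sum(x_j, w)  # 1.0*1 + 2.0*0 + 1.0*1 + 0.5*1
print(us_j, ws_j)
```

With all weights equal to 1, the weighted sum reduces to the unweighted sum.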
A problem with


Classical analysis of observed test scores - a summary of chapter 5 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 5
Classical analysis of observed test scores

Measurement precision of observed test scores
Information on a single observed score
Reliability of observed test scores in a population
Some properties of classical test theory
Parameter estimation

Measurement precision of observed test scores

Test scores are used in practical applications.

Measurement precision has two different aspects:

Information
Applies to the test score of a single person
The within-person aspect of measurement precision
Reliability
Applies to a population of persons.
The between-persons aspect of measurement precision

The concept of measurement precision applies to observed test scores as well as to latent variable scores.

Information on a single observed score

Functional thought experiment: fulfils a function within a theory.

True test score: the expected value of the observed test scores of the repeated test administrations in the thought experiment.
Test taker j’s true test score is the expected value of his (or her) independently distributed observed test scores from (hypothetical) repeated administrations of the test to the test taker.

The observed test score is a variable that varies across repeated test administrations.
The true score is constant.

Error of measurement: the difference between test taker j’s observed test score and his (or her) true score.
Test taker j’s error of measurement on an arbitrary measurement occasion is the difference between his (or her) observed test score and his (or her) true test score.
The expected value of the errors of measurement is 0.

The within-person error variance is an index for the precision of the measurement of a person’s true score.

Test taker j’s standard error of measurement: the square root of his (or her) within-person error variance.

Information: the reciprocal of a person’s within-person error variance.
A small amount of information means that test taker j’s observed test scores vary widely around j’s true score across repeated test administrations.
A large amount of information means that j’s observed test scores do not vary widely around j’s true score.
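
The definitions above can be made concrete with a small numerical sketch. In the theory, the true score and the within-person error variance are expectations over hypothetical repeated administrations, so they cannot be observed directly. The five scores below are invented and simply stand in for that thought experiment; treating their mean as the true score is an assumption made for illustration only.

```python
from math import sqrt
from statistics import fmean

# Hypothetical observed test scores of test taker j over repeated administrations.
observed = [18, 20, 22, 19, 21]

# True score: the expected value (here: the mean) of the observed scores.
true_score = fmean(observed)

# Error of measurement on each occasion: observed score minus true score.
errors = [x - true_score for x in observed]

# Within-person error variance: the mean squared error of measurement.
error_variance = sum(e ** 2 for e in errors) / len(errors)

# Standard error of measurement: square root of the within-person error variance.
sem = sqrt(error_variance)

# Information: reciprocal of the within-person error variance.
information = 1 / error_variance

print(true_score, error_variance, sem, information)
```

The errors average to 0 by construction, and a smaller error variance gives a larger amount of information.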

Reliability of observed test scores in a population

Reliability: the differentiation of test scores of different test takers from a population.

Psychometrics uses two definitions of reliability

A theoretical definition
Operational definition.
Yields procedures to assess reliability.

Reliability concerns the differentiation between the true test scores of different test takers from a population.
The differentiation is good if test takers’ true scores can be precisely predicted from their observed test


Classical analysis of item scores - a summary of chapter 6 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 6
Classical analysis of item scores

Item score distributions
Classical item discrimination
Distractor analysis
The internal structure of the test

The conventional way of scoring items is by assigning ordinal numbers to the response categories.
Usually, these item scores are ordered with respect to the attribute that the item is assumed to measure. However, the assignment of these ordinal numbers lacks a theoretical justification.

Usually, the analysis of test scores is supplemented by an analysis of the item scores.

Item score distributions

The scores of a given item have a distribution in a population of N persons.

Location: the place of the scale where item scores are centered
Dispersion: the scatter of the item scores
Shape: the form of the distributions

Classical item difficulty and attractiveness

The location of the item score distribution is used to define the classical item difficulty (maximum performance tests) and classical item attractiveness (typical performance tests) concepts.

Classical item difficulty: a parameter that indicates the location of the item score distribution in a population of persons.
Classical item attractiveness: a parameter that indicates the location of the item score distribution in a population of persons.

The two definitions are the same.

Classical item difficulty and attractiveness are defined in a population of persons.
They are therefore population-dependent and may differ between populations.

The mean of the item scores is mainly used as the location parameter.
The mean of a dichotomously scored item is called the item p-value.

Item score variance and standard deviation

The most common parameters that are used in classical item score analysis are the variance and the standard deviation of the item scores.

Items that have a small item score variance, have little effect on the test score variance.

The variance of dichotomous item scores is a function of the item p-value.
For a given sample size, the variance has its maximum value at p=.5.
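
A short sketch of these two quantities for a dichotomous item is given below. The item scores are invented, and the function names are hypothetical. The variance formula p(1 - p) is the standard population variance of a 0/1 variable.

```python
from statistics import fmean


def item_p_value(scores):
    """Item p-value: the mean of a dichotomously (0/1) scored item."""
    if any(s not in (0, 1) for s in scores):
        raise ValueError("scores must be 0 or 1")
    return fmean(scores)


def dichotomous_variance(p):
    """Variance of a 0/1 item as a function of its p-value."""
    return p * (1 - p)


# Hypothetical scores of 8 persons on one item (1 = correct, 0 = incorrect).
scores = [1, 1, 0, 1, 0, 1, 1, 0]
p = item_p_value(scores)
variance = dichotomous_variance(p)
print(p, variance)

# The variance is largest when half of the persons answer correctly.
grid = [i / 10 for i in range(11)]
p_at_maximum = max(grid, key=dichotomous_variance)
print(p_at_maximum)
```

Items that almost everyone or almost no one answers correctly therefore have a small variance and contribute little to the test score variance.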

Classical item discrimination

Location and dispersion parameters yield useful information on the items of a test.
But, these parameters do not indicate the extent to which an item contributes to the aim of a test to assess individual differences in the attribute that is measured by the test.

Classical item discrimination: a parameter that indicates the extent to which the item differentiates between the true test scores of a population of persons.
It is defined in a population of persons and may vary between different populations.

The item-test and item-rest correlations

An appropriate index for discrimination between the true scores would be the product moment correlation between the item score and the true score in the population of persons.
Test taker j’s observed
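
The text breaks off here, so the following is only a sketch based on the usual definitions, which are assumed rather than taken from the summary: the item-test correlation is the product moment correlation between an item score and the total test score, and the item-rest correlation is the correlation between an item score and the sum of the remaining items. The data matrix and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_test_and_item_rest(data, k):
    """Return (item-test, item-rest) correlations for item k.

    data: one row of item scores per test taker.
    """
    item = [row[k] for row in data]
    test = [sum(row) for row in data]            # total score, item k included
    rest = [t - i for t, i in zip(test, item)]   # total score, item k excluded
    return pearson(item, test), pearson(item, rest)


# Hypothetical 0/1 scores of five test takers on a four-item test.
data = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
r_item_test, r_item_rest = item_test_and_item_rest(data, 0)
print(r_item_test, r_item_rest)
```

In this example the item-test correlation is higher than the item-rest correlation, because the item itself is part of the total score it is correlated with.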


Year 1 of psychology at the uva

Introduction to psychology

In this magazine, you can find everything you need for the course 'introduction to psychology', in the first year of the study psychology at the Uva.

Supporting content:

Psychology by Gray and Bjorklund (7th edition) - a summary


Introduction to developmental psychology

In this magazine, you can find everything you need for the course developmental psychology in the first year of psychology at the Uva.

Supporting content:

An Introduction to Developmental psychology by A. Slater and G. Bremner (third edition) - a summary


Introduction to organisational psychology

In this bundle, summaries are bundled concerning the first year psychology course organisational psychology.

Supporting content:

Organizational Behavior by Mcshane, S. (8th edition) a summary


Introduction to social psychology

In this magazine, all summaries needed for the first year psychology course Social psychology are bundled.

Supporting content:

Social Psychology by Smith, E, R (fourth edition) a summary


Introduction to cognitive psychology

In this magazine, you can find the summary for the first year psychology course cognitive psychology.

Supporting content:

Cognitive Psychology by Gilhooly, K & Lyddy, F, M (first edition) - a summary


Introduction to clinical psychology

In this magazine, you can find the information you need for the first year course introduction to clinical psychology in the study psychology at the uva.

Supporting content:

Abnormal Psychology by Kring, Davison, Neale & Johnson (12th edition) - a summary


Summary of Statistics: The Art and Science of Learning from Data (Agresti, 2009)

Chapter 1 Data
Chapter 2 Exploring data
Chapter 3 Associations between two variables
Chapter 4 Gathering data
Chapter 5 Probability
Chapter 6 Summarizing probabilities
Chapter 7 Confidence interval
Chapter 8 Hypothesis testing
Chapter 9 Comparing groups
Chapter 10 Association between categorical variables
Chapter 11 Association between quantitative variables: regression analysis
Chapter 12 Multiple regression
Chapter 13 ANOVA: comparing groups
Chapter 14 Nonparametric statistics

Introduction

For many psychology students, statistics is a frightening subject. What are you supposed to do with all those numbers and figures, and how do all the formulas fit together? This summary tries to explain statistics as clearly as possible. It contains many examples that aim to clarify the mathematical problem. However, this summary is not a miracle cure. Above all, it is wise to practice a lot and to attend the lectures.

Before you start on this summary, here are a few tips for working with statistics:

Do not always try to know and understand exactly how a formula came about. Sometimes it is enough to accept that clever mathematicians and physicists came up with it. What matters for you is mainly being able to work with it. So do not panic if you do not fully understand something; look at what you do understand and try to work with that.
If you do not know an answer, skip the question for the time being. During the exam, the answer, or the way to calculate it, often comes to you suddenly.
The answer can always be found in the text; numbers are not simply conjured up to find the answer.
So read carefully; important information is often overlooked.

So statistics is nothing to panic about, but without studying it does become very difficult. Hopefully this overview makes statistics a little clearer for you.

Good luck!

Chapter 1 Data

Statistics is the science that analyzes information from various studies and investigations. This information is called data. Research questions are examined and analyzed in an objective way. After the data have been analyzed, conclusions can be drawn and predictions can be made.

The three most common statistical processes:

Design: planning how a study is to be carried out. This includes deciding how relevant data are to be obtained. This is usually done using samples from a population. A population need not refer to the entire world population. It can also refer to, for example, all schools in the Netherlands or all football clubs in North Holland.
Description: summarizing and finding patterns in a data sample. This is done using graphs and tables, or descriptively, for example by reporting means and percentages.


Psychology at the uva

This is a bundle with all the summaries you need for the study of psychology at the Uva.

Year 1 of psychology at the uva

Year 2 of psychology at the uva

Year 3 of psychology at the uva

Master's in clinical psychology at the uva

Living a student’s-life

More summaries

For more summaries in the third year of psychology, go to Uva Psychology Start Magazine

Follow the author: SanneA
