Test theory and practice

This bundle contains summaries of the literature for the course Test theory and practice.

A conceptual introduction to psychometrics by G. J. Mellenbergh - a summary

Introduction - a summary of chapter 1 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 1
Introduction


Test definitions

Psychometric terminology sometimes differs depending on the types of test applications.

A psychological or educational test: an instrument for the measurement of a person’s maximum or typical performance under standardized conditions, where the performance is assumed to reflect one or more latent attributes.

  • A test is defined as a measurement instrument: its primary purpose is measurement.
  • A test is defined to measure performance. There are two types of performance:
    • Maximum performance tests ask the person to do his or her best to solve one or more problems. The answers to these problems can vary in correctness.
    • Typical performance tests ask the person to respond to one or more tasks in a way that is typical for the person. The person’s responses cannot be evaluated on correctness, but they typify the person.
  • Performance is measured under standardized conditions.
  • Test performance must reflect one or more latent attributes. The test performance is observable, but the latent attributes cannot be observed.

Tests are distinguished from surveys: it is not assumed that survey questions reflect a latent attribute.

Subtest: an independent part of a test.
A (sub)test consists of one or more items.
Item: the smallest possible subtest of a test. The building blocks of a test.
A test that consists of n items is called an n-item test.

One or more latent attributes affect test performance.
The number of latent attributes is the dimensionality of the test.
Dimensionality: the number of latent attributes (variables) that affect test performance.

Unidimensional test: a test that predominantly measures one latent attribute.
Multidimensional test: a test that measures more than one latent attribute.
Two-dimensional test: a test that measures two latent attributes. And so on…

Test types

Psychological and educational measurement instruments are divided into:

  • Mental test: consists of cognitive tasks
  • Physical test: consists of instruments to make somatic or physiological measurements

Maximum performance tests

A performance can be considered maximum in two respects: it is accurate, and it is fast.

Maximum performance tests can be classified according to the time available:

  • Pure power test: consists of problems that the test taker tries to solve. The test taker has ample time to work on each of the test items, even the most difficult ones.
    The emphasis is on measuring the accuracy of problem solving.
  • Time-limited power test: the test is constructed so that the majority of test takers have enough time to solve the problems, and only a small minority needs more time.
  • Speed test: measures the speed with which problems are solved. Usually, the test consists …
Developing maximum performance tests - a summary of chapter 2 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 2
Developing maximum performance tests


Seven elements of the test development plan

  • Construct
  • Measurement mode
  • Objectives
  • Population and subpopulations
  • Conceptual framework
  • Response mode
  • Administration mode

Construct of interest

The test developer must specify the latent variable of interest that has to be measured by the test.
Latent variable is a general term. The term construct is used when a substantive interpretation is given to the latent variable.
The latent variable (construct) is assumed to affect test takers’ item responses and test scores.

Constructs can vary in many different ways.

  • Constructs vary in content: mental abilities, psychomotor skills, or physical abilities
  • Constructs vary in scope
    For example: from general intelligence to multiplication skill
  • Constructs vary from educational to psychological variables.

A good way to start a test development project is to define the construct that has to be measured by the test.
This definition describes the construct of interest and distinguishes it from other, related constructs.
Usually, the literature on the construct needs to be studied before the definition can be given. Frequently, the definition can only be given once other elements of the test development plan have been specified.

Measurement mode

Different modes can be used to measure constructs.

  • Self-performance mode
    The test taker is asked to perform a mental or physical task
  • Self-evaluation mode
    The test taker is asked to evaluate his or her ability to perform the task
  • Other-evaluation mode
    Ask others to evaluate a person’s ability to perform a task

The objectives

The test developer must specify the objectives of the test. Tests are used for many different purposes.

  • Scientific vs practical
  • Individual level vs group level
  • Description (describe performances) vs diagnosis (adds a conclusion to a description) vs decision-making (decisions are based on tests)

The population

Target population: the set of persons to whom the test has to be applied.
The test developer must define the target population, and must provide criteria for the inclusion and exclusion of persons.
A target population can be split into distinct subpopulations. The test developer must specify whether subpopulations need to be distinguished and, if so, define the subpopulations and provide criteria …
Typical performance tests - a summary of chapter 3 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 3
Typical performance tests


Typical performance tests assess behavior that is typical for the person.
These tests are used to measure attitudes, interests, values, opinions, and personality characteristics.

Construct of interest

The test developer has to specify the latent variable of interest that is assumed to affect test takers’ item responses and test scores.
The usual constructs of interest of typical performance tests are:

  • Attitudes
  • Interests
  • Values
  • Opinions
  • Personality characteristics

The responses to typical performance tests are not evaluated on their correctness, but are considered to typify a person.

At the start of a test development project, the researcher needs information on the construct of interest. This information can be obtained from different sources.

A study of the literature on the construct and on existing measurement instruments is nearly always needed at the start of a test development project.
Different types of research can be done on the construct.

  • Focus group method
    Uses small groups of persons who have experiential knowledge about the construct.
    A focus group meets with the test developer to talk about their experiences with the construct.
  • Key informant method
    Uses persons who have expert knowledge about the construct of interest. The test developer interviews these key informants about the construct.
  • Observation method

The test developer can use information from different sources to define the construct and, later in the test development process, he or she can use this information for item writing.

Measurement mode

  • Self-report mode
    The test taker answers questions on a typical performance construct
  • Other-report mode
    A person answers questions about another person’s construct
  • Somatic indicator mode
    Uses somatic signs to measure constructs
  • Physical trace mode
    Uses traces that are left behind to measure constructs

Each of these four modes can occur in two different varieties:

  • Reactive measurement mode
    When test takers can deliberately distort their construct value
  • Nonreactive measurement mode
    When test takers cannot distort their construct value

The reactive/nonreactive distinction is only used for typical performance measurements, and not for maximum performance measurements.
A maximum performance test asks test takers to do the best they can to perform the task.

Each of the four measurement modes can thus occur in a reactive and a nonreactive version.

Self-report mode
Test takers are asked to respond to questions or stimuli to assess their …
Observed test scores - a summary of chapter 4 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 4
Observed test scores


The aim of testing is to yield scores of test takers’ maximum or typical performance.
Two main types of test scores are distinguished:

  • Observed test scores
    Computed after the separate test items are scored.
    Derived from the item scores by taking the unweighted or weighted sum of the item scores.
    The latent variable is unobserved and, in general, the latent variable is not a simple sum of item scores.
  • Latent variable (construct) scores
    To compute the latent variable score, a model is needed that specifies the relation between the latent variable and item responses.
    The latent variable score is derived from the item responses under the assumption of a latent variable item response model.

Item scoring by fiat

Conventionally, items are scored by assigning ordinal numbers to the responses.
The scoring differs slightly between maximum and typical performance tests.

  • Maximum performance items are scored by assigning 0 to the lowest category, and consecutive rank numbers to subsequent categories.
  • Typical performance items are indicative or contra-indicative of the latent variable that is measured by the test, and the scoring of contra-indicative items has to be reversed with respect to the scoring of indicative items.
  • Dichotomous indicative typical performance items are scored by assigning 0 to the ‘no’ (don’t agree) category and 1 to the ‘yes’ (agree) category.
    Contra-indicative items, by contrast, are scored by assigning 0 to the ‘yes’ and 1 to the ‘no’ category.
  • The categories of ordinal-polytomous items are scored by assigning rank numbers to the categories.
  • Bounded-continuous items are scored in measurement units, such as centimeters.
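The reversal rule for contra-indicative items can be sketched in Python. This is a minimal illustration, not the book's procedure: the function names are hypothetical, responses are assumed to be the strings "yes" and "no", and ordinal categories are assumed to be scored 0 to m − 1, so that reversing a score s gives (m − 1) − s.

```python
def score_dichotomous(response: str, indicative: bool) -> int:
    """Score a yes/no typical performance item by fiat.

    Indicative items: 'no' -> 0, 'yes' -> 1.
    Contra-indicative items: reversed, 'yes' -> 0, 'no' -> 1.
    """
    if response not in ("yes", "no"):
        raise ValueError(f"unexpected response: {response!r}")
    score = 1 if response == "yes" else 0
    return score if indicative else 1 - score


def reverse_ordinal(score: int, n_categories: int) -> int:
    """Reverse a rank score for a contra-indicative ordinal item.

    Assumes (hypothetically) that categories are scored 0 .. n_categories - 1.
    """
    if not 0 <= score < n_categories:
        raise ValueError("score out of range")
    return (n_categories - 1) - score


print(score_dichotomous("yes", indicative=True))   # 1
print(score_dichotomous("yes", indicative=False))  # 0
print(reverse_ordinal(0, 5))                       # 4
```

Reversing in this way keeps a high score meaning "more of the latent variable" for every item, which is what allows indicative and contra-indicative items to be summed into one test score.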

Measurement by fiat: the item scores are assigned to a test taker’s responses without any theoretical justification.
(For example, the scores 0 and 1 assigned to an incorrect and a correct answer, and the scores 1, …, 5 assigned to ordered response categories, are based on convention (by fiat), not on psychometric theory.)

The sum score

The score of the jth test taker on the kth item is indicated by $X_{jk}$. The conventional test score of the jth test taker on an n-item test is the unweighted sum of his (or her) item scores:

$$US_j = X_{j1} + X_{j2} + \cdots + X_{jn}$$
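As a small illustration, the unweighted sum score is just the sum of one test taker's item scores. The function name and the item scores below are hypothetical.

```python
from typing import Sequence


def unweighted_sum_score(item_scores: Sequence[float]) -> float:
    """US_j = X_j1 + X_j2 + ... + X_jn for one test taker j."""
    return sum(item_scores)


# Hypothetical item scores of one test taker on a 5-item test
x_j = [1, 0, 1, 1, 0]
print(unweighted_sum_score(x_j))  # 3
```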

It may be argued that items differ in importance and should therefore be weighted differently.
The weighted sum score of the jth test taker on an n-item test is:

$$WS_j = w_1 X_{j1} + w_2 X_{j2} + \cdots + w_n X_{jn}$$

Here, $w_1$ is the weight assigned to the first item, $w_2$ the weight assigned to the second item, and so on.
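A minimal sketch of the weighted sum score. The weights are hypothetical and chosen only for illustration; with all weights equal to 1 the result reduces to the unweighted sum score.

```python
from typing import Sequence


def weighted_sum_score(item_scores: Sequence[float], weights: Sequence[float]) -> float:
    """WS_j = w_1 X_j1 + w_2 X_j2 + ... + w_n X_jn for one test taker j."""
    if len(item_scores) != len(weights):
        raise ValueError("need exactly one weight per item")
    return sum(w * x for w, x in zip(weights, item_scores))


x_j = [1, 0, 1, 1, 0]
w = [2.0, 1.0, 1.0, 0.5, 1.0]  # hypothetical item weights
print(weighted_sum_score(x_j, w))  # 3.5
```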
A problem with …
Classical analysis of observed test scores - a summary of chapter 5 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 5
Classical analysis of observed test scores


Measurement precision of observed test scores

Test scores are used in practical applications.

Measurement precision has two different aspects:

  • Information
    Applies to the test score of a single person
    The within-person aspect of measurement precision
  • Reliability
    Applies to a population of persons.
    The between-persons aspect of measurement precision

The concept of measurement precision applies to observed test scores as well as to latent variable scores.

Information on a single observed score

Functional thought experiment: fulfils a function within a theory.

True test score: the expected value of the observed test scores of the repeated test administrations in the thought experiment.
Test taker j’s true test score is the expected value of his (or her) independently distributed observed test scores from (hypothetical) repeated administrations of the test to the test taker.

The observed test score is a variable that varies across repeated test administrations.
The true score is constant.

Error of measurement: the difference between test taker j’s observed test score and his (or her) true score.
Test taker j’s error of measurement on an arbitrary measurement occasion is the difference between his (or her) observed test score and his (or her) true test score.
The expected value of the errors of measurement is 0.

The within-person error variance is an index for the precision of the measurement of a person’s true score.

Test taker j’s standard error of measurement: the square root of his (or her) within-person error variance.

Information: the reciprocal of a person’s within-person error variance.
A small amount of information means that test taker j’s observed test scores vary widely around j’s true score across repeated test administrations.
A large amount of information means that j’s observed test scores do not vary widely around j’s true score.
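The thought experiment can be mimicked with a small simulation. This is a hedged sketch: the normal error distribution, the true score of 20, and the two error standard deviations are illustrative assumptions that the theory does not require, and the function names are hypothetical. Across many simulated administrations, the mean observed score approximates the true score, the mean error approximates 0, the standard error of measurement is the square root of the error variance, and information is its reciprocal.

```python
import random
import statistics


def simulate_administrations(true_score: float, error_sd: float,
                             n_reps: int, seed: int = 1) -> list[float]:
    """Hypothetical repeated, independent administrations to one test taker.

    Observed score = true score + error; errors are drawn from N(0, error_sd)
    purely for illustration.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_reps)]


def summarize(observed: list[float], true_score: float) -> dict[str, float]:
    errors = [x - true_score for x in observed]
    error_var = statistics.pvariance(errors)  # within-person error variance
    return {
        "mean_observed": statistics.fmean(observed),  # approximates the true score
        "mean_error": statistics.fmean(errors),       # approximates 0
        "error_variance": error_var,
        "sem": error_var ** 0.5,                      # standard error of measurement
        "information": 1.0 / error_var,               # reciprocal of error variance
    }


tau = 20.0
precise = summarize(simulate_administrations(tau, error_sd=1.0, n_reps=10_000), tau)
imprecise = summarize(simulate_administrations(tau, error_sd=3.0, n_reps=10_000), tau)
print(precise)
print(imprecise)
```

The test taker with the smaller error standard deviation gets roughly nine times as much information, because information scales with 1 / (error SD)².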

Reliability of observed test scores in a population

Reliability: the differentiation of test scores of different test takers from a population.

Psychometrics uses two definitions of reliability:

  • A theoretical definition
  • Operational definition.
    Yields procedures to assess reliability.

Reliability concerns the differentiation between the true test scores of different test takers from a population.
The differentiation is good if test takers’ true scores can be precisely predicted from their observed test …
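The theoretical definition is cut off in this summary. As a hedged illustration using the standard classical test theory result (not a reconstruction of the missing text): with errors uncorrelated with true scores, reliability equals the ratio of true-score variance to observed-score variance, which is also the squared correlation between true and observed scores. The simulated population, its normal distributions, and the helper names are assumptions made only for this sketch.

```python
import random
import statistics


def pearson(x, y):
    """Product-moment correlation between two equally long score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_population(n_persons, true_mean, true_sd, error_sd, seed=2):
    """Hypothetical population: observed X = true T + error E, with T and E
    drawn independently from normal distributions (illustrative assumption)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_persons)]
    observed_scores = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, observed_scores


true_scores, observed_scores = simulate_population(
    20_000, true_mean=50.0, true_sd=8.0, error_sd=4.0)

# Reliability as true-score variance divided by observed-score variance ...
rel_ratio = statistics.pvariance(true_scores) / statistics.pvariance(observed_scores)
# ... and as the squared correlation between true and observed scores.
rel_corr = pearson(true_scores, observed_scores) ** 2

# Population value under these settings: 8**2 / (8**2 + 4**2) = 0.8
print(round(rel_ratio, 3), round(rel_corr, 3))
```

In a finite sample the two estimates differ slightly, because the sample correlation between true scores and errors is not exactly 0.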
Classical analysis of item scores - a summary of chapter 6 of A conceptual introduction to psychometrics by G. J. Mellenbergh

A conceptual introduction to psychometrics
Chapter 6
Classical analysis of item scores


The conventional way of scoring items is by assigning ordinal numbers to the response categories.
Usually, these item scores are ordered with respect to the attribute that the item is assumed to measure. However, the assignment of these ordinal numbers lacks a theoretical justification.

Usually, the analysis of test scores is supplemented by an analysis of the item scores.

Item score distributions

The scores of a given item have a distribution in a population of N persons.

  • Location: the place on the scale where the item scores are centered
  • Dispersion: the scatter of the item scores
  • Shape: the form of the distributions
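The three aspects can be computed for a single item. This sketch assumes the mean as the location index, the population standard deviation as the dispersion index, and moment skewness as one possible index of shape; the function name and the item scores are hypothetical.

```python
import statistics


def describe_item(scores):
    """Location, dispersion and one index of shape for an item score distribution."""
    n = len(scores)
    mean = statistics.fmean(scores)   # location
    sd = statistics.pstdev(scores)    # dispersion (population SD)
    # Shape: moment skewness; a negative value means the scores pile up at the high end
    skew = sum((x - mean) ** 3 for x in scores) / n / sd ** 3 if sd > 0 else 0.0
    return {"location": mean, "dispersion": sd, "skewness": skew}


# Hypothetical scores of 10 persons on an item with categories 0-3
item = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
print(describe_item(item))
```

For these scores the mean is 2.0, the standard deviation 1.0, and the skewness −0.6, so most persons are found in the higher categories.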

Classical item difficulty and attractiveness

The location of the item score distribution is used to define the classical item difficulty (maximum performance tests) and classical item attractiveness (typical performance tests) concepts.

  • Classical item difficulty: a parameter that indicates the location of the item score distribution in a population of persons.
  • Classical item attractiveness: a parameter that indicates the location of the item score distribution in a population of persons.

The two definitions are the same.

Classical item difficulty and attractiveness are defined in a population of persons.
They are therefore population-dependent and may differ between populations.

The mean is mainly used for this.
The mean of a dichotomously scored item is called the item p-value.

Item score variance and standard deviation

The most common parameters that are used in classical item score analysis are the variance and the standard deviation of the item scores.

Items that have a small item score variance have little effect on the test score variance.

The variance of dichotomous item scores is a function of the item p-value: it equals p(1 − p).
This variance has its maximum value (.25) at p = .5.
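A short sketch of the p-value and the variance of a dichotomous item. The responses are hypothetical; the code checks that the population variance of 0/1 scores equals p(1 − p) and that p(1 − p) peaks at p = .5.

```python
import statistics


def p_value(scores):
    """Classical difficulty/attractiveness of a 0/1 item: the mean score (p-value)."""
    if any(s not in (0, 1) for s in scores):
        raise ValueError("dichotomous scores must be 0 or 1")
    return statistics.fmean(scores)


def dichotomous_variance(p):
    """Population variance of a 0/1 item with p-value p."""
    return p * (1 - p)


scores = [1, 1, 1, 0, 1, 0, 1, 1]  # hypothetical responses of 8 persons
p = p_value(scores)
print(p, dichotomous_variance(p), statistics.pvariance(scores))  # 0.75 0.1875 0.1875

# Search a grid of p-values for the one with the largest variance
grid = [i / 100 for i in range(101)]
print(max(grid, key=dichotomous_variance))  # 0.5
```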

Classical item discrimination

Location and dispersion parameters yield useful information on the items of a test.
However, these parameters do not indicate the extent to which an item contributes to the aim of a test: to assess individual differences in the attribute that is measured by the test.

Classical item discrimination: a parameter that indicates the extent to which the item differentiates between the true test scores of a population of persons.
It is defined in a population of persons and may vary between populations.

The item-test and item-rest correlations

An appropriate index for discrimination between the true scores would be the product moment correlation between the item score and the true score in the population of persons.
Test taker j’s observed …
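The definitions of the item-test and item-rest correlations are cut off in this summary. As a sketch assuming their usual meanings (item-test: the correlation between the item score and the total test score; item-rest: the correlation between the item score and the total score without that item), both can be computed from a persons-by-items score matrix. The data and function names are hypothetical.

```python
import statistics


def pearson(x, y):
    """Product-moment correlation between two equally long score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_test_and_rest(data, k):
    """data: rows = persons, columns = items.

    Returns (item-test r, item-rest r) for item k.
    """
    item = [row[k] for row in data]
    total = [sum(row) for row in data]
    rest = [t - x for t, x in zip(total, item)]  # total score without item k
    return pearson(item, total), pearson(item, rest)


# Hypothetical 0/1 scores of 6 persons on a 4-item test
data = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
r_it, r_ir = item_test_and_rest(data, 0)
print(round(r_it, 3), round(r_ir, 3))  # 0.943 0.875
```

Because the item is itself part of the total score, the item-test correlation is inflated by this part-whole overlap; the item-rest correlation avoids it and is therefore lower here.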
Summary of Statistics: The arts of learning from data (Agresti, 2009)

Introduction

For many psychology students, statistics is a daunting subject. What are you supposed to do with all those numbers and figures, and how do all the formulas fit together? This summary tries to explain statistics as clearly as possible. It contains many examples that aim to clarify the mathematical problems. However, this summary is not a miracle cure. Above all, it is wise to practise a lot and to attend the lectures.

Before you start on this summary, here are a few tips for working with statistics:

  • Don't always try to know and understand exactly how a formula was derived. Sometimes it is best to accept that clever mathematicians and physicists came up with it. What matters for you is being able to work with it. So don't panic if you don't fully understand something; look at what you do understand and try to work with that.
  • If you don't know an answer, skip the question for now. It often happens that the answer, or the way to calculate it, suddenly comes to you later during the exam.
  • The answer can always be found in the text; numbers are not simply juggled around to arrive at the answer.
  • So read carefully; important information is often overlooked.

Statistics is nothing to panic about, but without studying it does become very difficult. Hopefully this overview makes statistics a bit clearer for you.

Good luck!

 

Chapter 1 Data

Statistics is the science of analysing information from studies and research. This information is called data. Research questions are investigated and analysed in an objective way. After the data have been analysed, conclusions can be drawn and predictions made.

The three most common statistical processes:

  1. Design: planning a study. This includes, for example, how relevant data should be obtained. This is usually done with samples taken from a population. A population does not mean the entire world population; it can also refer to, for example, all schools in the Netherlands or all football clubs in Noord-Holland.
  2. Descriptive: summarizing the data in a sample and finding patterns in them. This is done with graphs and tables, or descriptively, for example by reporting means and percentages.
  3. …
Author: SanneA