Test construction and test research - a summary of an article by Oosterveld & Vorst (2010)

Critical thinking
Article: Oosterveld & Vorst 2010
Testconstructie en testonderzoek

Validity theory
Validity and measurement quality of measurement instruments

Validity theory

There are problematic theories about validity

Examples of viewpoints

Borsboom (2003)

According to Borsboom, it is plausible that the mercury thermometer is a valid measure of the temperature of objects, because differences in the real temperature cause differences in the readings of the instrument.
If the causal chain is described exactly, and this description is a plausible representation of reality, then the instrument is valid in reality.
Real validity remains unknown as long as not all the relevant knowledge is available.
Because it is in principle unknown to what extent the relevant knowledge is available, validity is hypothetical and uncertain.
Even if the causal chain between true variation in the trait and the measured variation is well known, knowledge about causal chains can change as new knowledge becomes available. This is why real validity is hypothetical.
However, people can form a judgment about the validity of measurement instruments. This validity judgment has nothing to do with the real validity.
In psychology, establishing true causal chains is not (yet) possible.
That is why psychology works, for the time being, with hypothetical validity judgments, pending more precise and true causal chains between true trait variation and measurement variation.
The quality of measurement, not the validity, must be demonstrated by psychometric analysis (reliability, unidimensionality, representative content of the measurement instrument, relations with external criteria, support for theoretically expected relations).

Philosophy-of-science viewpoint

Whether a test is valid depends on the state of affairs in reality (ontology).

Description of validity

Validity: the assumed trait or property varies in value in the population; differences in trait values cause differences in measurement.
No validity if: differences in measurement results cannot be explained by differences in the trait (because the trait does not exist, its values do not vary, or there is no causal relation).

Derived statements

Validity is either present or absent.
Given our limited knowledge of reality, the real validity of an instrument is hypothetical and provisional.
The validity judgment is a subjective estimate of the true validity of an instrument.
Validity can be assumed if causal relations in reality are applied in the construction of the measurement instrument.
Validity has nothing to do with relations between properties of criteria.
Validity is only about the measurement instrument.
A distinction between forms of validity and forms of validity research is pointless.

Research into measurement quality/validity

Research into the causal relations between variance in properties and variance in measurement is central.
Existing validity research is research into the quality of measurement.
Impression validity is a superficial, subjective judgment of the measurement quality.
Content validity is a judgment about the measurement quality of the content.
Criterion validity: research into the predictive value of the measurement.
Construct validity: research into the theoretical measurement validity.

Messick

According to Messick, temperature measurement is valid because a large body of research with thermometers has given strong empirical support to the theoretically expected relations between outcomes of temperature measurements and other measurements of criteria.
According to Messick, the validity of measurements has to become apparent from research: reliability, unidimensionality, representative content of the measurement, relations to external criteria, and support for theoretically expected relations in the nomological network.

Philosophy-of-science viewpoint

Whether a test interpretation is valid depends on empirical support for the interpretation (epistemology).

Description of validity

Validity: the assumed trait or property varies in value in the population; differences in trait values correlate with differences in measurement.
No validity if: the measurement has no relation with other measurements or criteria.

Derived statements

Validity is a matter of degree.
Because validity research is limited, the validity of an instrument is provisional.
The validity judgment is the validity.
Validity is a judgment based on empirical properties, made after the construction of the instrument.
Validity depends on empirical support for the nomological network.
Validity is about the interpretation of scores and decisions based on scores.
Forms of validity and forms of validity research give insight into diverse aspects of validity.

Research into measurement quality/validity

Research into causal relations between variation in traits and variation in measurement is relevant, but a side issue.
Different aspects of measurement quality concern different aspects of validity.
Impression validity makes a weak contribution to the validity of score interpretation.
Content validity makes an important contribution to the validity of score interpretation.
Criterion validity makes an important contribution to the validity of score interpretation.
Construct validity makes an important contribution to the validity of score interpretation.

Philosophy-of-science viewpoint and description of validity

According to Borsboom, the requirements for the validity of a measurement process are that the measured trait exists in reality, that the values of the trait vary in the population, and that this variation causes the variation in the measurement.
These requirements can only be met in reality, so validity is a judgment about ontology.

Messick describes validity as a judgment based on scientific research: validity is a judgment about knowledge, in which theory and empirical testing play a role.
It is not required that variation in trait values causes variation in measurement values, but variation in trait values must correlate with variation in measurement values.
If these correlations match the expected values, this supports the supposed trait, the variation of its values in the population, and the validity of interpretations of measurement results.

Summary

The concept of validity is relatively narrow in Borsboom’s view. The measurement instrument either is or is not valid. The quality of measurement is a more or less complex evaluation of diverse aspects.
Most psychological instruments cannot be said to be valid, because the causal path from trait values to measurement variance is (as yet) unknown for all of them.
Even when the real validity of an instrument is absent or uncertain, the instrument can have good measurement qualities.

The concept of validity is broad in Messick’s view. Measurement results and their possible uses are valid to a certain degree. The measurement itself is included in the concept of validity.
Measurement results contribute to a combined validity judgment about interpretations of measurement results.

Validity and measurement quality of measurement instruments

Impression validity – a subjective judgment of measurement quality

Impression validity (face validity): a subjective judgment of the usability of a measurement instrument on the basis of directly observable properties of the test material.

This concerns the judgment of test takers and other people without specific knowledge, but it can also concern test users who do not know the manual of the measurement instrument.
The impression can be formed by the outward quality of the test material, the nature of the questions asked, the nature of the response options, and so on.
Impression validity can be measured by asking test takers and users for a subjective judgment of the usability of the measurement instrument based on the outward appearance of the test material.
A high subjective rating can improve the usability of the measurement instrument.

Content validity – the substantive measurement quality

Content validity: the judgment of how representative the observations, exercises and questions are for a certain goal.

It is assumed that the measurement goal covers a domain of observations, exercises, or questions and that the items are a random, representative sample from this domain. Representativeness can also concern the difficulty level or the nature of the items.
It can be determined by presenting the items of the measurement instrument to potential respondents or experts, and asking them to sort the items by domain description.
When the judges show high agreement between items and domain descriptions, content validity is high.
This is especially important for tests and exams.
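
The sorting procedure described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `item_agreement`, the item and domain labels, and the judges' sortings are all hypothetical, and the source does not prescribe a particular agreement index. Here agreement is taken to be the proportion of judges who place an item in the domain the test constructor intended.

```python
def item_agreement(intended_domain, judge_sortings):
    """For each item, return the proportion of judges who sorted it
    into the domain intended by the test constructor."""
    agreement = {}
    for item, domain in intended_domain.items():
        hits = sum(1 for sorting in judge_sortings if sorting.get(item) == domain)
        agreement[item] = hits / len(judge_sortings)
    return agreement


# Hypothetical example: three items, two domain descriptions, four judges.
intended_domain = {"item1": "anxiety", "item2": "anxiety", "item3": "depression"}
judge_sortings = [
    {"item1": "anxiety", "item2": "anxiety", "item3": "depression"},
    {"item1": "anxiety", "item2": "depression", "item3": "depression"},
    {"item1": "anxiety", "item2": "anxiety", "item3": "depression"},
    {"item1": "anxiety", "item2": "depression", "item3": "anxiety"},
]

agreement = item_agreement(intended_domain, judge_sortings)
mean_agreement = sum(agreement.values()) / len(agreement)

for item, proportion in agreement.items():
    print(f"{item}: {proportion:.2f}")
print(f"mean agreement: {mean_agreement:.2f}")
```

Items with low agreement (here the hypothetical item2) would be candidates for revision, because the judges do not recognise them as belonging to the intended domain.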

Criterion validity – predictive value of measurement

Criterion validity: the (cor)relation between the test score and a psychological or social criterion.

The psychological criterion can be a psychological or medical judgment.
The criterion can lie in the present (concurrent validity), in the past (postdiction) or in the future (prediction).
This validity can be established by studying the relation between test scores and criterion scores. The results can be shown in expectancy tables, prediction tables, or prediction figures.
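
As a minimal sketch of how such a relation could be computed, the Python example below calculates a Pearson correlation between test scores and criterion scores and builds a simple expectancy table. The data, the score bands, the cut-off of 6 on the criterion, and the function names `pearson_r` and `expectancy_table` are assumptions made for illustration; they do not come from the article.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def expectancy_table(test_scores, criterion_met, cut_points):
    """Proportion of people who meet the criterion within each test-score band.

    cut_points holds the ascending lower bounds of the bands; the last band
    is open-ended. Empty bands get None.
    """
    table = {}
    for i, low in enumerate(cut_points):
        high = cut_points[i + 1] if i + 1 < len(cut_points) else float("inf")
        outcomes = [met for score, met in zip(test_scores, criterion_met)
                    if low <= score < high]
        table[(low, high)] = sum(outcomes) / len(outcomes) if outcomes else None
    return table


# Hypothetical data: ten test takers with a test score and a criterion score.
test_scores = [4, 7, 9, 12, 14, 15, 18, 21, 24, 27]
criterion_scores = [2, 3, 3, 6, 4, 6, 7, 7, 8, 9]
# Assumed cut-off: a criterion score of 6 or higher counts as "criterion met".
criterion_met = [1 if score >= 6 else 0 for score in criterion_scores]

r = pearson_r(test_scores, criterion_scores)
table = expectancy_table(test_scores, criterion_met, cut_points=[0, 10, 20])

print(f"correlation between test and criterion: {r:.2f}")
for (low, high), proportion in table.items():
    print(f"test score {low} to {high}: proportion meeting criterion = {proportion}")
```

The expectancy table answers a practical question that the correlation alone does not: given a test score in a certain band, what proportion of people went on to meet the criterion.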

Process validity – procedural measurement quality

Process validity: concerns the manner in which the test taker arrives at a response.

An observation, exercise, or question has high process validity if the respondent shows the behaviour the test constructor intended.
This validity can be established with think-aloud protocols or with experiments with instructions.

Construct validity – theoretical measurement quality

Construct validity: the judgment of the agreement between the hypothesized relations of the construct with other constructs and the empirically demonstrated relations between the instruments that measure these constructs.

The expected relations must be matched by high empirical correlations between instruments that measure the construct.
The nomological network: the system of hypothetical relations around the construct.
This network can be part of the theory.

Homogeneity or consistency reliability

Homogeneity or consistency reliability: the coherence between the different parts (items) of a scale.
In psychological measurement, it is assumed that the items are repeated, independent measurements of a trait.

The size of homogeneity indices depends on the size of the inter-item correlations and the number of items.
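
The summary does not name a specific homogeneity index. As an assumption for illustration, the sketch below uses the standardized form of Cronbach's alpha, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r the mean inter-item correlation. The function name `standardized_alpha` and the chosen values are hypothetical.

```python
def standardized_alpha(mean_inter_item_r, n_items):
    """Standardized Cronbach's alpha from the mean inter-item correlation
    and the number of items (assumed index; not named in the source)."""
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


# Show how the index grows with both the number of items and the
# mean inter-item correlation.
for n_items in (5, 10, 20):
    for mean_r in (0.1, 0.3, 0.5):
        alpha = standardized_alpha(mean_r, n_items)
        print(f"items={n_items:2d}  mean r={mean_r:.1f}  alpha={alpha:.2f}")
```

Under this formula, a long scale with weakly correlated items can reach the same index value as a short scale with strongly correlated items, which is why both ingredients should be reported.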

Generalizability

The generalizability of measurement instruments can concern persons, circumstances and goals.
Groups of people may differ to such an extent that there is reason for concern that the same instrument will give different results for other groups.
Measurement instruments can be administered under such different circumstances that one and the same instrument measures different properties.
The goal of the measurement can influence the measured traits.

Validity and reliability/measurement quality are in principle independent of population and sample.

For each group of people that differs in one or more respects, validity and reliability must be established separately.

Borsboom: if the regularities on which the construction of the instrument is based are universal, the validity of the instrument generalizes across groups.

Messick: research into measurement quality or validity must be repeated for each group. It must be investigated whether the measurement model and the structural model of relations between measurements are the same for all groups.
