Psychological measurement-instruments - a summary for WSRt of an article by Oostervel & Vorst (2010)

Critical thinking
Article: Oostervel & Vorst (2010)
Psychological measurement-instruments

The construction of measurement-instruments is an important subject.

  • certain instruments age because theories about human behaviour change or because social changes make existing instruments obsolete.
  • new instruments can be necessary because existing instruments aren’t sufficient.
  • new instruments can be necessary because existing instruments aren’t suitable for a certain target group.

Measurement preferences

Measurement preferences of an instrument: the goal of a measurement-instrument.
This is about a more or less hypothetical property.

The domain of human acting

The instrument is usually focused on measuring a property in a global domain of human acting.
A domain: a wide area of more or less coherent properties.

Observation methods

Every measurement-instrument uses one or more observation methods. For different properties of different domains, usually different observation methods are used.

  • performance-tests
  • questionnaires
  • observation tests

When properties are measured with different observation methods, it is likely that the different methods capture different domains of the traits or categories.

Instruments based on one observation method seem to form a common method-factor, which usually is stronger than the common trait-factor of equal traits measured with different observation methods.

Theory

The development of an instrument is usually based on an elaborated theory, on insights from empirical research, or on ideas from informal knowledge.
Instruments developed on the basis of formal knowledge and an elaborated theory are of better quality than instruments based on informal knowledge and a poorly formulated theory.

Construct

An instrument forms the elaboration of a construct that refers to a combination of properties.
Measurement instruments for specific (latent) traits are of better quality than instruments for global traits or composite traits.

Structure

The structure of a test depends on the properties it measures.

In unstructured observation-methods the measurement-conditions aren’t standardized, and because of that their measurement-results are difficult to compare across persons and situations. Objective scores are difficult to obtain.

Application possibilities

The application possibilities of a measurement-instrument that the researcher wants to achieve can be related to theoretical or descriptive research.
Such research is about the analysis of a great number of observations.

For individual applications high requirements are placed on realised measurement-preferences.

Costs

An often decisive element in the description of the measurement-preferences of a measurement-instrument is the cost of that instrument.

Dimensionality

An instrument consists of one or more measurement-scales or sub-tests.
More scales refer to more dimensions of the construct and a subdivision in more latent traits or latent categories.

An instrument that is based on a specific latent trait must be one-dimensional.

Reliability

Three kinds of reliability:

  • Internal consistence-reliability
    Mutual cohesion of items that form a scale or sub-tests.
  • Repeated reliability
    Repeated measures with the same instrument
  • Local reliability
    an impression of the reliability of the measurement within a certain range of scores.

Validity

Does the test measure what it is supposed to measure?

Forms of validity:

  • Impression-validity
  • content-validity
  • criterion-validity
  • process-validity
  • construct-validity

Utility

Utility of an instrument: the usefulness of an instrument as becomes apparent from a cost-benefit analysis.

Standardization

A psychological measurement-instrument doesn’t lead to absolute results, but to relative ones. The individual scores must be compared to scores of others.
The scores of others form the norm.
Norm-group: the group of people that forms the norm
Norming consists of converting raw scores to relative norm-scores.
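
As an illustration of norming, the sketch below converts a raw score into two common kinds of relative norm-score: a z-score and a percentile rank. The norm-group data and the function names are hypothetical and only serve as an example; the article does not prescribe a particular norm-score.

```python
from statistics import mean, stdev

# Hypothetical raw scores of a norm-group (illustrative data, not from the article).
norm_group = [12, 15, 15, 18, 20, 21, 23, 25, 27, 30]


def z_score(raw, norm_scores):
    """Distance of a raw score from the norm-group mean, in standard deviations."""
    return (raw - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(raw, norm_scores):
    """Percentage of the norm-group scoring at or below the raw score."""
    at_or_below = sum(1 for score in norm_scores if score <= raw)
    return 100.0 * at_or_below / len(norm_scores)


if __name__ == "__main__":
    for raw in (15, 20, 27):
        z = round(z_score(raw, norm_group), 2)
        print(f"raw={raw}  z={z}  percentile={percentile_rank(raw, norm_group)}")
```

The same raw score gets a different norm-score when the norm-group changes, which is why the choice of norm-group matters.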

Validity and measurement-quality

Validity and measurement-quality of measurement-instruments

Impression-validity – a subjective judgment of the measurement-quality

Impression-validity: a subjective judgment of the usability of a measurement-instrument on the basis of the directly observable properties of the test-material.
This is the judgment of test-takers and other laypeople.

Content-validity - content measurement-quality

Content-validity: the judgment about the representativeness of the observations, assignments, and questions for a certain purpose.
This can be determined by offering potential respondents or experts domain-descriptions and the items of the instrument, and then asking them to sort the items by domain-description.

  • When the judges largely agree on which items belong to which domain-descriptions, the content-validity is high.

Especially important for tests and exams.

Criterion-validity – predictive value of the measurement

Criterion-validity: the (cor)relation between test-score and a psychological or social criterion.
It can be found by comparing test-scores with criterion-scores.

Process-validity: procedural measurement-quality

Process-validity: the manner in which the response is established.
Can be researched with thinking-out-loud protocols or experiments with instructions.

Construct-validity – theoretical measurement-quality

Construct-validity: the degree of agreement between the strictly formulated, hypothetical relations between the measured construct and other constructs on the one hand, and the empirically established relations between the instruments that should measure those constructs on the other hand.

Convergent validity: measurement-results from different instruments that measure the same construct are coherent or highly correlated.
Divergent validity: measurement-results from different instruments that measure different constructs have a low correlation.
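
A minimal sketch of how convergent and divergent relations could be checked with correlations. The three score lists are made-up data for two hypothetical anxiety scales and one hypothetical vocabulary test; the names are illustrative assumptions, not instruments from the article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of the same six persons on three instruments.
anxiety_scale_a = [10, 12, 15, 18, 20, 22]
anxiety_scale_b = [11, 13, 14, 19, 19, 23]
vocabulary_test = [30, 25, 33, 27, 31, 28]

# Same construct, different instruments: a high correlation is expected.
convergent_r = pearson(anxiety_scale_a, anxiety_scale_b)
# Different constructs: a low correlation is expected.
divergent_r = pearson(anxiety_scale_a, vocabulary_test)

if __name__ == "__main__":
    print("convergent:", round(convergent_r, 2))
    print("divergent:", round(divergent_r, 2))
```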

Homogeneity or consistence-reliability

The coherence between separate indicators (items) in a scale.
For a psychological scale, it is assumed that the items of which the scale is composed are independent, repeated measures of the same trait.

Homogeneity is determined with different indices:

  • mean inter item correlations
  • split-halves reliability
  • coefficient alpha

The height of homogeneity-indices usually depends on the height of the inter-item correlations and on the number of items.
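
The three indices can be sketched as follows. The response matrix is made-up data (six persons, four items). The split-half index uses an odd-even split with the Spearman-Brown correction, which is one common choice and an assumption here. The last function uses the standard formula for standardized alpha to show how homogeneity rises with the number of items when the mean inter-item correlation stays the same.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Hypothetical responses: one row per person, one column per item.
responses = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [5, 4, 5, 5],
    [3, 3, 4, 3],
    [2, 2, 3, 2],
]


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_columns(rows):
    """Turn person rows into one list of scores per item."""
    return [list(column) for column in zip(*rows)]


def mean_inter_item_correlation(rows):
    pairs = combinations(item_columns(rows), 2)
    return mean([pearson(a, b) for a, b in pairs])


def split_half_reliability(rows):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    first_half = [sum(row[0::2]) for row in rows]
    second_half = [sum(row[1::2]) for row in rows]
    r = pearson(first_half, second_half)
    return 2 * r / (1 + r)


def coefficient_alpha(rows):
    """Cronbach's alpha from item variances and total-score variance."""
    items = item_columns(rows)
    k = len(items)
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def standardized_alpha(mean_r, n_items):
    """Alpha implied by a mean inter-item correlation and a number of items."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


if __name__ == "__main__":
    print("mean inter-item r:", round(mean_inter_item_correlation(responses), 3))
    print("split-half:", round(split_half_reliability(responses), 3))
    print("alpha:", round(coefficient_alpha(responses), 3))
    for k in (5, 10, 20):
        print(k, "items, mean r = 0.3:", round(standardized_alpha(0.3, k), 3))
```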

Generalizability of the measurement-quality

The validity and reliability (measurement-quality) are in principle dependent on the population or sample.
For every group of persons who differ on one or more characteristics the validity and reliability/ measurement-quality of an instrument must be determined separately.

Paradoxes and measurement-qualities

Subjective judgments of the measurement-quality

The unaided judgment about the measurement-quality of a test can be deceiving and need not bear any relation to the empirically researched measurement-quality.

Content validity

The content validity of an instrument shows in a representative choice of items from one or more domains of items.
If the content of an instrument is chosen optimally, this can lead to a less homogeneous instrument.

  • The property is then measured with items that are diverse in content.

These items elicit a great diversity of responses, which do not lead to a homogeneous one-dimensional scale or sub-test.
If the constructor also wants a high homogeneity or predictive value of the instrument, this will be at the expense of the content representativeness of the instrument.

Predictive value

The quality of a one-dimensional measurement-model requires homogeneous responses on a limited number of repeated measurements. That requires homogeneous items.

The consistency-reliability of homogeneous items is higher than that of heterogeneous items, but the predictive value of homogeneous items is lower than that of heterogeneous items.

With heterogeneous items one can’t usually form a scale with good measurement-properties, but one can form a predictor.
With homogeneous items, forming a scale with good measurement-properties is possible, but forming a predictor isn’t.

Theoretical measurement-quality

Items must be homogeneous per scale in order to meet the requirements of a one-dimensional measurement-model.

  • The items of a scale must correlate with (the items of) scales that measure similar properties (convergent relations) and may not correlate highly with (items of) scales that measure dissimilar properties (divergent relations)
  • Items may not correlate highly with indications of response tendencies

If items meet these criteria, predictive value and content quality of the instrument can’t be optimal.

Homogeneity

Homogeneity of a scale is based on the assumption that items of a scale or sub-test form independent, repeated measures of a property.
These repeated measurements must be mutually coherent to form a high consistence-reliability or homogeneity of the scale or sub-test.
Homogeneous scales or sub-tests threaten the maximal content representativeness of the property and the predictive value of the measurement.

Selection of measurement-quality

A test-constructor can’t maximize all the measurement-qualities in one measurement-instrument, and the test-taker shouldn’t expect all the measurement-qualities in an instrument.

If a constructor hasn’t chosen a measurement-quality to maximize beforehand, the instrument will have arbitrary measurement-qualities.
The constructor can attain the best results by focusing on one measurement-quality during test-construction.

Optimization, probability capitalization, and cross-validation

Most methods of test-construction have an empirical character in which the constructor attains the best result using optimizing choice-procedures, optimizing solution-strategies, or optimizing analyzing-strategies.
Chance can play a big role here.
By making optimal choices, or using optimal techniques, the constructor can stack coincidence on coincidence. This is probability capitalization (also known as capitalization on chance).

The constructor can gain insight into the effects of probability capitalization by using cross-validation on the results.
With cross-validation, results of an optimizing strategy become more apparent, and because of this more certainty can be obtained about the (in)stability of the results.

Optimizing procedures: examples

  • selection of items on the basis of optimal psychometric properties
  • selection of items on the basis of differences between groups
  • selection of optimal weights for item-scores and test-scores

Probability capitalization

Typical for optimizing procedures is their empirical character and that they lead to optimizing choices with empirical data.
There are no theoretical or hypothetical considerations that adjust the selection process.

The data on which selection is based is commonly unreliable to a certain extent.

Probability capitalization means that with an optimizing strategy, the choice is made in part because of chance.
With such a choice, no distinction can be made between true variance and false variance or coincidence.

Probability capitalization does not occur when selection is based only on true differences or true correlations, but it does occur when selection is (also) based on accidentally big differences or big correlations.
Coincidence here is not systematic or repeatable.
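
Probability capitalization can be made visible with a small simulation. The sketch below is an illustrative assumption, not a procedure from the article: all items are pure noise and have no true relation with the criterion, yet selecting the items with the highest sample correlations produces a sum score that seems to predict the criterion. In a fresh sample the apparent predictive value disappears, because the selected correlations were accidental.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_sample(rng, n_persons, n_items):
    """Pure-noise items and a criterion that is unrelated to every item."""
    items = [[rng.gauss(0, 1) for _ in range(n_persons)] for _ in range(n_items)]
    criterion = [rng.gauss(0, 1) for _ in range(n_persons)]
    return items, criterion


def sum_scores(items, selected):
    """Sum score per person over the selected item indices."""
    n_persons = len(items[0])
    return [sum(items[i][p] for i in selected) for p in range(n_persons)]


N_PERSONS, N_ITEMS, N_SELECTED = 100, 200, 10
rng = random.Random(2010)  # fixed seed so the run is repeatable

# Optimizing step: keep the items that correlate highest with the criterion.
items, criterion = simulate_sample(rng, N_PERSONS, N_ITEMS)
ranked = sorted(range(N_ITEMS), key=lambda i: pearson(items[i], criterion), reverse=True)
selected = ranked[:N_SELECTED]
r_selection = pearson(sum_scores(items, selected), criterion)

# The same item numbers in a fresh sample: only chance was selected, so it does not repeat.
new_items, new_criterion = simulate_sample(rng, N_PERSONS, N_ITEMS)
r_fresh = pearson(sum_scores(new_items, selected), new_criterion)

if __name__ == "__main__":
    print("correlation in the selection sample:", round(r_selection, 2))
    print("correlation in a fresh sample:", round(r_fresh, 2))
```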

Sorts of optimizing procedures and techniques

Three common forms of optimizing:

  • optimizing of psychometric characteristics of measurement and/or prediction through selection of items.
  • optimizing of differences between mean scores of groups by selection of items
  • optimizing of quality of measuring or accuracy of predicting by giving weights to item-scores or test-scores

These procedures are often executed with optimizing, exploratory analyzing-techniques.
The most common techniques are:

  • Exploratory factor-analysis
  • Cluster-analysis
  • multiple regression-analysis
  • discriminant analysis

Measurement-model: about what the constructor wants to measure
Structure-model: about what the constructor wants to predict

Cross-validation: control of instability of outcomes

The central idea of cross-validation: repeatedly test/calculate optimal indices

  • The research-sample is divided into two comparable groups.
  • Out of every sub-group, two new sub-groups (A and B) are composed at random. These sub-groups are then merged into two groups, A and B.
  • Then, the exploratory analysis is done twice, once on each group.
  • The outcomes of the two analyses are compared.

If the optimizing technique or procedure brings similar results in both analyses, then probability capitalization is present to such a small degree that the outcomes are stable or reliable.
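
The steps above can be sketched in a simplified form. The sketch assumes simulated data with five items that truly relate to the criterion and 45 noise items, and it uses a plain random split into groups A and B instead of the stratified split described above. All names and numbers are hypothetical. Items that are selected in both groups are the stable outcomes; items selected in only one group are likely to be products of chance.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_items(group, n_selected):
    """Optimizing step: the item indices that correlate highest with the criterion."""
    criterion = [crit for crit, _ in group]
    n_items = len(group[0][1])
    correlations = [
        pearson([scores[i] for _, scores in group], criterion) for i in range(n_items)
    ]
    ranked = sorted(range(n_items), key=lambda i: correlations[i], reverse=True)
    return set(ranked[:n_selected])


N_PERSONS, N_TRUE, N_NOISE, N_SELECTED = 300, 5, 45, 8
rng = random.Random(42)  # fixed seed so the run is repeatable

# Each person is a pair: (criterion score, list of item scores).
# Items 0-4 truly depend on the criterion; items 5-49 are noise.
persons = []
for _ in range(N_PERSONS):
    crit = rng.gauss(0, 1)
    true_items = [crit + rng.gauss(0, 1) for _ in range(N_TRUE)]
    noise_items = [rng.gauss(0, 1) for _ in range(N_NOISE)]
    persons.append((crit, true_items + noise_items))

# Random split into two groups, then the same exploratory selection in each.
rng.shuffle(persons)
group_a, group_b = persons[: N_PERSONS // 2], persons[N_PERSONS // 2 :]
selected_a = select_items(group_a, N_SELECTED)
selected_b = select_items(group_b, N_SELECTED)
stable_items = selected_a & selected_b

if __name__ == "__main__":
    print("selected in A:", sorted(selected_a))
    print("selected in B:", sorted(selected_b))
    print("selected in both:", sorted(stable_items))
```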

Threats to validity of measures

Threats with observation-methods:

  • Disruptive influence of the presence and behavior of the observer on the observed person and his or her behavior.
  • Expectancy-effect
    distorting effect of the expectations of the observer on the observations
  • adjustment-effect
    changes in the manner of observation in the course of time
  • Category-effect
    loss of precision due to use of global categories
  • order-effect
    distorting influence of first and last observations on other observations of the series
  • effect of under-performance
    distortion of observations due to under-representation of common behavior or events
  • effect of event rate
    distortion due to missing observations as a result of the rate of events or behaviors
  • effect of event-complexity
    distortion due to missing observations as a result of the complexity of behaviors or events

Threats with rating-methods:

  • Halo-effect
    a positive distortion on specific traits as a result of a positive first or general impression
  • horn-effect
    a negative distortion on specific traits as a result of a negative first or general impression
  • regression to the middle
    distortion of judgments as a result of a tendency to give average judgments or a tendency to give little variation in judgments.
  • contrast-effect
    distortion due to a tendency to increase existing differences between people or differences with the judge
  • willingness-effect
    distortion as a result of the tendency to avoid negative judgments or the tendency to give relatively positive judgments
  • hardness-effect
    distortion as a result of the tendency to give relatively negative judgments or to give relatively few positive judgments
  • logical flaw
    distortion as a result of assuming traits on the basis of psycho-’logical’ connections or assuming cause and effect

Psychometric research

The measurement-preference of a test must be researched empirically with psychometric research.

Follow the author: SanneA