WSRt, critical thinking - a summary of all the articles needed in the third block of the second year of psychology at the UvA
Validity: the extent to which test results can be interpreted in terms of the construct the test is intended to measure.
The nomological network: the system of hypothetical relations around the construct.
This can be part of the theory.
Forms of validity:
Impression validity: a subjective judgment of the usability of a measurement instrument, based on the directly observable properties of the test material.
Content validity: the judgment of how representative the observations, tasks, and questions are for a certain purpose.
Criterion validity: the (cor)relation between the test score and a psychological or social criterion.
Process validity: the manner in which the response comes about.
Construct validity: the degree of correspondence between the strictly formulated, hypothetical relations between the measured construct and other constructs, and the empirically demonstrated relations between the instruments that are supposed to measure those constructs.
Internal-consistency reliability: the mutual cohesion of the items that form a scale or subtest.
Test-retest reliability: repeated measurements with the same instrument.
Local reliability: an impression of the reliability of the measurement within a certain range of scores.
Homogeneity or consistency reliability: the cohesion between the different items of a scale. In psychological measurement, the items are assumed to be repeated, independent measurements of a trait.
The reliability of the prediction: the repeatability of the prediction at a certain point in time.
Stability of the prediction: the repeatability of the prediction over time.
Base rate: the proportion of people in the population that possesses a particular trait, behaviour, characteristic, or attribute.
Criterion group: a group, representative for the intended use of the test, of which all members show the same criterion behaviour and of which all criterion scores are known.
Hits
Hit: a correct classification
Hit rate: the proportion of people that an assessment tool accurately identifies as possessing or exhibiting a particular trait, ability, behaviour, or attribute
Misses
Miss: an incorrect classification
Miss rate: the proportion of people that an assessment tool inaccurately identifies as possessing or exhibiting a particular trait, ability, behaviour, or attribute
Return on investment: the ratio of benefits to costs.
Prediction error or classification error: the percentage of cases wrongly classified by the test.
False positive: a specific type of miss whereby an assessment tool falsely indicates that the test-taker possesses or exhibits a particular trait, ability, behaviour, or attribute
False negative: a specific type of miss whereby an assessment tool falsely indicates that the test-taker does not possess or exhibit a particular trait, ability, behaviour, or attribute
Sensitivity and specificity
Sensitivity or predictive accuracy: the percentage of cases correctly identified by the test as having the trait (hits).
Specificity: the percentage of cases correctly identified by the test as not having the trait.
Predictive values
Positive predictive value (PPV): of all the people the test says have the trait, the percentage that actually has it.
Negative predictive value (NPV): of all the people the test says do not have the trait, the percentage that actually does not have it.
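These quantities all fall out of a simple 2x2 classification table. A minimal sketch (the counts below are invented purely for illustration):

```python
# Invented counts for a hypothetical screening test:
# rows = actual status, columns = test outcome.
tp, fn = 40, 10    # people who have the trait: test positive / test negative
fp, tn = 30, 120   # people who lack the trait: test positive / test negative

total = tp + fn + fp + tn
base_rate = (tp + fn) / total     # prevalence of the trait in this sample
hit_rate = (tp + tn) / total      # proportion of correct classifications
sensitivity = tp / (tp + fn)      # hits among those who have the trait
specificity = tn / (tn + fp)      # correct rejections among those who lack it
ppv = tp / (tp + fp)              # of all positive test results, how many are right
npv = tn / (tn + fn)              # of all negative test results, how many are right

print(f"base rate {base_rate:.2f}, sensitivity {sensitivity:.2f}, "
      f"specificity {specificity:.2f}, PPV {ppv:.2f}, NPV {npv:.2f}")
```

Note how PPV (about 0.57 here) can sit far below sensitivity (0.80) when false positives are common.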
Measurement model: concerns what the constructor wants to measure
Structural model: concerns what the constructor wants to predict
Prediction: the test score is established first, then the criterion score
Postdiction: the criterion score is established first, then the test score
The reflective interpretation: the measured attribute is conceptualized as the common cause of the observables
Formative interpretation: the measured attribute is seen as the common effect of the observables.
Utility of an instrument: the usefulness of an instrument as it appears from a cost-benefit analysis.
Utility analysis: a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.
Utility gain: an estimate of the benefit of using a particular test or selection method.
Norm-group: the group of people that forms the norm
Taylor-Russell tables: provide an estimate of the increase in the base rate of successful performance that is associated with a particular level of criterion-related validity.
Naylor-Shine tables: tell us the likely average increase in criterion performance as a result of using a particular test or intervention; they also provide the selection ratio needed to achieve a particular increase in criterion performance.
The expectancy table or chart: tells us the likelihood that individuals who score within a given range on the predictor will perform successfully on the criterion.
A domain: a wide area of more or less coherent properties.
Difficulty: the attribute of not being easily accomplished, solved, or comprehended.
Discrimination: the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever else is being measured.
Local dependence: items are all dependent on some factor that is different from what the test as a whole is measuring. Items are locally dependent if they are more related to each other than to the other items on the test.
Dichotomous or polytomous
Dichotomous test items: test items or questions that can be answered with only two alternative responses.
Polytomous test items: test items or questions with three or more alternative responses, where only one is scored as being consistent with a targeted trait or other construct.
Inter and intra individual differences
Inter-individual differences: differences between people
Intra-individual differences: differences within people
Cross-validation: a check on the instability of outcomes
Method variance: systematic variance resulting from the measurement procedure with which the trait is measured.
Rasch model: an IRT model with very specific assumptions about the underlying distribution.
Information in IRT: the precision of measurement.
Item characteristic curve (ICC), an item response curve, a category response curve, or an item trace line: the expression in graphic form of the probabilistic relationship between a test-taker’s response to a test item and that test-taker’s level on the latent construct being measured.
The unidimensionality assumption: the set of items measures a single continuous latent construct. This construct is referred to by the Greek letter theta (θ).
The assumption of local independence: a) there is a systematic relationship between all of the test items and b) that relationship has to do with the theta level of the test-taker. When the assumption is met, it means that differences in responses to items are reflective of differences in the underlying trait or ability.
The assumption of monotonicity: the probability of endorsing or selecting an item response indicative of higher levels of theta should increase as the underlying level of theta increases.
This magazine contains all the summaries you need for the course WSRt in the second year of psychology at the UvA.
Critical thinking
Article: Stouthard, M. E. A.
Validity
Validity: if test-results can be interpreted in terms of the construct the test tries to measure.
A test is taken to make an inference about a construct that lies outside the measurement instrument itself and that the instrument is supposed to measure.
The meaning of the test results lies in the extent to which they are an indication of the construct.
Validity is an overarching concept. It is a term for a number of possible properties of a test.
Often, multiple sorts of empirical knowledge are needed to establish the validity of a test.
Which sources of empirical knowledge are important for a test depends on the intended use of the test.
Two sorts of validity:
The difference between these two isn’t absolute.
When a test is meant to predict behaviour outside the test situation, it is relevant to ask whether the instrument is a good predictor of that behaviour.
The better the test predicts the variation in the criterion, the higher the validity of the test.
The criterium
Like a test, a criterion is an operationalization of an underlying concept.
Multiple criteria are possible.
There are different methods to distinguish between criteria.
Kinds of criteria
A distinction between:
A distinction in time
A distinction of criteria in the future
Relation test and criterium
The relation between a test and a criterion is mostly expressed as a correlation between the two.
There is an association, but not necessarily causality.
A condition for interpreting a relation between a test and a criterion as support for criterion-oriented validity is that there is at least one
Critical thinking
Article: Oosterveld & Vorst (2010)
Psychological measurement-instruments
The construction of measurement instruments is an important subject.
Measurement aims of an instrument: the goal of a measurement instrument.
This concerns a more or less hypothetical property.
The domain of human behaviour
The instrument is usually focused on measuring a property in a global domain of human behaviour.
A domain: a wide area of more or less coherent properties.
Observation methods
Every measurement-instrument uses one or more observation methods. For different properties of different domains, usually different observation methods are used.
When properties are measured with different observation methods, it is logical that with different methods, different domains of the traits or categories are measured.
Instruments based on one observation method seem to form a common method-factor, which usually is stronger than the common trait-factor of equal traits measured with different observation methods.
Theory
The development of an instrument is usually based on an elaborated theory, on insights from empirical research, or on ideas based on informal knowledge.
Instruments developed on the basis of formal knowledge and an elaborated theory are of better quality than instruments based on informal knowledge and a poorly formulated theory.
Construct
An instrument is the elaboration of a construct that refers to a combination of properties.
Measurement instruments for specific (latent) traits are of better quality than instruments for global or composite traits.
Structure
The structure of a test depends on the properties it measures.
Unstructured observation methods are those whose measurement conditions are not standardized; as a result, their results are difficult to compare across persons and situations, and objective scores are difficult to obtain.
Application possibilities
The application possibilities a researcher wants to achieve with a measurement instrument can relate to theoretical or descriptive research.
This involves the analysis of a great number of observations.
For individual applications, high requirements are placed on the realised measurement aims.
Costs
An often decisive element in the description of the measurement aims of a measurement instrument is the cost of that instrument.
Dimensionality
An instrument consists of one or more measurement scales or subtests.
Multiple scales refer to multiple dimensions of the construct and to a subdivision into multiple latent traits or latent categories.
An instrument that is based on a specific latent trait must be unidimensional.
Reliability
Three kinds of reliability:
Critical thinking
Article: Kan, K., & van der Maas, H. (2010)
Intelligentie versus cognitie: tijd voor een (goede) relatie (Intelligence versus cognition: time for a (good) relationship)
Inter-individual differences: differences between people
Intra-individual differences: differences within people
There are many different views regarding intelligence. This makes it difficult to pin down what people in psychology call intelligence.
Alternative theories
In some cases, mutual interactions between populations lead to a situation in which the parties benefit from each other.
The growth of one population causes the other population to grow, and vice versa.
This dynamical interaction is called mutualism.
As a result of individual differences in limited capacities, and as a result of mutualistic interactions between cognitive processes, cognitive processes become correlated in the course of development.
The functionally independent cognitive functions within each individual become positively correlated.
The functionally independent cognitive functions become statistically dependent across groups of people.
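The mutualism idea can be illustrated with a toy simulation (the growth equation, coupling strength, and capacity ranges below are illustrative assumptions, not taken from the article): each person has two cognitive abilities that grow logistically toward independently drawn limited capacities, with a positive cross term, and across people the abilities end up positively correlated even though the capacities are uncorrelated.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, to avoid external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
n_people, steps, dt, coupling = 500, 200, 0.05, 0.2
final_a, final_b = [], []
for _ in range(n_people):
    # limited capacities drawn independently: no built-in correlation
    cap_a, cap_b = random.uniform(1, 3), random.uniform(1, 3)
    a = b = 0.1
    for _ in range(steps):
        # logistic growth plus a mutualistic (positive) interaction term
        da = a * (1 - a / cap_a) + coupling * a * b
        db = b * (1 - b / cap_b) + coupling * a * b
        a, b = a + dt * da, b + dt * db
    final_a.append(a)
    final_b.append(b)

print(f"correlation between abilities: {pearson(final_a, final_b):.2f}")
```

The correlation comes out clearly positive, even though nothing shared was put into the individuals at the start.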
Implications
It is possible that the positive association between cognitive abilities is caused by mutualistic interactions during cognitive development and by measurement problems.
It can't be ruled out that some influences affect the development of all (or multiple) cognitive abilities.
Intelligence can best be compared to an index of general health; it isn't a real property the way cognitive processes are.
Critical thinking
Article: Schmittmann, V. D., Cramer, A. O. J., Waldorp, L. J., Epskamp, S., Kievit, R. A., & Borsboom, D. (2011)
Deconstructing the construct: A network perspective on psychological phenomena
In psychological measurement, three interpretations of measurement systems have been developed:
In reflective models, observed indicators (item or subject scores) are modelled as a function of a common latent (unobserved) variable and item-specific error variance.
Commonly presented as ‘measurement models’.
In these models, a latent variable is introduced to account for the covariance between indicators.
In formative models, (possibly latent) composite variables are modelled as a function of indicators.
Without residual variance on the composite, models like principal components analysis and clustering techniques serve to construct an optimal composite out of observed indicators.
But, one can turn the composite into a latent composite if one introduces residual variance on it.
This happens, for instance, if model parameters are chosen in a way that optimizes a criterion variable.
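The contrast can be made concrete with a toy simulation (the variable names and numbers are invented, purely for illustration): in the reflective case one latent variable causes several indicators, so the indicators correlate; in the formative case the composite is computed from indicators that need not correlate at all.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, to avoid external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs)
                  * sum((y - my) ** 2 for y in ys)) ** 0.5

random.seed(0)
n = 2000

# Reflective: a common latent variable is the common CAUSE of the indicators.
latent = [random.gauss(0, 1) for _ in range(n)]
item1 = [t + random.gauss(0, 0.5) for t in latent]
item2 = [t + random.gauss(0, 0.5) for t in latent]

# Formative: the composite is the common EFFECT of its indicators
# (think of a socio-economic index computed from income and education).
income = [random.gauss(0, 1) for _ in range(n)]
education = [random.gauss(0, 1) for _ in range(n)]
ses = [0.5 * i + 0.5 * e for i, e in zip(income, education)]

print(f"reflective indicators correlate: r = {pearson(item1, item2):.2f}")
print(f"formative indicators need not:   r = {pearson(income, education):.2f}")
```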
Formative models differ from reflective models in many aspects
The role of time
In most conceptions of causality, causes are required to precede their effects in time.
But, in psychometric models like the reflective and formative models, time is generally not explicitly represented.
The dynamics of the system are not explicated.
This puts the causal interpretation of latent
Critical thinking
Article: Cohen
Item response theory (IRT)
The procedures of item response theory provide a way to model the probability that a person with X ability will be able to perform at a level of Y.
Because so often the psychological or educational construct being measured is physically unobservable (latent), and because the construct being measured may be a trait, a synonym for IRT is latent-trait theory.
IRT is not a term used to refer to a single theory or method.
It refers to a family of theories and methods, and quite a large family at that, with many other names used to distinguish specific approaches.
Difficulty: the attribute of not being easily accomplished, solved, or comprehended.
Discrimination: the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever else is being measured.
A number of different IRT models exists to handle data resulting from the administration of tests with various characteristics and in various formats.
Other IRT models exist to handle other types of data.
In general, latent-trait models differ in some important ways from CTT.
Such assumptions are inherent in latent-trait models.
Rasch model: an IRT model with very specific assumptions about the underlying distribution.
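A sketch of what those specific assumptions amount to in practice: in the common one-parameter logistic form (the formula is standard but not given in this summary), the probability of a keyed response depends only on the difference between the test-taker's theta level and the item's difficulty.

```python
import math

def rasch_prob(theta, difficulty):
    """Rasch (one-parameter logistic) item response function:
    P(keyed response) depends only on theta minus item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Tracing this over theta gives the item characteristic curve (ICC);
# monotonicity means the probability rises as theta rises.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(rasch_prob(theta, difficulty=0.0), 3))
```

A person whose theta equals the item's difficulty has exactly a 0.5 probability of the keyed response.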
Three assumptions regarding data to be analysed within an IRT framework.
Unidimensionality
The unidimensionality assumption: the set of items measures a single continuous latent construct.
This construct is referred to by the Greek letter theta (θ).
It is a person’s theta level that gives rise to a response to the items in the scale.
Theta level: a reference to the degree of the underlying ability or trait that the test-taker is presumed to bring to the test.
The assumption of unidimensionality does not preclude that the set of items may have a number of minor dimensions (which, in turn, may be measured by subscales).
It does assume that one dominant dimension explains the underlying structure.
Local independence
Local dependence: items are all dependent on some factor that is different from what the test as a whole is measuring. Items are locally dependent if they are more related to each other than to the other items on the test.
Critical thinking
Article: Oosterveld & Vorst (2010)
Testconstructie en testonderzoek (Test construction and test research)
There are problematic theories about validity
Examples of viewpoints
Borsboom (2003)
According to Borsboom, it is plausible that the mercury thermometer is a valid measure of the temperature of objects, because differences in the real temperature cause differences in the readings of the measurement instrument.
If the causal chain is described exactly, and this is a plausible representation of reality, then the instrument is valid in reality.
Real validity remains unknown as long as not all relevant knowledge is available.
Because it is in principle unknown to what extent relevant knowledge is available, validity is hypothetically uncertain.
Even if the causal chain between true variation in the trait and the measured variation is well known, knowledge about causal chains can change with new knowledge. This is why real validity is hypothetical.
However, people can form a judgment about the validity of measurement instruments. This validity judgment need not have anything to do with the real validity.
In psychology, establishing true causal chains is (as yet) impossible.
That is why psychology temporarily works with hypothetical validity judgments, pending more precise and true causal chains between true trait variation and measurement variation.
The quality of measurement, not the validity, must be demonstrated through psychometric analysis (reliability, unidimensionality, representative content of the measurement instrument, relations with external criteria, support for theoretically expected relations).
Science-philosophical viewpoint
Description of validity
Derived statements
Research into measurement quality/validity
Critical thinking
Article: Cohen
Utility Analysis
Utility analysis: a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.
It is not one specific technique used for one specific objective. It is an umbrella term covering various possible methods, each requiring various kinds of data to be inputted and yielding various kinds of output.
In a most general sense, a utility analysis may be undertaken for the purpose of evaluating whether the benefits of using a test outweigh the costs.
If undertaken to evaluate a test, the utility analysis will help make decisions whether:
If undertaken for the purpose of evaluating a training program or intervention, the utility analysis will help make decisions regarding whether:
The endpoint of a utility analysis is typically an educated decision about which of many possible courses of action is optimal.
The specific objective of a utility analysis will dictate what sort of information will be required as well as the specific methods to be used.
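One widely taught way to put numbers on such a decision is the Brogden-Cronbach-Gleser utility estimate. The sketch below uses that textbook formula; all the figures plugged in are invented, purely for illustration.

```python
def utility_gain(n_hired, tenure_years, validity, sd_y, mean_z_selected,
                 n_applicants, cost_per_applicant):
    """Brogden-Cronbach-Gleser style utility estimate:
    gain = (hires x tenure x validity x SD of job performance in money terms
            x mean standardized test score of those hired)
           minus the total cost of testing."""
    benefit = n_hired * tenure_years * validity * sd_y * mean_z_selected
    cost = n_applicants * cost_per_applicant
    return benefit - cost

# Invented example: 10 hires from 100 applicants, 2-year tenure, validity .40,
# SD of performance in money terms 10,000, mean z of hires 1.0, test cost 50 each.
gain = utility_gain(10, 2, 0.40, 10_000, 1.0, 100, 50)
print(f"estimated utility gain: {gain:.0f}")  # 80,000 benefit minus 5,000 cost
```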
Expectancy data
Some utility analyses will require little more than converting a scatterplot of test data to an expectancy table.
An expectancy table can provide an indication of the likelihood that a test-taker will score within some interval of scores on a criterion measure.
Taylor-Russell tables: provide an estimate of the increase in the base rate of successful performance that is associated with a particular level of criterion-related validity.
The value assigned for the test’s validity: the computed validity coefficient.
But, the relationship must be linear.
Naylor-Shine tables: tell us the likely average increase in criterion performance as a result of using a particular test or intervention.
Critical thinking
Article: Oosterveld & Vorst (2010)
Voorspellen van een criteriumwaarde (Predicting a criterion value)
Test scores and criterion values can lie on an (almost) continuous scale or have a dichotomous character.
Usually, criterion values are established by judgements of experts.
Commonly, a criterion value is provisional: it holds true for the time being.
The test score and criterion value can be established simultaneously, or with a short or long period in between.
This affects the interpretation of the table.
With a long time in between, the prediction becomes less stable.
Usually, criterion values are placed on the vertical axis and test scores on the horizontal axis.
Not everyone uses this system
Base rate or prevalence: the percentage occurrence of the trait in the population.
With a low prevalence, finding the trait is difficult.
The use of a test must lead to a higher percentage of correctly detected cases (hits) than the prevalence alone; otherwise, using the test is pointless.
Sensitivity and specificity are direct clues to the predictive value of the test.
PPV and NPV are direct clues to the predictive value of the test.
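The dependence of the PPV on prevalence follows directly from Bayes' rule; a short sketch with invented test properties:

```python
def ppv(sensitivity, specificity, base_rate):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# The same test (90% sensitive, 90% specific) at two prevalences:
print(round(ppv(0.9, 0.9, 0.50), 3))  # high prevalence: most positives are real
print(round(ppv(0.9, 0.9, 0.01), 3))  # low prevalence: most positives are false
```

At 50% prevalence the PPV is 0.9, but at 1% prevalence it drops to about 0.083: with a rare trait, a positive result is usually a false positive, which is why a useful test must beat the base rate.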
Reliability of the prediction
The reliability of the prediction: the repeatability of the prediction at a certain point in time.
The reliability of the prediction can be established with cross-validation.
Stability of the prediction
Stability of the prediction: the repeatability of the prediction over time.
Especially important when the predictions are about a period of time.
The stability of
Critical thinking
Article: Dawes, R. M., Faust, D., & Meehl, P. E. (1989)
Clinical versus actuarial judgment
In the clinical method, the decision maker combines or processes information in his or her head.
In the actuarial or statistical method, conclusions rest solely on empirically established relations between data and the condition or event of interest.
The actuarial method should not be equated with automated decision rules alone.
To be truly actuarial, interpretations must be both automatic (pre-specified or routinised) and based on empirically established relations.
Virtually any type of data is amenable to actuarial interpretation.
The combination of clinical and actuarial methods offers a third potential judgment strategy, one for which certain viable approaches have been proposed.
But, most proposals for clinical-actuarial combination presume that the two judgment methods work together harmoniously and overlook the many situations that require dichotomous choices.
Conditions for a fair comparison of the two methods:
Actuarial methods seem to have advantages over the clinical method.
Although most comparative research in medicine favours the actuarial method overall, the studies that suggest a slight clinical advantage seem to involve circumstances in which judgments rest on firm theoretical grounds.
Consideration of utilities. Depending on the task, certain judgment errors may be more serious than others.
The adjustment of decision rules or cutting scores to reduce either false-negative or false-positive errors can decrease the procedure's overall accuracy but may still be justified if the consequences of these opposing forms of error are unequal.
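The trade-off can be seen in a tiny example (the score distributions are invented): lowering the cutting score reduces false negatives at the cost of more false positives, and raising it does the reverse.

```python
# Invented test scores for two groups whose true status is known.
with_condition = [6, 7, 7, 8, 9, 9, 10, 11]
without_condition = [2, 3, 4, 4, 5, 6, 6, 7]

def error_counts(cutoff):
    """Classify 'has the condition' as score >= cutoff; count both error types."""
    false_negatives = sum(s < cutoff for s in with_condition)
    false_positives = sum(s >= cutoff for s in without_condition)
    return false_negatives, false_positives

for cutoff in (5, 7, 9):
    fn, fp = error_counts(cutoff)
    print(f"cutoff {cutoff}: {fn} false negatives, {fp} false positives")
```

No cutoff eliminates both error types at once; which one to minimise depends on the utilities of the two errors.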
The clinician’s potential capacity to capitalize on configural patterns or relations among predictive cues raises two related but separable issues:
A unique capacity to observe is not the same as a unique capacity to predict on the basis of an integration of observations.
Greater accuracy may be achieved if the skilled observer performs this function and then steps aside.