A Conceptual Introduction to Psychometrics: Development, Analysis, and Application of Psychological and Educational Tests, by G. J. Mellenbergh (first edition) – Book summary
Giving scores to different responses (e.g. agree = 3) is called scoring by fiat and has no theoretical justification. The observed test score is derived from the item scores by taking the unweighted or the weighted sum of the item scores. The construct score is derived from the item responses under the assumption of a latent variable response model.
The unweighted sum of item scores is the sum of a person j's scores on all K items:

X_j = \sum_{k=1}^{K} X_{jk}

The weighted sum of item scores is used when some items should weigh more heavily than others:

X_j = \sum_{k=1}^{K} w_k X_{jk}

Here w_k denotes the weight of item k. The population mean of the observed scores is:

\mu = \varepsilon_p(X_j)

'\varepsilon' denotes expectation and the subscript 'p' denotes that the expectation is taken with respect to the population of test takers; informally, it is the average of the (un)weighted test scores over that population. The test variance is:

\sigma^2 = \varepsilon_p(X_j - \mu)^2

It is the expectation of (the test score of person j minus the expected score) squared, where the expected score is the population mean of observed scores. The population standard deviation is the square root of the test variance:

\sigma = \sqrt{\sigma^2}

The population mean test score is estimated by the sample mean, the sum of the N observed test scores divided by the number of test takers:

M = \frac{1}{N} \sum_{j=1}^{N} X_j

The item mean of item k is the sum of the scores on item k divided by the number of test takers:

M_k = \frac{1}{N} \sum_{j=1}^{N} X_{jk}

For dichotomous items, the item mean equals the proportion p_k of test takers who pass (or endorse) the item. The item variance of item k is the mean squared deviation of the item scores from the item mean:

S_k^2 = \frac{1}{N} \sum_{j=1}^{N} (X_{jk} - M_k)^2

The test variance uses the same formula, except that it uses the test scores of test taker j instead of the item scores. The item standard deviation is the square root of the item variance. For dichotomous items, the item variance of item k simplifies to:

S_k^2 = p_k(1 - p_k)
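As a sketch, the statistics above can be computed from a small score matrix. All numbers below are made up for illustration; they do not come from the book.

```python
# Illustrative: 4 test takers answering 3 dichotomous items (1 = pass, 0 = fail).
scores = [
    [1, 0, 1],  # test taker 1
    [1, 1, 1],  # test taker 2
    [0, 0, 1],  # test taker 3
    [1, 0, 0],  # test taker 4
]
N = len(scores)      # number of test takers
K = len(scores[0])   # number of items

# Observed test score: unweighted sum of item scores per person.
test_scores = [sum(row) for row in scores]

# Item mean of item k; for dichotomous items this is the proportion passing.
item_means = [sum(row[k] for row in scores) / N for k in range(K)]

# Item variance of item k: mean squared deviation from the item mean.
item_vars = [sum((row[k] - item_means[k]) ** 2 for row in scores) / N
             for k in range(K)]

# For dichotomous items the variance simplifies to p_k * (1 - p_k).
item_vars_dich = [p * (1 - p) for p in item_means]

print(test_scores)   # [2, 3, 1, 1]
print(item_means)    # [0.75, 0.25, 0.75]
print(item_vars)     # same values as item_vars_dich
```

Note that the population formulas (division by N) are used here, matching the text above; statistical software often divides by N − 1 instead.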
The correlation between item k and item l can be calculated in the following way:

r_{kl} = \frac{\sum_{j=1}^{N} (X_{jk} - M_k)(X_{jl} - M_l)}{\sqrt{\sum_{j=1}^{N} (X_{jk} - M_k)^2 \sum_{j=1}^{N} (X_{jl} - M_l)^2}}

It is the sum of the cross-products of the deviation scores on items k and l, divided by the square root of the product of the two sums of squared deviations.
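A minimal sketch of this computation, using made-up scores on two 5-point items:

```python
# Illustrative item scores for two Likert-type items (invented data).
item_k = [3, 4, 2, 5, 4]
item_l = [2, 4, 1, 5, 3]
N = len(item_k)

mean_k = sum(item_k) / N
mean_l = sum(item_l) / N

# Numerator: sum of cross-products of the deviation scores.
num = sum((xk - mean_k) * (xl - mean_l) for xk, xl in zip(item_k, item_l))

# Denominator: square root of the product of the sums of squared deviations.
den = (sum((xk - mean_k) ** 2 for xk in item_k)
       * sum((xl - mean_l) ** 2 for xl in item_l)) ** 0.5

r_kl = num / den
print(round(r_kl, 3))  # 0.971
```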
For dichotomous items, the correlation between item k and item l (the phi coefficient) can be calculated in the following way:

r_{kl} = \frac{p_{kl} - p_k p_l}{\sqrt{p_k(1 - p_k)\, p_l(1 - p_l)}}

Here p_k and p_l are the proportions of test takers passing item k and item l, and p_{kl} is the proportion of test takers passing both items.
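The dichotomous formula is just the general Pearson correlation rewritten in terms of proportions, which a short sketch (with invented 0/1 data) can verify:

```python
# Illustrative 0/1 scores on two dichotomous items for 8 test takers.
item_k = [1, 1, 1, 0, 1, 0, 0, 1]
item_l = [1, 1, 0, 0, 1, 0, 1, 1]
N = len(item_k)

p_k = sum(item_k) / N                                  # proportion passing item k
p_l = sum(item_l) / N                                  # proportion passing item l
p_kl = sum(a * b for a, b in zip(item_k, item_l)) / N  # proportion passing both

# Phi coefficient: the Pearson correlation specialized to 0/1 scores.
phi = (p_kl - p_k * p_l) / (p_k * (1 - p_k) * p_l * (1 - p_l)) ** 0.5

# Cross-check against the general Pearson formula on the same 0/1 data
# (for 0/1 scores the item mean equals the proportion passing).
num = sum((a - p_k) * (b - p_l) for a, b in zip(item_k, item_l))
den = (sum((a - p_k) ** 2 for a in item_k)
       * sum((b - p_l) ** 2 for b in item_l)) ** 0.5
print(abs(phi - num / den) < 1e-12)  # True
```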
Covariance is a measure of how much two variables vary together. The covariance between item k and item l can be calculated using the following formula:

S_{kl} = \frac{\sum_{j=1}^{N} (X_{jk} - M_k)(X_{jl} - M_l)}{N - 1}

It is the sum of the cross-products of the deviation scores on items k and l, divided by the number of test takers minus one. For dichotomous items, the covariance between item k and item l can be calculated using the following formula:

S_{kl} = \frac{N}{N - 1}\,(p_{kl} - p_k p_l)

Here p_{kl} is the proportion of test takers passing both item k and item l. The test variance can also be obtained by summing all entries of the variance-covariance matrix, the matrix that contains the covariance between every pair of items (with the item variances on the diagonal).
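The identity between the test variance and the summed variance-covariance matrix can be checked numerically. The sketch below uses invented data and the population formulas (division by N) throughout, so the identity holds exactly; mixing N and N − 1 denominators would break it.

```python
# Illustrative: 4 test takers, 3 polytomous items (invented data).
scores = [
    [3, 2, 4],
    [5, 4, 5],
    [2, 2, 3],
    [4, 3, 3],
]
N = len(scores)
K = len(scores[0])

item_means = [sum(row[k] for row in scores) / N for k in range(K)]

def cov(k, l):
    """Population covariance of items k and l (diagonal entries are variances)."""
    return sum((row[k] - item_means[k]) * (row[l] - item_means[l])
               for row in scores) / N

# Variance-covariance matrix of the items.
vcov = [[cov(k, l) for l in range(K)] for k in range(K)]

# Test variance computed directly from the test scores ...
test_scores = [sum(row) for row in scores]
mean_x = sum(test_scores) / N
var_x = sum((x - mean_x) ** 2 for x in test_scores) / N

# ... equals the sum of all entries of the variance-covariance matrix.
print(var_x == sum(sum(row) for row in vcov))  # True
```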
A psychological or educational test is an instrument for the measurement of a person's maximum or typical performance under standardized conditions, where performance is assumed to reflect one or more latent attributes. Tests are used for diagnosis and for psychological and educational decision-making. The dimensionality of a test or subtest equals the number of latent attributes that affect test performance. An item is the smallest possible subtest of a test.
A mental test consists of cognitive tasks. A physical test consists of somatic or physiological measurements. A pure power test consists of problems that the test taker tries to solve, without a time-limit. A time-limited power test is a pure power test with a time-limit. A speed test measures the speed taken to solve problems. An ability test (aptitude test) measures a person’s best performance in an area that is not explicitly taught in training and educational programs. An achievement test measures a person’s best performance in an area that is explicitly taught in training and educational programs.
Test development consists of several steps:
Response scales can be dichotomous (two ordered categories), partially ordered polytomous (more than two categories that are only partially ordered) and ordinal-polytomous (more than two completely ordered categories). There are several item-writing guidelines:
Typical performance tests assess behaviour that is typical for the person. There are three main types of typical performance tests: personality tests (1), interest inventories (2) and attitude questionnaires (3). The steps for test development of typical performance tests are the same as the steps for test development in general.
There are three classes of strategies for the conceptual framework:
- Intuitive class (no or informal knowledge)
- Inductive class (weak theory / knowledge)
- Deductive class (strong theory / knowledge)
There are several item writing guidelines which are especially relevant for typical performance tests:
An indicative item is an item where a high frequency or endorsement indicates a high level of the construct. A contraindicative item is an item where a high frequency or endorsement indicates a low level of the construct. Response tendencies are the differential application of the response scales. The response set is the differential use of the item response scale by different persons.
Measurement precision consists of information (1) and reliability (2). Information applies to the test score of a single person: it is the within-person aspect of measurement precision. Reliability applies to a population of persons: it is the between-person aspect of measurement precision. The true score is the score that would be obtained with a perfect measurement instrument. Measurement error is the distortion of the true score that arises because the measurement instrument is not perfect; measurement errors are unsystematic influences.
The test score of a person is the true score of that person plus the measurement error:

X_j = T_j + E_j

The true score equals the mean test score over infinitely many test administrations, so the expected value of the measurement error over infinite trials equals zero. Test taker j's standard error of measurement is the square root of the within-person error variance. The information on test taker j's true score is the inverse of the within-person error variance:

I_j = \frac{1}{\sigma^2_{E_j}}
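The classical model above can be illustrated with a simulation of repeated administrations to a single test taker. The true score, error standard deviation, and number of administrations below are arbitrary assumed values, not from the book.

```python
import random

random.seed(7)  # reproducible illustration

# Classical test theory sketch: X = T + E for one test taker.
true_score = 25.0
error_sd = 2.0                # the test taker's standard error of measurement
n_administrations = 100_000   # hypothetical repeated administrations

errors = [random.gauss(0.0, error_sd) for _ in range(n_administrations)]
observed = [true_score + e for e in errors]

# The mean observed score approaches the true score, and the mean error
# approaches zero, as the number of administrations grows.
mean_observed = sum(observed) / n_administrations
mean_error = sum(errors) / n_administrations

# Information about this test taker's true score is the inverse of the
# within-person error variance (here 1 / 2^2 = 0.25 in the limit).
error_var = sum(e ** 2 for e in errors) / n_administrations
information = 1.0 / error_var
print(round(mean_observed, 1), round(information, 2))
```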
There are several guidelines for interpreting reliability:

0.8 - 1.0 | Good
0.7 - 0.8 | Sufficient
0.6 - 0.7 | Moderate
0.0 - 0.6 | Poor

A parallel test is a different test with exactly the same properties as the original test. Reliability is the correlation between two parallel test scores. Reliability can be estimated by splitting the test into two halves and treating them as parallel tests; the reliability of each part equals the correlation between the two parts. The reliability of the full test then follows from the Spearman-Brown formula:

r_{XX'} = \frac{2 r_{12}}{1 + r_{12}}
It is two times the correlation between part one and part two, divided by one plus that correlation. Cronbach's alpha is a lower bound of the reliability of the full test:

\alpha = \frac{K}{K - 1}\left(1 - \frac{\sum_{k=1}^{K} S_k^2}{S_X^2}\right)

It is the number of items divided by the number of items minus one, times (one minus the sum of the item variances divided by the test variance). Reliability depends on the number of items: a longer test is usually more reliable than a shorter one. The correlation between the test scores of two tests is not equal to the correlation between the underlying constructs, as the observed correlation is attenuated by measurement error.
An item score distribution can be described by its location (1), dispersion (2) and shape (3). Item difficulty is a parameter in maximum performance tests: more test takers fail on more difficult items. Item attractiveness is a parameter in typical performance tests: more test takers endorse attractive items. The item difficulty / item attractiveness is equal to the item mean.
Items with small variances do not contribute much to the overall variance. There is a danger of small variances due to floor / ceiling effects in Likert scales (e.g. items with low attractiveness). Large item correlations result in high reliability. Item discrimination refers to how well a given item can distinguish between people that differ on the underlying construct.
The item-test correlation is the correlation between the scores on a given item and the test scores. Items that discriminate well have a high item-test correlation:

r_{kX} = \frac{\sum_{j=1}^{N} (X_{jk} - M_k)(X_j - M)}{\sqrt{\sum_{j=1}^{N} (X_{jk} - M_k)^2 \sum_{j=1}^{N} (X_j - M)^2}}

In other words, it is the covariance between item k and the test score, divided by the standard deviation of item k times the standard deviation of the test score.
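A short sketch of the item-test correlation, again on an invented dichotomous score matrix:

```python
# Illustrative: 5 test takers, 3 dichotomous items (invented data).
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
K = 3

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

test_scores = [sum(row) for row in scores]

# Item-test correlation: correlate each item column with the test score.
item_test_r = [pearson([row[k] for row in scores], test_scores)
               for k in range(K)]
print([round(r, 3) for r in item_test_r])  # [0.772, 0.84, 0.91]
```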
The item-rest correlation is the correlation between the scores on a given item and the rest score, the test score without that item (R_j = X_j - X_{jk}). It is used because the item-test correlation is biased upwards: the item is correlated with a total that contains the item itself. The item-rest correlation uses the same formula as the item-test correlation, with the rest score and rest-score mean in place of the test score and test-score mean:

r_{kR} = \frac{\sum_{j=1}^{N} (X_{jk} - M_k)(R_j - M_R)}{\sqrt{\sum_{j=1}^{N} (X_{jk} - M_k)^2 \sum_{j=1}^{N} (R_j - M_R)^2}}

In other words, it is the covariance between item k and the rest score, divided by the standard deviation of item k times the rest-score standard deviation.
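The upward bias can be seen by computing both correlations on the same data. The sketch below reuses an invented 5 × 3 dichotomous score matrix:

```python
# Illustrative: 5 test takers, 3 dichotomous items (invented data).
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
K = 3

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

test_scores = [sum(row) for row in scores]

item_test_r = [pearson([row[k] for row in scores], test_scores)
               for k in range(K)]

# Rest score for item k: the test score with item k removed.
rest_scores = [[x - row[k] for row, x in zip(scores, test_scores)]
               for k in range(K)]
item_rest_r = [pearson([row[k] for row in scores], rest_scores[k])
               for k in range(K)]

# Here every item-rest correlation is lower than the corresponding
# item-test correlation: the item no longer correlates with itself.
print(all(ir < it for ir, it in zip(item_rest_r, item_test_r)))  # True
```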
The item-reliability index uses the following formula:

r_{kX} S_k

It is the correlation between item k and the test score, times the standard deviation of item k. It uses the item-test correlation, not the item-rest correlation.
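The index combines the computations above; a compact sketch on the same invented data (population standard deviation, division by N):

```python
# Illustrative: 5 test takers, 3 dichotomous items (invented data).
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
K = 3

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def sd(xs):
    """Population standard deviation (division by N)."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

test_scores = [sum(row) for row in scores]
item_cols = [[row[k] for row in scores] for k in range(K)]

# Item-reliability index: item-test correlation times item standard deviation.
index = [pearson(item_cols[k], test_scores) * sd(item_cols[k])
         for k in range(K)]
```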
.....read moreJoHo can really use your help! Check out the various student jobs here that match your studies, improve your competencies, strengthen your CV and contribute to a more tolerant world
There are several ways to navigate the large amount of summaries, study notes en practice exams on JoHo WorldSupporter.
Do you want to share your summaries with JoHo WorldSupporter and its visitors?
Field of study
Add new contribution