In the previous chapter we discussed the conceptual framework of validity, where we could identify five types of proof of validity. One of the types of evidence was convergent and divergent validity: the extent to which test scores have the "right" pattern of associations with other variables. This is discussed further in this chapter.
Psychological constructs are embedded in a theoretical context: each construct is connected to other psychological constructs. The set of connections between a construct and related constructs is called a nomological network. According to this network, measurements of one construct should correlate strongly with some other constructs and weakly with others. For validity, it is important that the test scores match these expected associations as closely as possible.
Four methods are commonly used to examine convergent and discriminant associations and thereby evaluate convergent and discriminant validity:
- focused associations;
- sets of correlations;
- multitrait-multimethod matrices;
- quantifying construct validity (QCV).
These four methods are discussed in this section.
1. Focused associations
For some measurements it is fairly clear which specific variables should be related to them. The validity of the interpretations can then be examined through the relationship between the test scores and those specific variables. When the test scores correlate highly with those variables, there is strong validity; when the correlations are low, the validity can be called into question. Test developers gain confidence in a test when the correlations with relevant variables are high. These correlations are called validity coefficients; the higher the validity coefficients, the higher the quality of the test.
A process in which validity coefficients are examined across multiple studies is called validity generalization. Most validity evidence comes from relatively small studies, in which the correlation is calculated between the obtained test scores and the scores on criterion variables. Such small studies are common and useful, but they have a disadvantage: if a test yields an excellent validity coefficient at one location or in one population, this does not necessarily mean it will do so at another location or in another population.
Validity generalization studies investigate the generalizability of the usefulness of test scores. This type of research is a form of meta-analysis: it combines the results of several smaller studies into one large analysis. Validity generalization can do three important things:
- It can reveal the overall level of predictive validity across the smaller studies.
- It can reveal the degree of variability between the smaller studies.
- It can address the sources of that variability; further analysis of the smaller studies may explain the differences between them.
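The first two points above can be illustrated with a minimal sketch of a sample-size-weighted meta-analysis. The study data below are invented for illustration; real validity generalization procedures also correct for artifacts such as unreliability and range restriction, which this sketch omits.

```python
# Hypothetical validity generalization: combine validity coefficients from
# several small studies into one sample-size-weighted estimate.
studies = [  # (sample size, observed validity coefficient) - invented data
    (45, 0.31),
    (120, 0.22),
    (60, 0.40),
    (200, 0.27),
]

total_n = sum(n for n, _ in studies)

# Sample-size-weighted mean correlation: larger studies count more heavily.
mean_r = sum(n * r for n, r in studies) / total_n

# Weighted between-study variability around that mean.
var_r = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n

print(f"weighted mean r = {mean_r:.3f}")
print(f"between-study variance = {var_r:.4f}")
```

The weighted mean addresses the overall level of validity, while the between-study variance addresses how much the smaller studies disagree.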
2. Sets of correlations
The nomological network of a construct can contain associations of different strengths with other constructs. As a result, a large number of criterion variables can be considered when evaluating convergent and discriminant validity.
Researchers usually calculate all correlations between the variable of interest and the criterion variables, and then judge subjectively which correlations, and thus which criterion variables, are relevant, i.e. which criterion variables belong in the nomological network. This approach to evaluating validity is common: researchers first collect as much data as possible on many relevant measures, then examine the correlation patterns and interpret those that are meaningful for the test.
3. Multitrait-multimethod matrices
Campbell and Fiske developed the multitrait-multimethod matrix (MTMMM) from the conceptual basis laid by Cronbach and Meehl. In an MTMMM analysis, construct validity is evaluated by measuring multiple traits with several different methods. The purpose of the analysis is a clear evaluation of convergent and discriminant validity. Two important sources of variance can influence the correlations between measurements: trait variance and method variance. A high correlation between two measures can mean that they share trait variance, but a correlation can also be high because both traits were measured with the same method; they then share method variance. This can produce a correlation even when the traits themselves are unrelated, simply because the same method shapes how the respondent answers. For example, a respondent with low self-esteem may answer both self-report questionnaires in a similar way, producing a correlation between two unrelated traits. A high correlation can therefore indicate shared trait variance, but it can also indicate shared method variance. Conversely, a correlation can be weak because two different methods were used, even though the traits are in fact related. This makes construct validity difficult to interpret: each correlation is a mix of trait variance and method variance. The MTMMM analysis organizes the relevant information and makes it easier for researchers to interpret the correlations.
An MTMMM analysis should provide a good test across correlations that reflect different mixes of trait and method variance. Consider, for example, two correlations:
- A correlation in which the same trait was measured with two different methods.
- A correlation in which different traits were measured with the same method.
The first correlation is expected to be strong and the second weaker. If method variance dominates, however, the first correlation may turn out weaker and the second stronger.
Campbell and Fiske (1959) distinguished four types of correlations in the MTMMM:
- Heterotrait-heteromethod correlations: different traits measured with different methods.
- Heterotrait-monomethod correlations: different traits measured with the same method.
- Monotrait-heteromethod correlations: the same trait (construct) measured with different methods.
- Monotrait-monomethod correlations: a trait measured with one method. These correlations represent reliability: the correlation of a measurement with itself.
With an MTMMM analysis, trait variance and method variance can be examined in a clear way when evaluating construct validity. Convergent validity is found in the monotrait-heteromethod correlations. These correlations, which share trait variance but not method variance, should be larger than the correlations that share neither trait variance nor method variance (heterotrait-heteromethod), and also larger than the correlations that share method variance but not trait variance (heterotrait-monomethod).
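This comparison logic can be sketched in code. The numbers below are invented: a matrix for two traits (A, B), each measured by two methods (1, 2), checked against the rule that every convergent correlation should exceed every heterotrait correlation.

```python
# Toy MTMMM sketch (invented correlations): traits A and B, methods 1 and 2.
# Keys are variable pairs such as ("A1", "B2") = trait A via method 1
# correlated with trait B via method 2.
R = {
    ("A1", "B1"): 0.30,  # heterotrait-monomethod (shared method 1)
    ("A1", "A2"): 0.65,  # monotrait-heteromethod (convergent validity)
    ("A1", "B2"): 0.10,  # heterotrait-heteromethod
    ("B1", "A2"): 0.12,  # heterotrait-heteromethod
    ("B1", "B2"): 0.60,  # monotrait-heteromethod (convergent validity)
    ("A2", "B2"): 0.28,  # heterotrait-monomethod (shared method 2)
}

def kind(x, y):
    """Classify a pair by whether trait letter and method digit match."""
    trait = "mono" if x[0] == y[0] else "hetero"
    method = "mono" if x[1] == y[1] else "hetero"
    return f"{trait}trait-{method}method"

convergent = [r for pair, r in R.items() if kind(*pair) == "monotrait-heteromethod"]
discriminant = [r for pair, r in R.items() if kind(*pair).startswith("hetero")]

# Campbell and Fiske's requirement: even the weakest convergent correlation
# should exceed the strongest heterotrait correlation.
print(min(convergent) > max(discriminant))  # True for these numbers
```

With these invented values the check passes; shared method variance inflating the heterotrait-monomethod entries above 0.60 would make it fail.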
Improvements to the MTMMM analysis are still being investigated today. Despite its familiarity, the MTMMM analysis is not often used.
4. Quantifying construct validity (QCV)
With this method, researchers quantify the extent to which theoretical predictions for the convergent and discriminant correlations fit the correlations actually obtained. Until this method, evidence for convergent and discriminant validity was primarily subjective: one researcher may find a correlation strong while another experiences it as less strong. The QCV procedure was developed to make the evaluation of validity as objective and precise as possible, which distinguishes this fourth method from the other three.
The QCV analysis first produces effect sizes: measures of the extent to which the actual correlations correspond to the predicted correlations. These effect sizes are called r_alerting-CV and r_contrast-CV. High, positive values mean that the actual convergent and discriminant correlations closely match the predicted ones. Second, the QCV analysis provides a test of statistical significance, which examines whether the correspondence between the predicted and actual correlations could have arisen by chance.
The QCV analysis takes place in three phases:
- Researchers first make clear predictions about the expected convergent and discriminant validity correlations. The criterion measures must be chosen carefully, and a prediction must be made for each correlation relevant to the test.
- In the second phase the researchers collect the data and calculate the actual convergent and discriminant correlations. These are the actual correlations between the variable of interest and the criterion variables.
- In the third phase, the degree of match between the predicted and actual correlations is quantified. A good match means high validity; a poor match means low validity. Two types of results are presented: the effect sizes and the statistical significance. The effect size r_alerting-CV is the correlation between the predicted correlations and the actual correlations; a high, positive value means the two match well. The same applies to r_contrast-CV: the larger it is, the better for the convergent and discriminant validity. r_contrast-CV is similar to r_alerting-CV, but it is corrected for the intercorrelations among the criterion variables and for the absolute level of the correlations between the main test and the criterion variables. The statistical significance is also examined in this phase, taking into account the sample size and the amount of convergent and discriminant evidence; a z-test is used to determine whether the obtained match could have arisen by chance.
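The core of r_alerting-CV is simply a Pearson correlation between the vector of predicted correlations and the vector of obtained correlations. The sketch below uses invented numbers and omits the corrections that r_contrast-CV adds.

```python
# Minimal sketch of r_alerting-CV: correlate predicted validity correlations
# with the ones actually obtained. All values are invented for illustration.
import math

predicted = [0.50, 0.30, 0.10, -0.20]  # theory-based predictions
actual    = [0.45, 0.35, 0.05, -0.15]  # observed validity coefficients

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r_alerting = pearson(predicted, actual)
print(f"r_alerting-CV = {r_alerting:.3f}")  # close to 1: predictions match well
```

A value near 1 indicates that the pattern of obtained correlations follows the predicted pattern closely.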
The QCV approach can be useful, but it is not perfect. The effect sizes may be low because of incorrect predictions even when the validity evidence is actually strong, and a poor choice of criterion variables can also be made. Another criticism is that researchers have obtained high effect sizes even when the predicted convergent and discriminant correlations did not match the actual ones well.
Multiple strategies can be used in the analysis of tests. Although the QCV analysis is not perfect, it has advantages over the other methods. First, it requires researchers to think carefully about the pattern of convergent and discriminant correlations that would make theoretical sense. Second, it lets researchers make explicit predictions about the associations with other variables. Third, it focuses on the variable of interest. Finally, it provides an interpretable value reflecting the extent to which the actual outcomes match the predicted outcomes, along with a test of statistical significance.
In the previous section we discussed strategies that can be used to accumulate and interpret evidence for convergent and/or discriminant validity. All these strategies depend, to a greater or lesser extent, on the size of the validity coefficients. Validity coefficients are statistical results that represent the degree of association between a test and one or more criterion variables. It is important to be aware of the factors that can influence validity coefficients, and for that reason we discuss those factors in this section.
The associations between constructs
One factor that influences a validity correlation is the true association between the two constructs. If two constructs are strongly associated with each other, a high correlation is likely to result. Predictions of correlations rest on the assumption that such a connection between the constructs exists.
The measurement error and reliability
Measurement error can influence correlations and therefore also validity coefficients. The correlation between measures of two constructs is:

r_XoYo = r_XtYt * √(Rxx * Ryy)
Here r_XoYo is the observed correlation between the two tests, r_XtYt is the true correlation between the two constructs, Rxx is the reliability of the test, and Ryy is the reliability of the criterion measure. To evaluate convergent validity, researchers must compare the obtained correlations with the expected correlations, taking into account that two reliabilities are involved: that of the test and that of the criterion measure. If the criterion measure has low reliability, the observed validity correlation will also be lower. Low reliability in either measure can be handled in two ways. The first is to give less weight to the less reliable measure when assessing validity. The second is to adjust the validity coefficient using the correction for attenuation. To adjust the coefficient for the reliability of one measure, this form of the formula can be used:
r_XY-adjusted = r_XY-original / √Ryy
Here r_XY-original is the original validity correlation, Ryy is the estimated reliability of the criterion variable, and r_XY-adjusted is the adjusted validity correlation.
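The two formulas above can be sketched numerically. The reliabilities and the true correlation below are invented example values.

```python
# Sketch of attenuation and its correction (invented example values).
import math

r_true = 0.60   # true correlation between the two constructs (r_XtYt)
Rxx = 0.80      # reliability of the test
Ryy = 0.70      # reliability of the criterion measure

# Unreliability in both measures attenuates the observed correlation.
r_observed = r_true * math.sqrt(Rxx * Ryy)
print(f"observed r = {r_observed:.3f}")

# Correction for attenuation in the criterion only (second formula above).
r_adjusted = r_observed / math.sqrt(Ryy)
print(f"r adjusted for criterion unreliability = {r_adjusted:.3f}")
```

Note that the observed correlation (about .45) is well below the true correlation of .60, and correcting for the criterion's unreliability recovers only part of the difference because the test's own unreliability is left uncorrected.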
A limited range
A correlation coefficient reflects the covariability between two distributions of scores (as discussed in Chapter 3). The amount of variability in those distributions influences the correlation between the two sets of scores: a restricted range in either distribution can limit the correlation, producing relatively weaker evidence of validity.
There are no clear, simple guidelines for identifying the degree of range restriction; it mainly requires careful thought and attention from the researchers, along with knowledge of the relevant tests and variables. For example, a researcher must check whether the observed scores cover the full range that is theoretically possible for the construct. A common way to assess the impact of range restriction is to look at the convergent and discriminant correlations themselves: where strong correlations were expected, the convergent evidence is examined, and the correlations may turn out lower than expected due to the influence of a restricted range.
The relative proportions
The skew of the score distributions also affects the size of the validity coefficient. If the two correlated variables differ in skew, the correlation between them is reduced. So research on a variable with a very skewed distribution may yield a relatively small validity coefficient.
The formula for the correlation between a continuous and a dichotomous variable (r_CD) is:

r_CD = c_CD / (s_C * s_D)
Here c_CD is the covariance between the two variables, s_C is the standard deviation of the continuous variable, and s_D is the standard deviation of the dichotomous variable. The proportion of observations in the two groups of the dichotomous variable directly influences both the covariance and the standard deviation. The covariance is:
c_CD = p1 * p2 * (C2avg - C1avg)
Here p1 is the proportion of participants in group 1, p2 the proportion in group 2, C1avg the mean of the continuous variable in group 1, and C2avg the mean in group 2. The standard deviation of the dichotomous variable is the second term influenced by the proportion of observations. Its formula is:

s_D = √(p1 * p2)
The formula for the correlation can be rewritten to show the direct influence of the relative proportions:
r_CD = √(p1 * p2) * (C2avg - C1avg) / s_C
This formula shows the influence of the group proportions on the validity correlation. When a validity coefficient is based on a continuous and a dichotomous variable, it can be affected by differences in group size: the validity coefficient will be lower when the groups differ in size.
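The shrinking effect of unequal proportions follows directly from the last formula. The standard deviation and mean difference below are invented example values.

```python
# Sketch of how unequal group proportions shrink r_CD (invented values).
import math

s_C = 10.0        # standard deviation of the continuous variable
mean_diff = 5.0   # C2avg - C1avg, the difference between group means

def r_cd(p1, mean_diff=mean_diff, s_C=s_C):
    """r_CD = sqrt(p1 * p2) * (C2avg - C1avg) / s_C"""
    p2 = 1.0 - p1
    return math.sqrt(p1 * p2) * mean_diff / s_C

print(f"equal groups (50/50):   r = {r_cd(0.50):.3f}")  # 0.250
print(f"unequal groups (90/10): r = {r_cd(0.90):.3f}")  # 0.150
```

The same mean difference and spread yield a noticeably smaller correlation once the split moves away from 50/50, because √(p1 * p2) peaks at p1 = p2 = .50.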
The method variance
This was discussed above under the MTMMM analysis. Correlations between measurements from two different methods are smaller than correlations between measurements from a single method. If only one method is used, the correlation is likely to be larger, because it also contains shared method variance.
Time
Validity coefficients based on measurements taken at different times are smaller than those based on measurements taken at the same time, and longer intervals between two measurement moments produce smaller predictive validity correlations.
The prediction of single events
An important factor that can influence the validity coefficient is whether the criterion variable is a single event or an aggregate of multiple events. Single events are more difficult to predict than aggregates of multiple events, so large validity coefficients are more likely when the criterion variable is based on an aggregation of multiple events.
Once the validity coefficient has been determined, it must be decided whether it is high enough to count as convergent evidence, or low enough to support discriminant validity. Although the strength of the relationship between two measurements can be quantified precisely, interpreting it is not always intuitive. Especially for inexperienced researchers, evaluating validity can then be problematic: they do not know well when a correlation is strong or weak.
The explained variance and squared correlations
In psychological research it is common to use squared correlations, which express the proportion of variance in one variable that is explained by the other variable. The explained-variance interpretation is attractive because, as claimed earlier, research is generally about measuring and understanding variability: the more variability can be explained, the better it can be understood. Explained variance is also the basis of the analysis of variance (ANOVA).
There are three criticisms of the squared-correlation approach:
- In some cases it is technically wrong.
- Some experts say that variance itself is a non-intuitive metric: as a measure of the differences in a set of scores, it is based on squared deviations from the mean.
- Squaring the correlation can make the relationship between two variables appear smaller than it is.
The squared correlation approach for interpreting the validity coefficients is widely used, but it can also be misleading. It also has a number of technical and logical problems.
Estimating practical effects
One way to interpret a correlation is to estimate how much effect it has in real life. The greater the correlation between the test and the criterion variable, the more successfully the test can be used in decisions about the criterion variable.
Four procedures have been developed to interpret the practical value of validity correlations:
1. Binomial Effect Size Display (BESD)
This procedure was developed to show the practical consequences of using correlations to make decisions. The BESD shows, in a 2x2 table, how many predictions based on the correlation will be successful and how many will not. The following formula predicts the number of people in cell A of the table:
Cell A = 50 + 100 (r / 2)
Here r is the correlation between the test and the criterion. For cell B the formula is:
Cell B = 50 - 100 (r / 2)
Cell C has the same formula as Cell B and Cell D has the same formula as Cell A.
By putting the validity correlation into such a table and converting the numbers into successful predictions, it is easier to see whether the test has good validity. A criticism is that the BESD only fits a situation in which as many people score high as low, and in which half of the sample is 'successful' on the criterion and the other half unsuccessful: the BESD assumes equal proportions.
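The cell formulas above can be sketched as a small function that fills in the 2x2 table for a given validity correlation (per 100 people in each row, under the BESD's 50/50 assumption).

```python
# Sketch of the Binomial Effect Size Display cell counts.
def besd(r):
    """Return the four BESD cell counts for validity correlation r."""
    a = 50 + 100 * (r / 2)  # predicted success, actually successful
    b = 50 - 100 * (r / 2)  # predicted success, actually unsuccessful
    c = b                   # cell C uses the same formula as cell B
    d = a                   # cell D uses the same formula as cell A
    return a, b, c, d

a, b, c, d = besd(0.30)
print(f"A={a:.0f} B={b:.0f} C={c:.0f} D={d:.0f}")  # A=65 B=35 C=35 D=65
```

So even a "medium" correlation of .30 shifts the success rate of predictions from the chance level of 50% to 65%, which is the practical point the BESD is meant to make vivid.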
2. Taylor-Russell tables
These tables can be used when the assumption of equal proportions does not hold. They give the probability that a prediction based on an 'acceptable' test score will lead to a successful outcome on the criterion. Like the BESD, the Taylor-Russell tables treat the test and the outcome as dichotomous variables; the difference is that they allow decisions based on unequal proportions. To use the Taylor-Russell tables we need to know the size of the validity coefficient, the selection proportion, and the proportion of successful selections that would be made without the test.
3. Utility Analysis
A utility analysis formulates validity in terms of costs versus benefits. Researchers must assign values to various aspects of testing and decision making. First, the benefit of using the test to make decisions, compared with other available methods, must be estimated. Then the researcher must estimate the costs (disadvantages) of using the test to make decisions.
4. Sensitivity and specificity
This approach is especially useful for tests designed to identify a categorical difference, such as a diagnosis in which a disorder is either present or absent. The ability of the test to make correct identifications with regard to the categorical difference can then be evaluated. There are four possible outcomes:
- True positive: the test indicates the disorder is present, and it really is present.
- True negative: the test indicates the disorder is absent, and it really is absent.
- False positive: the test indicates the disorder is present, while in reality it is not.
- False negative: the test indicates the disorder is absent, while it is actually present.
Sensitivity and specificity are values that summarize the proportions of correct identifications. Sensitivity is the probability that someone with the disorder is correctly identified by the test; specificity is the probability that someone without the disorder is correctly identified by the test. In reality one can never know for certain whether someone has the disorder, but these values serve as a trusted guideline. We illustrate both concepts with an example (see Table 1 below).
Table 1.

| According to the test, the disorder is ... | In reality present | In reality absent |
| --- | --- | --- |
| Present | 80 | 120 |
| Absent | 20 | 780 |
In this example there are 80 true positives, 120 false positives, 20 false negatives, and 780 true negatives. The sensitivity is calculated as:
Sensitivity = true positives / (true positives + false negatives)
In the example this amounts to: 80 / (80 + 20) = 80/100 = .80. The sensitivity (the proportion of individuals with the disorder who were correctly identified by the test) is therefore .80. In other words, 80% of the people who actually have the disorder are also identified as such by the test. Although the sensitivity here is high (i.e. good), sensitivity alone is not sufficient to claim high validity; there must also be a high degree of specificity.
Specificity = true negatives / (true negatives + false positives)
In the example this amounts to: 780 / (780 + 120) = 780/900 = .87.
The proportion of people without the disorder who are correctly identified as such by the test (i.e. as not having the disorder) is .87. In this example there is thus a high degree of both sensitivity and specificity. Based on this, we can state that it is plausible that the test has high validity.
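The two calculations above can be sketched directly from the counts in Table 1.

```python
# Sensitivity and specificity computed from the counts in Table 1.
TP, FP = 80, 120   # test says "present": truly present / truly absent
FN, TN = 20, 780   # test says "absent":  truly present / truly absent

sensitivity = TP / (TP + FN)   # 80 / 100
specificity = TN / (TN + FP)   # 780 / 900

print(f"sensitivity = {sensitivity:.2f}")  # 0.80
print(f"specificity = {specificity:.2f}")  # 0.87
```

Note that both values condition on the true state (the columns of Table 1), not on the test outcome; the many false positives here would hurt the positive predictive value but leave sensitivity untouched.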
Finally, on the basis of sensitivity and specificity, you can calculate the correlation between the test results and the clinical diagnosis using the following formula:
r = (TP * TN - FP * FN) / √((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
In our example, the correlation is:
r = (80 * 780 - 120 * 20) / √((80 + 120) * (80 + 20) * (780 + 120) * (780 + 20))
r = 60000/120000
r = .50
The correlation between the test results and the clinical diagnosis is therefore .50.
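This calculation (the formula is the phi coefficient for a 2x2 table) can be sketched with the counts from Table 1.

```python
# Phi correlation between test outcome and diagnosis, from Table 1.
import math

TP, FP, FN, TN = 80, 120, 20, 780

r = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)
print(f"r = {r:.2f}")  # 0.50
```

The numerator rewards the diagonal (correct) cells and penalizes the off-diagonal (error) cells, which is why the value matches the .50 worked out above.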
Guidelines and standards in the field
Another way to interpret correlations is to evaluate them in context: different research fields apply different standards, and findings in the physical sciences tend to be much stronger than findings in behavioral science. According to the guidelines of Cohen (1988), correlations of .10 are considered small in psychology, correlations of .30 medium, and correlations of .50 large. More recently, Hemphill (2003) proposed new guidelines: a correlation below .20 is small, between .20 and .30 is medium, and above .30 is large.
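The two sets of benchmarks can be sketched side by side; the cutoffs are taken directly from the guidelines above.

```python
# Comparing Cohen's (1988) and Hemphill's (2003) benchmarks for
# interpreting correlation size in psychology.
def cohen_label(r):
    r = abs(r)
    if r < 0.30:
        return "small"
    if r < 0.50:
        return "medium"
    return "large"

def hemphill_label(r):
    r = abs(r)
    if r < 0.20:
        return "small"
    if r <= 0.30:
        return "medium"
    return "large"

for r in (0.10, 0.30, 0.50):
    print(f"r={r:.2f}: Cohen -> {cohen_label(r)}, Hemphill -> {hemphill_label(r)}")
```

A correlation of .35, for example, is only "medium" by Cohen's guidelines but already "large" by Hemphill's, which shows how much the chosen benchmark shapes the verdict on a validity coefficient.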
Statistical significance
Statistical significance is an important part of inferential statistics: the procedures that help us draw conclusions about populations. Most studies have a small number of participants, which researchers treat as a sample of the entire population, assuming the data are a good representation of what they would obtain if they examined the whole population. Nevertheless, researchers are aware that what holds in the sample cannot simply be asserted of the entire population.
Inferential statistics are used to gain more confidence in statements about an entire population when only samples are available. When a result is statistically significant, it is unlikely to have arisen by chance; when it is not, the observed correlations may not accurately represent reality and may have been obtained by chance. It is therefore understandable that many researchers attach great importance to statistical significance.
When evaluating convergent validity, it is expected that the validity coefficients are statistically significant; when evaluating discriminant validity, it is expected that they are not. With statistical significance, the questions are: do we believe that the validity correlation in the population from which the sample was drawn is not zero, how sure are we of that, and are we sure enough to conclude it? Two factors influence the answers: the size of the correlation in the sample and the size of the sample. Confidence rises when the sample correlation is not zero, although a nonzero sample correlation can still correspond to a zero correlation in the population. The second factor is sample size: the more participants, the greater the confidence in the sample. Larger correlations and larger samples thus increase the chance that a result is statistically significant.
Are we sure enough that the correlation in the population is not zero? By convention, a result is called statistically significant when there is at most a 5% chance of wrongly drawing this conclusion (the alpha level), corresponding to 95% confidence. Low correlations can nevertheless be statistically significant, and high correlations can fail to reach significance.
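The interplay of correlation size and sample size can be sketched with the standard t-approximation for testing a correlation against zero; the 1.96 cutoff is the large-sample two-sided criterion at alpha = .05, and the example values are invented.

```python
# Sketch of a significance test for a correlation at alpha = .05,
# using t = r * sqrt(n - 2) / sqrt(1 - r^2) against the 1.96 cutoff
# (a large-sample approximation of the exact t criterion).
import math

def is_significant(r, n, cutoff=1.96):
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return abs(t) > cutoff

# A small correlation can be significant in a large sample...
print(is_significant(0.10, 1000))  # True
# ...while a larger correlation is not significant in a small sample.
print(is_significant(0.40, 20))    # False
```

This is exactly the pattern described above: significance reflects both the size of the correlation and the size of the sample, so neither a significant nor a non-significant result can be interpreted without looking at both.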
A non-significant convergent validity correlation may be due to a small correlation or to a small sample. If the correlation is small, this is evidence against the convergent validity of the test. If the correlation is medium to large but the sample is small, poor convergent validity does not necessarily follow; rather, the study itself was weak because the sample was too small.
For discriminant validity, a high correlation provides evidence against discriminant validity. A significant discriminant validity correlation can arise because the correlation is large or because the sample is large. If the correlation is large, this is evidence against the discriminant validity of the test. If the correlation is small but the sample is large, poor discriminant validity does not necessarily follow; in such cases the statistical significance says little and is better ignored.