Article summary of Classification and predictive discriminant analysis by De Heus - Chapter

How can we go from a dimension to a classification?
How does the basic process of classification go?
What is discriminant analysis (DA)?
Which problems do we need to solve in order to predict to which group an individual belongs?
How do we determine how accurate a prediction is?
What is Bayes' rule?

Psychometric research is conducted in the hope of making meaningful statements about ourselves and others. There are two kinds of judgments we can make. First, there are dimensional judgments, which look at our position compared to others on a particular dimension. In addition, we can also use classification, meaning we determine to which category we belong.

How can we go from a dimension to a classification?

There are several ways to categorize people. The general procedure for moving from a dimensional judgment to a classification starts with a research group: a sample for which we know each individual's score on every dimension. Within this sample, we try to predict to which group each individual belongs based on the dimensional scores. Although we already know the actual classification, this step gives us a prediction rule that we can apply to new individuals, along with information about how well the rule works. If the rule works well enough, we can use it to classify new individuals whose group membership is unknown.

Although this process sounds simple, it is not easy in practice. First, we often doubt the reliability and validity of the sample or of the procedure, yet we still base our prediction rule on it. Second, there are different criteria for determining how accurate a prediction is, and these can contradict each other. Lastly, applying a prediction rule to a new group can lead to unexpected results.

How does the basic process of classification go?

The simplest classification involves two groups and a single dimension. With such data we can perform a t-test to see whether the group means differ significantly. However, showing that an interval variable (such as a depression score) is related to a nominal variable with two categories (depressed or not depressed) does not solve the classification problem. We do not want to predict from nominal to interval variables, but from interval to nominal variables. If only one interval predictor is used, a cut-off point can be established: people who score above it receive a positive diagnosis, and people who score below it receive a negative diagnosis.

In reality, this prediction rule will not work perfectly, because the distributions of the two groups overlap on the interval variable. This means that we can always make two kinds of mistakes. The first is a false positive, where, for example, someone who is not depressed is classified as depressed. The second is a false negative, where, for example, a depressed person is classified as not depressed.

How do we determine the cut-off score?

Which cut-off score we use depends on how bad we consider each type of error to be. If we find both types of error equally bad and the groups have the same symmetric distribution with the same standard deviation, the cut-off lies exactly halfway between the two group means. If we find false positives worse and move the cut-off to reduce them, we run the risk of more false negatives, and vice versa. Determining the cut-off score therefore always involves a choice. The situation becomes even more complex if we want to compare more than two groups on several dimensions. In that case, a discriminant analysis is often performed.
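The trade-off between the two types of error can be illustrated with a minimal Python sketch. The scores are made up for illustration, the function names are hypothetical, and scoring at or above the cut-off is assumed to count as a positive diagnosis.

```python
from statistics import mean

def midpoint_cutoff(scores_negative, scores_positive):
    """Cut-off halfway between the two group means."""
    return (mean(scores_negative) + mean(scores_positive)) / 2

def count_errors(scores_negative, scores_positive, cutoff):
    """Return (false positives, false negatives) for a given cut-off."""
    false_positives = sum(score >= cutoff for score in scores_negative)
    false_negatives = sum(score < cutoff for score in scores_positive)
    return false_positives, false_negatives

# Made-up depression scores, for illustration only.
not_depressed = [4, 6, 7, 9, 12]
depressed = [10, 13, 15, 16, 18]

cutoff = midpoint_cutoff(not_depressed, depressed)
errors_at_midpoint = count_errors(not_depressed, depressed, cutoff)
# Raising the cut-off removes false positives but adds false negatives.
errors_at_14 = count_errors(not_depressed, depressed, 14)
print(cutoff, errors_at_midpoint, errors_at_14)
```

With these numbers, the midpoint cut-off produces one error of each type, while the stricter cut-off of 14 produces no false positives but two false negatives.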

What is discriminant analysis (DA)?

The aim of discriminant analysis is to predict as accurately as possible to which group a certain person belongs by using a certain number of interval variables (>2). We can look at the differences in two ways: from the group perspective and the individual perspective. In the first case, we try to describe the nature of the differences between groups, which is called descriptive discriminant analysis. In addition, we can also take the individual as a starting point and use the scores on the interval variables to predict to which group the person belongs, which is known as predictive discriminant analysis. Here, the emphasis is on the latter variant.

The first question to ask is whether our prediction is meaningful. DA leads to an optimal (best possible) prediction of the nominal variable based on the interval variables. To see whether the prediction is meaningful, you can check whether the best possible prediction is better than you would expect at chance level, using Wilks' Lambda. If this test is not significant, we cannot say anything useful about which group someone belongs to based on the interval variables, and the prediction is not meaningful. It is important to note that a significant result does not guarantee an accurate prediction. You sometimes find several values of Wilks' Lambda in SPSS output; in that case, use the top one, as this is more accurate.

In the context of a predictive DA, you don't need to know how groups differ (if individual classification is your goal). However, as psychologists, we often also want to know how and why these predictions work. A rough, but fairly effective method is to compare the means on the interval variables. A major shortcoming of this approach is that it does not take into account intercorrelations between predictors, which can lead to misleading conclusions. To solve this problem, you can use descriptive discriminant analysis.

Which problems do we need to solve in order to predict to which group an individual belongs?

Calculating the most likely group membership for each individual is a problem without a single solution that is best in all situations. One possible strategy is to place both the individuals and the group means on p variables in a p-dimensional space. In this space we can calculate the distance between each individual point and every group mean (using the Pythagorean theorem). Each individual is then assigned to the group whose mean is nearest. This method also works with more than two variables, although it can no longer be represented spatially once there are more than three.
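The distance-based strategy can be sketched in a few lines of Python. This is an illustration with made-up data and hypothetical function names, not the procedure of any particular statistics package. `math.dist` computes the Euclidean (Pythagorean) distance and requires Python 3.8 or later.

```python
import math

def group_means(samples_by_group):
    """Mean of each variable per group; samples are equal-length tuples."""
    return {
        group: tuple(sum(column) / len(column) for column in zip(*samples))
        for group, samples in samples_by_group.items()
    }

def classify(point, means):
    """Assign the point to the group whose mean is nearest (Euclidean)."""
    return min(means, key=lambda group: math.dist(point, means[group]))

# Made-up scores on p = 2 variables for two groups.
training = {
    "A": [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)],
    "B": [(7.0, 8.0), (8.0, 6.0), (9.0, 7.0)],
}
means = group_means(training)
print(classify((2.5, 2.5), means))  # A
print(classify((7.5, 6.5), means))  # B
```

The same code works for any number of variables, since the distance is computed over however many coordinates each point has.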

In order to be able to use an effective method to predict which group an individual belongs to, a number of problems must be solved:

If variables differ in standard deviation (SD), variables with a high SD will disproportionately affect the calculated distances. The solution to this problem is standardization (through Z-scores).
When variables are correlated with each other, the variance they share has a disproportionate impact on the distances, even if all variables are standardized. The solution to this problem is to work within a standardized component space or in the space of discriminant function variates.
If groups differ in variability around their means, homogeneous groups will require a shorter distance to the group mean than heterogeneous groups. This can be solved by weighting the distance to each group mean based on that group's SD.
The boundaries between the groups do not necessarily have to be linear. Linear DA cannot detect nonlinear boundaries and therefore cannot use them to classify individuals optimally.
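The first fix in the list above, standardization, can be sketched as follows. The data are hypothetical and the sketch addresses only differences in SD; it does nothing about correlated predictors or unequal group variability.

```python
import math
from statistics import mean, stdev

def standardize(rows):
    """Convert each column to Z-scores using the sample mean and SD."""
    columns = list(zip(*rows))
    centers = [mean(column) for column in columns]
    spreads = [stdev(column) for column in columns]
    return [
        tuple((value - c) / s for value, c, s in zip(row, centers, spreads))
        for row in rows
    ]

# Hypothetical data: variable 1 has a much larger SD than variable 2.
rows = [(100.0, 1.0), (300.0, 2.0), (500.0, 3.0), (700.0, 5.0)]
z_rows = standardize(rows)

raw_gap = math.dist(rows[0], rows[1])    # dominated by variable 1
z_gap = math.dist(z_rows[0], z_rows[1])  # both variables on one scale
print(raw_gap, z_gap)
```

Before standardization, the distance between the first two individuals is determined almost entirely by the first variable. After standardization, both variables contribute on the same scale.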

How do we determine how accurate a prediction is?

To determine how accurate a DA's prediction is, a classification table is used. This table cross-tabulates the predicted and observed group memberships; its cells contain the frequencies of all possible combinations. A general measure of the quality of the prediction is the percent accuracy in classification (PAC): the number of correct predictions divided by the total number of predictions. In many cases, a general measure such as the PAC is not precise enough, because it lumps all errors together. With regard to more specific measures of the quality of the prediction, a distinction can be made between the quality of the instrument (sensitivity and specificity) and the quality of the individual diagnosis (positive and negative predictive value).
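A classification table and the PAC can be computed with a short Python sketch. The labels and data are invented for illustration, and the PAC is returned as a proportion rather than a percentage.

```python
from collections import Counter

def classification_table(observed, predicted):
    """Frequency of every (observed, predicted) combination."""
    return Counter(zip(observed, predicted))

def percent_accuracy(observed, predicted):
    """PAC as a proportion: correct predictions / all predictions."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)

# Made-up observed and predicted group memberships for eight people.
observed = ["ill", "ill", "ill", "well", "well", "well", "well", "well"]
predicted = ["ill", "ill", "well", "well", "well", "well", "ill", "well"]

table = classification_table(observed, predicted)
print(table[("ill", "ill")], table[("ill", "well")])    # 2 1
print(table[("well", "ill")], table[("well", "well")])  # 1 4
pac = percent_accuracy(observed, predicted)
print(pac)  # 0.75
```

Note that the PAC of 0.75 does not reveal that one error is a false negative and the other a false positive, which is why the more specific measures are needed.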

How do we determine the quality of an instrument?

To determine the quality of an instrument, it is important to know how likely it is that an individual from a certain group will also be identified as a member of that group. For the target group (for example, people who are ill) this is called sensitivity: the number of ill people who are correctly predicted to be ill, divided by the total number of ill people. Raising the sensitivity, for example by lowering the cut-off, increases the number of false positives and thereby decreases the number of true negatives and the specificity. Specificity is the second measure of the quality of an instrument: the number of people who are not ill and are correctly predicted to be not ill, divided by the total number of people who are not ill.
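As a minimal sketch, both measures can be computed from the four cells of a two-group classification table. The counts are made up and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people who are actually ill and predicted ill."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people who are actually not ill and predicted not ill."""
    return true_neg / (true_neg + false_pos)

# Made-up counts: 50 ill people and 100 people who are not ill.
true_pos, false_neg = 40, 10
true_neg, false_pos = 80, 20

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
print(sens, spec)
```

Both denominators are fixed by the actual group sizes (50 and 100 here), which is what makes these measures properties of the instrument rather than of an individual diagnosis.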

Both sensitivity and specificity are conditional probabilities. This refers to the probability of event A if we know that another event (B) has occurred.

How do we determine the quality of individual diagnosis?

If you want to make a diagnosis for a certain individual, sensitivity and specificity are not useful for assessing its quality. This is because you do not want to go from the real situation (Y) to the conditional probability of a specific diagnosis (X), but from a certain diagnosis (X) to the conditional probability of the real situation (Y). Instead, you can use the positive predictive value: the percentage of individuals with a positive diagnosis who actually belong to the target group. You can also use the negative predictive value: the percentage of individuals with a negative diagnosis who actually do not belong to the target group. These concepts can all be used with multiple groups, which also offers the opportunity to answer multiple questions.
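The predictive values condition on the diagnosis rather than on the real situation, as the following sketch shows. The counts are made up and the function names are hypothetical.

```python
def positive_predictive_value(true_pos, false_pos):
    """Proportion of positive diagnoses that are correct."""
    return true_pos / (true_pos + false_pos)

def negative_predictive_value(true_neg, false_neg):
    """Proportion of negative diagnoses that are correct."""
    return true_neg / (true_neg + false_neg)

# Made-up counts: 60 positive diagnoses and 90 negative diagnoses.
true_pos, false_pos = 40, 20
true_neg, false_neg = 80, 10

ppv = positive_predictive_value(true_pos, false_pos)
npv = negative_predictive_value(true_neg, false_neg)
print(round(ppv, 3), round(npv, 3))  # 0.667 0.889
```

Here the denominators are the numbers of positive and negative diagnoses, so the values answer the question "given this diagnosis, how likely is it to be right?".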

What is Bayes' rule?

If we subject a test battery to a predictive discriminant analysis, it is nice to work with groups of approximately equal size, because our predictions will have maximum precision and statistical power. In reality, groups are usually not equal in the population.

When you move from the original study group to the population, the sensitivity and specificity do not change, but the positive and negative predictive values do. If the group sizes in the population are very unequal (e.g. a disease is very rare), there can be more false positives than true positives, even with a good instrument. To take this into account, you can use Bayes' theorem.
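The effect of the base rate can be made concrete with Bayes' theorem. The sketch below assumes a hypothetical instrument with a sensitivity and specificity of 0.90 and compares a study sample with equal groups to a population in which the condition is rare. All numbers are illustrative.

```python
def ppv_from_base_rate(sensitivity, specificity, base_rate):
    """Bayes' theorem: P(in target group | positive diagnosis)."""
    true_pos_rate = sensitivity * base_rate
    false_pos_rate = (1 - specificity) * (1 - base_rate)
    return true_pos_rate / (true_pos_rate + false_pos_rate)

# Equal group sizes, as in a balanced study sample.
ppv_balanced = ppv_from_base_rate(0.90, 0.90, 0.50)
# Rare condition: only 1% of the population belongs to the target group.
ppv_rare = ppv_from_base_rate(0.90, 0.90, 0.01)
print(round(ppv_balanced, 3), round(ppv_rare, 3))  # 0.9 0.083
```

With a base rate of 1%, fewer than one in ten positive diagnoses is correct, even though the sensitivity and specificity are unchanged.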

In discriminant analysis it is important to take the relative seriousness of the errors that can be made into account, as well as the relative frequencies of the groups to be predicted in the population, known as the base rate.
