Variability, Variance and Standard Deviation
The variability of a distribution refers to the extent to which scores are spread out or clustered together. Variability gives a quantitative value for how much scores differ from each other; a large value indicates high variability. The aim of measuring variability is twofold:
Describing the distance that can be expected between scores;
Measuring how representative an individual score is of the whole distribution.
The range is the distance between the highest and lowest score: the lowest score is subtracted from the highest score. However, the range can give a misleading picture when extreme values are present. The disadvantage of the range is thus that it does not account for all values, but only for the two extreme values.
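As a minimal sketch (the scores here are made up for illustration), the range can be computed with the built-in `max` and `min` functions, and a single extreme value shows how it can mislead:

```python
# Range of a small, made-up set of scores.
scores = [4, 7, 9, 12, 15]
range_value = max(scores) - min(scores)  # highest minus lowest
print(range_value)  # 11

# One extreme value inflates the range, even though most scores cluster:
scores_with_outlier = [4, 7, 9, 12, 95]
print(max(scores_with_outlier) - min(scores_with_outlier))  # 91
```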
The standard deviation (SD) is the most frequently used and most important measure of spread. This measure uses the mean of the distribution as its point of comparison and is based on the distance between individual scores and the mean of the data set. The standard deviation thus indicates whether individual scores generally lie far from or close to the mean. It can best be understood by means of four steps:
First, the deviation of each individual score from the mean has to be calculated. The deviation score is the difference between an individual score and the mean of the variable. The formula is:
\[deviation\:score = x - \mu\]
- x: individual score of x
- μ: mean of the variable
In the next step, one would compute the mean of the deviation scores by adding all deviation scores and dividing the sum by the number of deviation scores (N). However, the deviation scores always sum to zero, so this mean is always zero. Therefore, before computing the mean, each deviation score is placed between brackets and squared.
\[mean\:of\:the\:deviation\:scores = \frac{\sum{(x-\mu)}}{N}\]
- x: individual score of x
- μ: mean of the variable
- N: number of deviation scores
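The two steps so far can be sketched as follows (the scores are invented for illustration); note how the raw deviation scores cancel out to zero, while squaring removes that cancellation:

```python
# Steps 1-2 with a made-up set of scores.
scores = [2, 4, 6, 8]
mean = sum(scores) / len(scores)        # mean of the variable: 5.0

deviations = [x - mean for x in scores]  # deviation score: x - mean
print(deviations)                        # [-3.0, -1.0, 1.0, 3.0]
print(sum(deviations))                   # 0.0 — deviation scores always sum to zero

squared = [d ** 2 for d in deviations]   # squaring removes the cancellation
print(squared)                           # [9.0, 1.0, 1.0, 9.0]
```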
Next, the mean of the squared deviation scores can be computed. This is called the variance. The formula for the variance is:
\[\sigma^2 = \frac{\sum{(x-\mu)^{2}}}{N}\]
- σ²: variance
- x: individual score of x
- μ: mean of the variable
- N: number of deviation scores
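Continuing the same made-up scores, the variance is the mean of the squared deviation scores:

```python
# Variance: mean of the squared deviation scores, sigma^2 = sum((x - mu)^2) / N.
scores = [2, 4, 6, 8]
mean = sum(scores) / len(scores)  # 5.0
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
print(variance)  # 5.0  (= (9 + 1 + 1 + 9) / 4)
```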
Finally, take the square root of the variance. The result is the standard deviation. The final formula for the standard deviation is thus:
\[\sigma = \sqrt{\frac{\sum{(x-\mu)^{2}}}{N}}\]
- σ: standard deviation
- x: individual score of x
- μ: mean of the variable
- N: number of deviation scores
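The four steps above can be collected into one small function; the scores and the function name are made up for illustration:

```python
import math

def population_sd(scores):
    """Standard deviation via the four steps described above."""
    mu = sum(scores) / len(scores)            # mean of the variable
    deviations = [x - mu for x in scores]     # step 1: x - mu
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    variance = sum(squared) / len(scores)     # step 3: sigma^2 = sum / N
    return math.sqrt(variance)                # step 4: square root of the variance

print(population_sd([2, 4, 6, 8]))  # ≈ 2.236 (square root of 5.0)
```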
Often, the variance is a large number that is hard to interpret, because it is expressed in squared units. It is therefore more useful, and easier to understand, to compute and report the standard deviation.
In a sample with n scores, once the sample mean is known, the first n − 1 scores can vary freely, but the last score is then fixed. The sample therefore has n − 1 degrees of freedom (in short: df).
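This is why the sample variance divides by n − 1 rather than N, a point the standard library's `statistics` module makes visible: `pvariance` divides by N (population) while `variance` divides by n − 1 (sample). The scores below are invented for illustration:

```python
import statistics

sample = [2.0, 4.0, 6.0, 8.0]

# Population variance divides by N:
print(statistics.pvariance(sample))  # 5.0  (= 20 / 4)

# Sample variance divides by n - 1, the degrees of freedom:
print(statistics.variance(sample))   # ≈ 6.667  (= 20 / 3)
```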
The total variance can be subdivided into 1) systematic variance and 2) error variance.
Systematic variance refers to that part of the total variance that can predictably be related to the variables that the researcher examines.
Error variance emerges when the behavior of participants is influenced by variables that the researcher does not examine (did not include in the study) or by measurement error (errors made during the measurement). For example, if someone scores high on aggression, this may be explained by his or her bad mood rather than by the studied variable (in this example, temperature). This form of variance cannot be predicted within the study. The more error variance a data set contains, the harder it is to determine whether the manipulated variables (the independent variables) are actually related to the behavior one wants to examine (the dependent variable). Researchers therefore try to minimize the error variance in their studies.