One of the most basic concepts in statistics is the average, or arithmetic mean, of a set of numbers. The mean signifies a central value for the data set. The variance of a data set measures how far the elements of that data set are spread out from the mean. Data sets in which the numbers are all close to the mean will have a low variance. Those sets in which the numbers are much higher or lower than the mean will have a high variance.
Calculate Mean of the Data Set
Since the variance measures the amount of separation from the mean, the first step in finding the variance of a data set is to find its mean. For instance, a store calculates its daily revenues for seven days:
Day 1: $62,000
Day 2: $64,800
Day 3: $62,600
Day 4: $69,200
Day 5: $66,000
Day 6: $63,900
Day 7: $69,400
The mean of the store's daily revenues for the week is:
(62000+64800+62600+69200+66000+63900+69400)/7 = 457900/7 = $65,414.29
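The same arithmetic can be sketched in a few lines of Python. The variable names (daily_revenues, total, mean_revenue) are chosen here for illustration and are not part of the original example.

```python
# Daily revenues in dollars, Day 1 through Day 7 (figures from the example above).
daily_revenues = [62000, 64800, 62600, 69200, 66000, 63900, 69400]

# The arithmetic mean is the sum of the values divided by how many there are.
total = sum(daily_revenues)
mean_revenue = total / len(daily_revenues)

print(total)                    # 457900
print(round(mean_revenue, 2))   # 65414.29
```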
Calculate Squared Differences
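The next step is to calculate the difference between each element in the data set and the mean. Some elements are higher than the mean and some are lower, so the raw differences are a mix of positive and negative numbers that would cancel each other out if added together. To prevent this, the variance calculation uses the square of each difference, which is never negative.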
Day 1 Sales - Mean Sales: $62,000 - $65,414.29 = -$3,414.29; (-3,414.29)^2 = 11,657,346.94
Day 2 Sales - Mean Sales: $64,800 - $65,414.29 = -$614.29; (-614.29)^2 = 377,346.94
Day 3 Sales - Mean Sales: $62,600 - $65,414.29 = -$2,814.29; (-2,814.29)^2 = 7,920,204.08
Day 4 Sales - Mean Sales: $69,200 - $65,414.29 = +$3,785.71; (+3,785.71)^2 = 14,331,632.65
Day 5 Sales - Mean Sales: $66,000 - $65,414.29 = +$585.71; (+585.71)^2 = 343,061.22
Day 6 Sales - Mean Sales: $63,900 - $65,414.29 = -$1,514.29; (-1,514.29)^2 = 2,293,061.22
Day 7 Sales - Mean Sales: $69,400 - $65,414.29 = +$3,985.71; (+3,985.71)^2 = 15,885,918.37
NOTE: The squared differences are not measured in dollars but in dollars squared, so they are not meaningful amounts of money on their own. They are used in the next step to calculate the variance.
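The per-day calculation above can be reproduced with a short Python sketch. The names used (daily_revenues, mean_revenue, squared_differences) are illustrative assumptions. The code works with the unrounded mean, so it matches the figures above, which were also computed before rounding.

```python
# Daily revenues in dollars, Day 1 through Day 7 (figures from the example above).
daily_revenues = [62000, 64800, 62600, 69200, 66000, 63900, 69400]
mean_revenue = sum(daily_revenues) / len(daily_revenues)

# Square each difference from the mean so that values above and below
# the mean both contribute a non-negative amount.
squared_differences = [(x - mean_revenue) ** 2 for x in daily_revenues]

# Print the signed difference and its square for each day.
for day, (revenue, squared) in enumerate(zip(daily_revenues, squared_differences), start=1):
    print(f"Day {day}: {revenue - mean_revenue:+,.2f} squared is {squared:,.2f}")
```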
Variance and Standard Deviation
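The variance is defined as the mean of the squared differences: add them up, then divide by the number of values. Dividing by the full count (here, 7) gives what is usually called the population variance; the sample variance divides by one less than the count instead.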
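In symbols, the definition can be written as follows. The notation is a common convention and an assumption here, since the text above uses none: N is the number of values, x_i is the i-th value, mu is the mean, and sigma squared is the variance.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
```

For the store example, N = 7 and the x_i are the seven daily revenues.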
11,657,346.94 + 377,346.94 + 7,920,204.08 + 14,331,632.65 + 343,061.22 + 2,293,061.22 + 15,885,918.37 = 52,808,571.43
52,808,571.43/7 = 7,544,081.63
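Because the variance is built from squared differences, it is expressed in squared units (here, dollars squared) and is hard to compare directly with the data. Taking the square root of the variance returns the measure to the original units and gives a clearer indication of the actual spread. In statistics, the square root of the variance is called the standard deviation.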
SQRT(7,544,081.63) = $2,746.65
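As a rough sketch, the variance and standard deviation above can be computed by hand in Python and then cross-checked against the standard library's statistics module. The variable names are illustrative. The functions statistics.pvariance and statistics.pstdev divide by the number of values, which matches the definition used in this example.

```python
import math
import statistics

# Daily revenues in dollars, Day 1 through Day 7 (figures from the example above).
daily_revenues = [62000, 64800, 62600, 69200, 66000, 63900, 69400]
mean_revenue = sum(daily_revenues) / len(daily_revenues)

# Variance as the mean of the squared differences (divide by the count, 7).
variance = sum((x - mean_revenue) ** 2 for x in daily_revenues) / len(daily_revenues)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 2))  # 7544081.63
print(round(std_dev, 2))   # 2746.65

# Cross-check against the standard library's population versions.
assert math.isclose(variance, statistics.pvariance(daily_revenues))
assert math.isclose(std_dev, statistics.pstdev(daily_revenues))
```

Note that statistics.variance and statistics.stdev divide by one less than the number of values, so they return somewhat larger results than the figures shown here.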
Uses for Variance and Standard Deviation
Both variance and standard deviation are highly useful in statistical analysis. The variance measures the overall spread of a data set from the mean. The standard deviation helps in detecting outliers, or elements of the data set that stray too far from the mean.
In the data set above, the values are fairly spread out: only two daily sales totals (Days 2 and 5) come within $1,000 of the mean. Two of the seven daily sales totals (Days 4 and 7) are more than one standard deviation above the mean, while two others (Days 1 and 3) are more than one standard deviation below it.
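These observations can be checked with a small Python sketch. The names within_1000, above_one_sd, and below_one_sd are hypothetical, and the one-standard-deviation cutoff is simply the threshold used in the paragraph above, not a general rule for identifying outliers.

```python
import statistics

# Daily revenues in dollars, Day 1 through Day 7 (figures from the example above).
daily_revenues = [62000, 64800, 62600, 69200, 66000, 63900, 69400]
mean_revenue = statistics.mean(daily_revenues)
std_dev = statistics.pstdev(daily_revenues)  # population standard deviation

# Pair each revenue with its day number (1 through 7).
numbered = list(enumerate(daily_revenues, start=1))

# Days whose revenue is within $1,000 of the mean.
within_1000 = [day for day, x in numbered if abs(x - mean_revenue) < 1000]

# Days more than one standard deviation above or below the mean.
above_one_sd = [day for day, x in numbered if x > mean_revenue + std_dev]
below_one_sd = [day for day, x in numbered if x < mean_revenue - std_dev]

print(within_1000)   # [2, 5]
print(above_one_sd)  # [4, 7]
print(below_one_sd)  # [1, 3]
```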