Investors use models of asset-price movements to predict what an investment will be worth at a given time. The methods used to make these predictions belong to a field of statistics known as regression analysis. Residual variance is a regression analysis tool that measures how closely a model's predictions match actual values.
The regression line shows how the asset's value tends to change as a related variable changes. Also known as a trend line, the regression line displays the "trend" of the asset's price. It is represented by a linear equation:
Y = a + bX
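where "Y" is the asset value, "a" is a constant (the intercept, or the value of Y when X is zero), "b" is a multiplier (the slope, or the change in Y for each one-unit change in X), and "X" is a variable related to the asset value.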
For instance, if the model predicts that a one-bedroom house sells for $300,000, a two-bedroom house sells for $400,000, and a three-bedroom house sells for $500,000, the regression line would look like:
Y = 200,000 + 100,000X
where "Y" is the home's selling price and "X" is the number of bedrooms.
Y = 200,000 + 100,000(1) = 300,000
Y = 200,000 + 100,000(2) = 400,000
Y = 200,000 + 100,000(3) = 500,000
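The prediction step can be sketched in a few lines of Python. The function name `predict_price` and its default parameters are illustrative assumptions, not part of any library; the function simply evaluates Y = a + bX with a = 200,000 and b = 100,000 from the example.

```python
def predict_price(bedrooms: int, a: int = 200_000, b: int = 100_000) -> int:
    """Evaluate the regression line Y = a + bX for X bedrooms (hypothetical helper)."""
    return a + b * bedrooms


# Predicted selling prices for one-, two-, and three-bedroom houses
predictions = {x: predict_price(x) for x in (1, 2, 3)}
print(predictions)
```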
A scatterplot shows points that represent actual observed pairs of values, here the actual sale price and number of bedrooms of each house. The term "scatterplot" reflects the fact that, when these points are plotted on a graph, they appear "scattered" around the regression line rather than lying exactly on it. Using the example above, a scatterplot could contain these data points:
Point 1: 1BR sold for $288,000
Point 2: 1BR sold for $315,000
Point 3: 2BR sold for $395,000
Point 4: 2BR sold for $410,000
Point 5: 3BR sold for $492,000
Point 6: 3BR sold for $507,000
Residual Variance Calculation
The residual variance calculation starts with the sum of the squared differences between each asset value on the scatterplot and the corresponding value on the regression line. Each of these differences is called a residual.
The squares of the differences are shown here:
Point 1: $288,000 - $300,000 = (-$12,000); (-12,000)² = 144,000,000
Point 2: $315,000 - $300,000 = (+$15,000); (+15,000)² = 225,000,000
Point 3: $395,000 - $400,000 = (-$5,000); (-5,000)² = 25,000,000
Point 4: $410,000 - $400,000 = (+$10,000); (+10,000)² = 100,000,000
Point 5: $492,000 - $500,000 = (-$8,000); (-8,000)² = 64,000,000
Point 6: $507,000 - $500,000 = (+$7,000); (+7,000)² = 49,000,000
Sum of the squares = 607,000,000
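The residuals and their sum of squares can be checked with a short Python sketch. The data are the six example sales; the names `observations` and `predict_price` are illustrative choices, not taken from any library.

```python
# Observed sales from the scatterplot: (bedrooms, sale price in dollars)
observations = [
    (1, 288_000),
    (1, 315_000),
    (2, 395_000),
    (2, 410_000),
    (3, 492_000),
    (3, 507_000),
]


def predict_price(bedrooms: int) -> int:
    """Regression line from the example: Y = 200,000 + 100,000X (hypothetical helper)."""
    return 200_000 + 100_000 * bedrooms


# Residual = actual value minus the value on the regression line
residuals = [price - predict_price(x) for x, price in observations]
squared = [r ** 2 for r in residuals]
sum_of_squares = sum(squared)

print(residuals)       # [-12000, 15000, -5000, 10000, -8000, 7000]
print(sum_of_squares)  # 607000000
```

Because each residual is squared, positive and negative misses both add to the total instead of cancelling out.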
The residual variance is found by dividing the sum of the squares by (n - 2), where "n" is the number of data points on the scatterplot. The 2 reflects the two parameters, a and b, that define the regression line.
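In symbols, writing y_i for the actual sale price of point i, ŷ_i for the value on the regression line at the same x_i, and n for the number of points (notation chosen here for convenience), the calculation is:

```latex
\mathrm{RV} = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad \hat{y}_i = a + b\,x_i
```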
RV = 607,000,000/(6-2) = 607,000,000/4 = 151,750,000 (in squared dollars).
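A minimal Python sketch of the full residual variance calculation, assuming the actual and predicted values are supplied as two equal-length lists. The function name `residual_variance` is a hypothetical helper, not a library API.

```python
import math


def residual_variance(actual: list[float], predicted: list[float]) -> float:
    """Sum of squared residuals divided by (n - 2)."""
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n <= 2:
        raise ValueError("need more than two data points")
    sum_of_squares = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return sum_of_squares / (n - 2)


actual = [288_000, 315_000, 395_000, 410_000, 492_000, 507_000]
predicted = [300_000, 300_000, 400_000, 400_000, 500_000, 500_000]

rv = residual_variance(actual, predicted)
print(rv)  # 151750000.0

# Square root brings the result back to dollars
std_error = math.sqrt(rv)
print(std_error)
```

Because the residuals are squared, the variance is in squared dollars. Its square root, roughly $12,319 here, is in dollars and is often easier to compare with the prices themselves.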
Uses for Residual Variance
Not every point on the scatterplot will line up perfectly with the regression line, but a stable model will have the points distributed evenly around it, with no systematic pattern. Residual variance is also known as "error variance." A high residual variance, relative to the scale of the data, indicates that the model's predictions deviate substantially from actual values, so the regression line in the original model may be a poor fit. Spreadsheet regression functions can fit a line directly to the scatterplot data, producing a line that matches the data more closely.
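Spreadsheet regression tools typically fit the line by ordinary least squares. The sketch below is a plain-Python least-squares fit on the six example sales, not a reproduction of any particular spreadsheet function; it then reuses the (n - 2) residual variance formula on the fitted line.

```python
# Observed sales: bedrooms (x) and sale prices in dollars (y)
xs = [1, 1, 2, 2, 3, 3]
ys = [288_000, 315_000, 395_000, 410_000, 492_000, 507_000]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares: slope b = Sxy / Sxx, intercept a = mean_y - b * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Residual variance of the fitted line, again dividing by (n - 2)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
rv_fitted = sse / (n - 2)

print(f"Y = {a:,.2f} + {b:,.2f}X")
print(f"residual variance: {rv_fitted:,.0f}")
```

On these six sales the fitted line is roughly Y = 203,167 + 99,000X, with a residual variance of about 148,708,333. That is slightly lower than the 151,750,000 for the original model's line, as expected, because least squares chooses a and b to minimize the sum of squared residuals.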