Least Squares Regression Method

Vitalnet uses the "least squares" regression method to determine the time trend line. Therefore, to help users better understand time trend analysis, this page explains least squares.

Figure 1

Figure 2

Year | Year Index (X axis) | Crude rate (per 100,000) (Y axis) | Deaths |
---|---|---|---|
2001 | 0 | 25.533 | 5445 |
2002 | 1 | 25.941 | 5650 |
2003 | 2 | 25.603 | 5663 |
2004 | 3 | 24.126 | 5426 |

The standard method for determining the Y-intercept, the slope, and the confidence interval (CI) of the slope is described in "Statistics", 2nd edition, by Murray Spiegel, pages 296 and 319. The method, as adapted for use within Vitalnet, is as follows:

· SS == sum of squares

· CL == confidence level

· SD == standard deviation

· YI == Y-intercept

· yrRngs == year ranges (4)

· df == degrees of freedom

· numer == numerator

· denom == denominator

· yrRngs (number of year ranges) = 4

· sumX (sum of X values) = 0 + 1 + 2 + 3 = 6

· sumXX (SS of X values) = 0 + 1 + 4 + 9 = 14

· avgX (average X value) = 6 / 4 = 1.5

· avgXX (average X squared value) = 14 / 4 = 3.5

· df (degrees of freedom) = yrRngs - 2 = 2

· sumY (sum of Y values) = 25.533 + 25.941 + 25.603 + 24.126 = 101.203

· sumYY (SS for Y) = 651.934 + 672.935 + 655.514 + 582.064 = 2562.447

· sumXY (sum of XY values) = 0 + 25.941 + 51.206 + 72.378 = 149.525

· tVal (t Value) = 4.302656 (2 df and 95% CL)

· varX (variance of X vals) = avgXX - (avgX * avgX)

· varX = 3.5 - 2.25 = 1.250

· stdDevX (SD of X values) = sqrt (varX) = 1.118
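The basic sums above can be reproduced with a short Python sketch. The variable names are illustrative choices that mirror the abbreviations on this page; they are not taken from Vitalnet itself.

```python
# Year index (X) and crude rate per 100,000 (Y) from the table above.
xs = [0, 1, 2, 3]
ys = [25.533, 25.941, 25.603, 24.126]

n = len(xs)                                   # yrRngs
sum_x = sum(xs)                               # sumX
sum_xx = sum(x * x for x in xs)               # sumXX
sum_y = sum(ys)                               # sumY
sum_yy = sum(y * y for y in ys)               # sumYY
sum_xy = sum(x * y for x, y in zip(xs, ys))   # sumXY

avg_x = sum_x / n                             # avgX
avg_xx = sum_xx / n                           # avgXX
var_x = avg_xx - avg_x * avg_x                # varX (population variance)
std_dev_x = var_x ** 0.5                      # stdDevX
df = n - 2                                    # degrees of freedom

print(sum_x, sum_xx, round(sum_y, 3), round(sum_yy, 3), round(sum_xy, 3))
print(var_x, round(std_dev_x, 3), df)
```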

· numer = (sumY * sumXX) - (sumX * sumXY)

· denom = (yrRngs * sumXX) - (sumX * sumX)

· YI = numer / denom

· numer = (101.203 * 14) - (6 * 149.525) = 519.692

· denom = (4 * 14) - (6 * 6) = 20

· YI = 519.692 / 20 = 25.985

· numer = (yrRngs * sumXY) - (sumX * sumY)

· denom = (yrRngs * sumXX) - (sumX * sumX)

· slope = numer / denom

· numer = (4 * 149.525) - (6 * 101.203) = -9.118

· denom = (4 * 14) - (6 * 6) = 20

· slope = -9.118 / 20 = -0.456
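As a minimal sketch, the Y-intercept and slope formulas above can be written as one Python function. The name `fit_line` is hypothetical, chosen for illustration; the formulas are the ones listed on this page.

```python
def fit_line(xs, ys):
    """Return (y_intercept, slope) of the least squares line.

    Hypothetical helper for illustration; it follows the numer/denom
    formulas given above.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # The same denominator is shared by the intercept and the slope.
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("X values must not all be equal")

    y_intercept = (sum_y * sum_xx - sum_x * sum_xy) / denom
    slope = (n * sum_xy - sum_x * sum_y) / denom
    return y_intercept, slope


yi, slope = fit_line([0, 1, 2, 3], [25.533, 25.941, 25.603, 24.126])
print(round(yi, 3), round(slope, 3))  # 25.985 -0.456
```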

· numer = sumYY - (YI * sumY) - (slope * sumXY)

· varOfEstimateOfYOnX = numer / yrRngs

· numer = 2562.447 - (25.985 * 101.203) - (-0.456 * 149.525) = 0.870

· varOfEstimateOfYOnX = 0.870 / 4 = 0.218

· stdErrOfEstimateOfYOnX = sqrt (varOfEstimateOfYOnX) = 0.466

· numer = tVal * stdErrOfEstimateOfYOnX

· denom = sqrt (df) * stdDevX

· halfInterval = numer / denom

· slopeLoLimit = slope - halfInterval

· slopeHiLimit = slope + halfInterval

· numer = 4.303 * 0.466 = 2.005

· denom = 1.414 * 1.118 = 1.581

· halfInterval = 2.005 / 1.581 = 1.268

· slopeLoLimit = -0.456 - 1.268 = -1.724

· slopeHiLimit = -0.456 + 1.268 = +0.812

· CI of the slope = -1.724 to +0.812
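The standard error and slope CI steps can be sketched in Python as follows. The function name `slope_confidence_interval` is hypothetical, and the t value is passed in as a constant copied from this page (2 df, 95% CL) rather than computed. Because the sketch keeps full floating-point precision, it gives an interval of roughly -1.74 to +0.83, slightly wider than the hand calculation, which rounds the intermediate values.

```python
import math


def slope_confidence_interval(xs, ys, t_val):
    """Return (slope, lo_limit, hi_limit) using the steps listed above.

    Hypothetical helper for illustration. t_val must match the chosen
    confidence level and n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points for n - 2 > 0")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("X values must not all be equal")
    y_intercept = (sum_y * sum_xx - sum_x * sum_xy) / denom
    slope = (n * sum_xy - sum_x * sum_y) / denom

    # Variance and standard error of the estimate of Y on X.
    var_est = (sum_yy - y_intercept * sum_y - slope * sum_xy) / n
    std_err_est = math.sqrt(max(var_est, 0.0))  # guard against tiny negatives

    # Standard deviation of the X values (population form).
    std_dev_x = math.sqrt(sum_xx / n - (sum_x / n) ** 2)

    df = n - 2
    half_interval = (t_val * std_err_est) / (math.sqrt(df) * std_dev_x)
    return slope, slope - half_interval, slope + half_interval


T_VAL_2DF_95 = 4.302656  # value quoted on this page for 2 df, 95% CL
slope, lo, hi = slope_confidence_interval(
    [0, 1, 2, 3], [25.533, 25.941, 25.603, 24.126], T_VAL_2DF_95
)
print(round(slope, 3), round(lo, 2), round(hi, 2))
```

Since the interval includes zero, the example data do not show a significant upward or downward trend at the 95% confidence level.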

· numer = (yrRngs * sumXY) - (sumX * sumY)

· numer = (4 * 149.525) - (6 * 101.203) = -9.118

· xTerm = (yrRngs * sumXX) - (sumX * sumX)

· yTerm = (yrRngs * sumYY) - (sumY * sumY)

· xTerm = (4 * 14) - (6 * 6) = 20

· yTerm = (4 * 2562.447) - (101.203^2) = 7.741

· denom = xTerm * yTerm = 20 * 7.741 = 154.820

· co_of_determ = (numer * numer) / denom

· co_of_determ = (-9.118 * -9.118) / 154.820 = 0.537

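The coefficient of determination (COD) is a measure of goodness of fit, ranging from 0 to 1. If the data points all fall on a straight line, the least squares line will be that same line, and COD = 1. If there is no linear relationship between X and Y, as with randomly scattered points, COD is close to 0. In the example above, COD = 0.537, meaning the trend line accounts for about 54% of the variation in the rates.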
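The COD calculation can be sketched in Python as well. The function name `coefficient_of_determination` is a hypothetical choice for illustration; the formula is the numer, xTerm and yTerm calculation listed above.

```python
def coefficient_of_determination(xs, ys):
    """Return the COD (r squared) from the sums used on this page.

    Hypothetical helper for illustration.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numer = n * sum_xy - sum_x * sum_y
    x_term = n * sum_xx - sum_x * sum_x
    y_term = n * sum_yy - sum_y * sum_y
    if x_term == 0 or y_term == 0:
        raise ValueError("X and Y must each contain at least two distinct values")
    return (numer * numer) / (x_term * y_term)


cod = coefficient_of_determination([0, 1, 2, 3], [25.533, 25.941, 25.603, 24.126])
print(round(cod, 3))  # 0.537

# Points that lie exactly on a straight line give a COD of 1.
print(coefficient_of_determination([0, 1, 2, 3], [1, 3, 5, 7]))  # 1.0
```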

Vitalnet results for the above example. Some of the numbers may differ slightly from the hand calculations, which use lower precision.

In summary, the least squares method finds the line considered to best fit the data, and the CI of the slope indicates whether there is a significant upward or downward trend. However, as discussed elsewhere on this web site, the slope CI is incorrect when the data points are based on few observations. Thus, a better method for determining the slope CI is sought.