Time Trend Analysis
The main purpose of this web page is to seek help from the statistical community in finding a better method to determine the significance of a linear time trend.
If someone is plotting results over time, they usually want to know whether there is a time trend. For example: "Is the death rate for disease X in County Y steady, going up, or going down?" This is a very basic statistical task that applies to many situations.
Vitalnet uses the least squares regression method to determine the time trend line. The significance of the trend is determined by calculating the confidence interval (CI) of the slope at some confidence level (e.g., 95%). If the CI includes 0, the trend is not statistically significant at that level; otherwise, it is significant. The following table shows the three general trend cases:
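The least squares slope and CI test described above can be sketched in a few lines of Python. This is a minimal illustration using the usual textbook formulas (slope = Sxy / Sxx, standard error from the residual variance with n - 2 degrees of freedom, and a Student's t critical value); it is not Vitalnet's own code. The function names `ols_slope_ci` and `classify_trend`, the hard-coded t table, the four years of rates, and the assumption that the three trend cases are "upward", "downward" and "no significant trend" are all hypothetical.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom (n - 2).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def ols_slope_ci(years, rates):
    """Ordinary least squares slope of rate on year, with a 95% CI.

    Returns (slope, lower, upper). Only the rates enter the calculation;
    the number of cases behind each rate is never used.
    """
    n = len(years)
    if n != len(rates) or (n - 2) not in T_CRIT_95:
        raise ValueError("need between 3 and 12 paired points")
    x_bar = sum(years) / n
    y_bar = sum(rates) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, rates))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, rates))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    half_width = T_CRIT_95[n - 2] * se_slope
    return slope, slope - half_width, slope + half_width


def classify_trend(lower, upper):
    """Map a slope CI to one of three assumed trend cases."""
    if lower > 0:
        return "significant upward trend"
    if upper < 0:
        return "significant downward trend"
    return "no significant trend"


# Hypothetical rates per 1,000 for four consecutive years.
years = [1, 2, 3, 4]
rates = [10.0, 12.0, 11.0, 14.0]
slope, lower, upper = ols_slope_ci(years, rates)
print(f"slope {slope:.2f}, 95% CI ({lower:.2f}, {upper:.2f}):", classify_trend(lower, upper))
```

With these hypothetical rates the slope is 1.1 per year, but with only two degrees of freedom the interval runs from about -1.14 to 3.34, so it includes 0 and the trend is not significant.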
Least squares works well if each data point is known with certainty. The problem is that least squares analysis takes no account of the variability at each data point: all it knows is the rate at each point. Yet a rate based on many observations is clearly more stable than one based on just a few. For example, suppose there are two counties, Adams with 2,000 people and Brown with 200,000 people, and that the population counts stay the same over time. Further, suppose the rates and case counts for disease X in the two counties are as follows:
Both counties would (appropriately) have exactly the same least squares line for the rates, since least squares analysis uses only the rate data. Likewise, the trend (upward in this case) would be the same for both counties.
The problem is that the CI for the slope would also be identical for both counties. This doesn't make sense, because the rates in Adams County are obviously less stable, so the Adams slope CI should be correspondingly wider.
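As a point of comparison with the Adams/Brown example, one standard technique that does use the case counts is inverse-variance weighted least squares. The sketch below is a swapped-in illustration, not a method proposed on this page, and it rests on several assumptions: each year's case count d is Poisson, so a rate per 1,000 of r = 1000·d/P has a variance of roughly 1000²·d/P²; the observed counts stand in for the true expected counts; the population P is constant, as in the example; and the interval uses a normal critical value of 1.96 because the variances are treated as known. The function `weighted_slope_ci` and the case counts are hypothetical, chosen so both counties have the same rates (10, 12, 11 and 14 per 1,000).

```python
import math


def weighted_slope_ci(years, cases, population, per=1000, z=1.96):
    """Inverse-variance weighted least squares slope of rate on year.

    Assumes Poisson case counts, so the rate r = per * d / P has approximate
    variance per**2 * d / P**2. Each point is weighted by the inverse of that
    variance, and the slope variance is taken as 1 / sum(w * (x - x_w)**2),
    treating the variances as known. Returns (slope, lower, upper).
    """
    if any(d <= 0 for d in cases):
        raise ValueError("every year needs at least one case for this approximation")
    rates = [per * d / population for d in cases]
    weights = [population ** 2 / (per ** 2 * d) for d in cases]
    w_sum = sum(weights)
    # Weighted means of year and rate.
    x_w = sum(w * x for w, x in zip(weights, years)) / w_sum
    y_w = sum(w * y for w, y in zip(weights, rates)) / w_sum
    sxx_w = sum(w * (x - x_w) ** 2 for w, x in zip(weights, years))
    sxy_w = sum(w * (x - x_w) * (y - y_w) for w, x, y in zip(weights, years, rates))
    slope = sxy_w / sxx_w
    se_slope = math.sqrt(1.0 / sxx_w)
    return slope, slope - z * se_slope, slope + z * se_slope


years = [1, 2, 3, 4]
# Hypothetical counts giving identical rates per 1,000 in both counties.
adams = weighted_slope_ci(years, [20, 24, 22, 28], population=2_000)
brown = weighted_slope_ci(years, [2_000, 2_400, 2_200, 2_800], population=200_000)

for name, (slope, lo, hi) in (("Adams", adams), ("Brown", brown)):
    print(f"{name}: slope {slope:.3f} per 1,000 per year, 95% CI ({lo:.3f}, {hi:.3f})")
```

With these hypothetical counts the two slopes coincide, as they do under ordinary least squares, but each Brown point carries 100 times the weight of the matching Adams point, so the Adams interval is 10 times wider: it includes 0, while Brown's does not. With small counts the Poisson and normal approximations are rough, and any extra-Poisson variation is ignored.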
I would think there is an existing method to determine the CI of the slope, taking into account the number of observations. However, I have read through standard statistical textbooks, as well as books focusing on trend analysis, and could not find this addressed, at least in a way that could be translated into a practical algorithm. Also, I have asked a few statisticians this basic question. So far, nobody has known the answer.
Please note that "use procedure X in SPSS / SAS / STATA" is probably not helpful, because a black box does not explain the method. Unfortunately, statisticians all too often select from a "menu" of statistics without understanding the underlying algorithm. Instead, I am looking for the basic math algorithm with a worked practical example, like the practical example showing least squares. The same four-year data set used in the least squares example could be used. If you know how to construct a practical example that takes the number of cases into account when calculating the slope CI, or have any other suggestions or insights, please contact me. I will certainly acknowledge your assistance, and owe you a debt of gratitude.
Alternatively, if it turns out there is currently no known method for this seemingly basic task, or if knowledge of that method is too well hidden, I'm interested in devising and publishing a non-parametric method. I have already outlined how to do that. If you are interested in possibly collaborating on this non-parametric method, please contact me.