Introduction and Purpose -
We recently developed a Vitalnet module for analyzing BRFSS data. The new program makes BRFSS data analysis easier and more reliable. Because of the complex survey design, calculating confidence intervals (CI) for BRFSS data is non-trivial. Several CI methods are available. We chose "Jackknife Replication" (JR) mainly because 1) JR can calculate a CI for any outcome, including medians, and 2) JR seemed easier to understand and explain. Probably because the computation-intensive JR method is difficult to run fast enough, most BRFSS analyses using SAS and SUDAAN use Taylor series linearization (TSL) instead.
Because we are using a different (though apparently equally valid)
method from the commonly used TSL, and because we needed to validate
the complex JR programming as thoroughly as possible, we
felt it was necessary to systematically compare a series of
Vitalnet CI calculations with those calculated by CDC WEAT
and several prominent State systems.
We also felt it necessary to do a "reality check",
to verify that JR and TSL give the same or very similar results in practice.
We considered WEAT to be the current "gold standard", and used it as a benchmark.
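As a rough illustration of the JR idea described above, the following Python sketch computes a delete-one-PSU jackknife CI for a weighted percent. It is not Vitalnet's documented procedure: the record layout `(stratum, psu, weight, yes)`, the function names, the stratified delete-one scheme, the normal multiplier 1.96, and the sample data are all assumptions made for the example.

```python
import math
from collections import defaultdict

def weighted_percent(records):
    """Weighted percent of 'yes' answers; each record is (stratum, psu, weight, yes)."""
    total = sum(w for _, _, w, _ in records)
    yes = sum(w for _, _, w, y in records if y)
    return 100.0 * yes / total

def jackknife_ci(records, z=1.96):
    """Delete-one-PSU stratified jackknife CI for a weighted percent (a sketch)."""
    psus_by_stratum = defaultdict(set)
    for stratum, psu, _, _ in records:
        psus_by_stratum[stratum].add(psu)

    theta = weighted_percent(records)
    variance = 0.0
    for stratum, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            continue  # a single-PSU stratum contributes no variance in this sketch
        factor = n_h / (n_h - 1)  # reweight the PSUs that remain in this stratum
        for dropped in psus:
            replicate = []
            for s, p, w, y in records:
                if s == stratum:
                    if p == dropped:
                        continue  # leave this PSU out of the replicate
                    w = w * factor
                replicate.append((s, p, w, y))
            theta_r = weighted_percent(replicate)
            variance += (n_h - 1) / n_h * (theta_r - theta) ** 2

    half_width = z * math.sqrt(variance)
    return theta, theta - half_width, theta + half_width

# Made-up records for illustration only (not BRFSS data).
sample = [
    ("s1", "a", 1.5, True), ("s1", "a", 1.0, False),
    ("s1", "b", 2.0, False), ("s1", "b", 1.0, True),
    ("s1", "c", 1.2, False), ("s1", "c", 0.8, False),
    ("s2", "d", 1.0, True), ("s2", "d", 1.0, True),
    ("s2", "e", 2.5, False), ("s2", "e", 1.0, False),
    ("s2", "f", 1.0, True), ("s2", "f", 1.5, False),
]
pct, low, high = jackknife_ci(sample)
print(f"{pct:.1f}% yes, 95% CI {low:.1f} to {high:.1f}")
```

Because each replicate simply recomputes the statistic, the same loop would work for other outcomes by swapping out `weighted_percent`, which is the flexibility the text attributes to JR.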
Methods -
For each comparison, a CI was calculated for "% Yes" (or % No), for a particular
BRFSS variable analyzed by Vitalnet, WEAT, and one of two State systems, for 2005.
Each CI was calculated at the 95% level.
To provide more tests,
and because this analysis was available in each system,
the analysis was carried out separately for males and females.
A range of percents was used, from small to large,
so that any difference specific to larger or smaller percents would be exposed.
Since WEAT and some State systems were limited to one digit after the
decimal point, one digit of precision was used for all tests.
An unlimited number of replicates was used for the JR method.
The BRFSS variables were picked at random, subject to the requirement that all three
systems (Vitalnet, WEAT, and the State system) included the variable.
To select the systems included in the report,
we started with two that we knew existed (A System and B System).
We tried to include C System,
but it said "under reconstruction", as it had for at least some weeks.
Then, we did an internet search for "brfss data query", and picked the
ones highest in the results at the time (October, 2008).
We were not able to use the D System,
which showed up in the Google search.
It allowed some selection of parameters, but there
was consistently a "page not found" error when we clicked the button to run the query.
The next system found in the Google search was the E System,
which we initially included.
Later, we left out the E System, when we received information that
the State re-weights its data to account for differences between State
population estimates and those used by the CDC.
The re-weighting made it impossible to directly compare weighted percents
and confidence intervals with their results.
Otherwise, once a comparison was done, it was included, regardless of the results.
The purpose of the report is to compare statistical methods, not to compare software systems.
Therefore, the States are not identified in this report.
Full information, including output with confidence intervals, is available on request.
Results and Discussion -
Tables A-D below each show a comparison between WEAT, Vitalnet,
and a particular State system.
When only one system had a different CI, or when
a system had a markedly different result, the
differing number is marked in red.
Tables E-F below each show a comparison between WEAT, Vitalnet,
and the B System, for small numbers.
The B System was the only State system where it was obvious
how to do a sub-analysis.
Very Similar Vitalnet and WEAT CI -
We found very close results between Vitalnet and WEAT.
For Tables A-D (large numbers), only 3 of 24 confidence limits differed,
and only by 0.1 percentage points in each case.
Also for Tables A-D, Vitalnet and WEAT were each the "odd man out"
(colored red) only 1 of 16 possible times.
State Systems CI not quite as Similar -
As shown in Tables A-D, the A System and B System
were close to Vitalnet and WEAT, but differed somewhat.
The A System was "odd man out" (colored red) 4 of 8 possible times.
The B System was "odd man out" (colored red) 5 of 8 possible times.
In practice, with large numbers, the differences in Tables A-D with the
A System and B System
would not affect the interpretation of results.
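The counts quoted above can be re-derived from the values printed in Tables A-D. The Python sketch below does that tally. Two readings are assumptions on our part: that the "3 of 24" figure counts the percent plus both limits for each sex (6 values per table), and that "odd man out" means exactly one system differs while the other two agree. The names `TABLES`, `STATE_SYSTEM` and the two functions are illustrative only.

```python
# Row order: male %, male LL, male UL, female %, female LL, female UL.
TABLES = {
    "A": {"State":    [23.2, 21.0, 25.5, 5.9, 5.0, 7.0],
          "Vitalnet": [23.2, 20.9, 25.4, 5.9, 5.0, 6.9],
          "WEAT":     [23.2, 21.0, 25.4, 5.9, 5.0, 6.9]},
    "B": {"State":    [72.4, 70.0, 74.7, 55.6, 53.5, 57.6],
          "Vitalnet": [72.4, 70.1, 74.8, 55.6, 53.5, 57.6],
          "WEAT":     [72.4, 70.1, 74.8, 55.6, 53.6, 57.6]},
    "C": {"State":    [13.7, 11.9, 15.8, 9.3, 8.1, 10.6],
          "Vitalnet": [13.7, 11.7, 15.7, 9.3, 8.0, 10.5],
          "WEAT":     [13.7, 11.8, 15.7, 9.3, 8.0, 10.5]},
    "D": {"State":    [85.1, 83.2, 86.9, 70.9, 68.7, 73.0],
          "Vitalnet": [85.1, 83.3, 87.0, 70.9, 68.7, 73.0],
          "WEAT":     [85.1, 83.3, 87.0, 70.9, 68.7, 73.0]},
}
STATE_SYSTEM = {"A": "A System", "B": "A System", "C": "B System", "D": "B System"}
LIMIT_COLUMNS = (1, 2, 4, 5)  # positions of LL and UL for each sex

def count_vitalnet_weat_differences(tables):
    """Count values (percent and limits) where Vitalnet and WEAT disagree."""
    differing = total = 0
    for rows in tables.values():
        for v, w in zip(rows["Vitalnet"], rows["WEAT"]):
            total += 1
            differing += v != w
    return differing, total

def count_odd_one_out(tables):
    """Count limits where exactly one system differs from the other two."""
    counts = {"A System": 0, "B System": 0, "Vitalnet": 0, "WEAT": 0}
    for name, rows in tables.items():
        for col in LIMIT_COLUMNS:
            values = {
                STATE_SYSTEM[name]: rows["State"][col],
                "Vitalnet": rows["Vitalnet"][col],
                "WEAT": rows["WEAT"][col],
            }
            for system, value in values.items():
                others = [v for s, v in values.items() if s != system]
                if others[0] == others[1] and value != others[0]:
                    counts[system] += 1
    return counts

print(count_vitalnet_weat_differences(TABLES))
print(count_odd_one_out(TABLES))
```

Under these readings, the male lower limit in Table C (11.9, 11.7, 11.8) is not counted for any system, because all three values differ.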
Why the Differences? -
Vitalnet uses jackknife replicates (JR) to calculate the CI, and the
procedure is
fully documented.
It is not surprising that Vitalnet differs a little from WEAT,
since WEAT uses Taylor series linearization (TSL), a totally different method.
If anything, the surprise is that WEAT and Vitalnet are so close.
Since none of the State programs give any details concerning how they calculate a CI,
there is no way to figure out why they differ somewhat more.
It is likely the State programs also use TSL,
since they are likely based on SAS or SUDAAN,
but there is no way to know without further investigation.
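For contrast with JR, here is a minimal Python sketch of how a TSL variance for a weighted percent is commonly computed: the percent is treated as a ratio of two weighted totals, each record gets a linearized value, and the variance comes from the spread of PSU totals within strata. This is a textbook-style approximation written for illustration, not the code used by WEAT, SAS, or SUDAAN. The record layout `(stratum, psu, weight, yes)`, the function name, the with-replacement assumption, and the multiplier 1.96 are all assumptions.

```python
import math
from collections import defaultdict

def linearization_ci(records, z=1.96):
    """Taylor series linearization CI for a weighted percent (a sketch).

    Each record is (stratum, psu, weight, yes). PSUs are assumed to be
    sampled with replacement within strata.
    """
    total_w = sum(w for _, _, w, _ in records)
    ratio = sum(w for _, _, w, y in records if y) / total_w

    # Linearized value for a ratio of totals: w * (y - ratio) / total_w,
    # accumulated to PSU totals within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * ((1.0 if y else 0.0) - ratio) / total_w

    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            continue  # a single-PSU stratum contributes no variance in this sketch
        mean = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean) ** 2 for t in totals.values())

    half_width = 100.0 * z * math.sqrt(variance)
    pct = 100.0 * ratio
    return pct, pct - half_width, pct + half_width

# Made-up records for illustration only (not BRFSS data).
sample = [
    ("s1", "a", 1.5, True), ("s1", "a", 1.0, False),
    ("s1", "b", 2.0, False), ("s1", "b", 1.0, True),
    ("s1", "c", 1.2, False), ("s1", "c", 0.8, False),
    ("s2", "d", 1.0, True), ("s2", "d", 1.0, True),
    ("s2", "e", 2.5, False), ("s2", "e", 1.0, False),
    ("s2", "f", 1.0, True), ("s2", "f", 1.5, False),
]
pct, low, high = linearization_ci(sample)
print(f"{pct:.1f}% yes, 95% CI {low:.1f} to {high:.1f}")
```

The formula depends on the statistic being a smooth function of totals, which is why TSL needs a separate derivation for each kind of outcome.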
Small Numbers -
As shown in Tables E-F,
the Vitalnet and WEAT CIs based on small numbers are also very close.
The difference in each confidence limit varies from 0.0 to 0.3 percentage points.
Each WEAT small-number CI was somewhat narrower than the corresponding Vitalnet CI.
Or one could say each Vitalnet CI was somewhat wider.
At this point, there is no way to say which is "right".
From a theoretical point of view, JR would seem to be a more valid
method, since it is a non-parametric replication method that does not rely on a linear approximation.
However, the differences are so small that they do not seem significant.
The B System small-number CIs were not as close to Vitalnet or WEAT,
as shown in Tables E-F.
It is possible that the small-number CI differences between WEAT and Vitalnet
also occur with larger numbers of responses,
but are obscured by the relatively smaller width of the CI,
and by the fact that WEAT reports only one digit after the decimal point.
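The small-number comparison between Vitalnet and WEAT can be checked directly from the limits printed in Tables E-F. The Python sketch below works in tenths of a percentage point to avoid floating-point noise. The names `SMALL`, `tenths` and `compare` are illustrative only.

```python
# (LL, UL) pairs copied from Tables E-F for Vitalnet and WEAT.
SMALL = {
    "E male":   {"Vitalnet": (65.8, 80.8), "WEAT": (65.9, 80.7)},
    "E female": {"Vitalnet": (73.4, 84.5), "WEAT": (73.4, 84.4)},
    "F male":   {"Vitalnet": (20.5, 36.9), "WEAT": (20.8, 36.7)},
    "F female": {"Vitalnet": (25.0, 40.6), "WEAT": (25.2, 40.4)},
}

def tenths(value):
    """Convert a one-decimal percent to an integer number of tenths."""
    return round(value * 10)

def compare(groups):
    """Return per-limit absolute differences and Vitalnet-minus-WEAT width gaps (tenths)."""
    limit_diffs = []
    width_gaps = {}
    for group, rows in groups.items():
        v_ll, v_ul = map(tenths, rows["Vitalnet"])
        w_ll, w_ul = map(tenths, rows["WEAT"])
        limit_diffs += [abs(v_ll - w_ll), abs(v_ul - w_ul)]
        width_gaps[group] = (v_ul - v_ll) - (w_ul - w_ll)
    return limit_diffs, width_gaps

limit_diffs, width_gaps = compare(SMALL)
print("limit differences (tenths):", min(limit_diffs), "to", max(limit_diffs))
print("Vitalnet width minus WEAT width (tenths):", width_gaps)
```

A positive width gap in every group corresponds to the observation that each WEAT interval is narrower than the matching Vitalnet interval.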
Conclusion -
The JR method produces valid confidence intervals for BRFSS data.
In practice, in terms of accuracy, JR and TSL appear to be equivalent.
Table A) _RFBING3 (Binge Drinker)
| System | Male % Yes | Male LL | Male UL | Male Yes | Female % Yes | Female LL | Female UL | Female Yes |
| A System | 23.2 | 21.0 | 25.5 | < 500 | 5.9 | 5.0 | 7.0 | < 500 |
| Vitalnet | 23.2 | 20.9 | 25.4 | < 500 | 5.9 | 5.0 | 6.9 | < 500 |
| WEAT | 23.2 | 21.0 | 25.4 | < 500 | 5.9 | 5.0 | 6.9 | < 500 |
Table B) _BMI4CAT (Overweight)
| System | Male % Yes | Male LL | Male UL | Male Yes | Female % Yes | Female LL | Female UL | Female Yes |
| A System | 72.4 | 70.0 | 74.7 | > 1000 | 55.6 | 53.5 | 57.6 | > 1000 |
| Vitalnet | 72.4 | 70.1 | 74.8 | > 1000 | 55.6 | 53.5 | 57.6 | > 1000 |
| WEAT | 72.4 | 70.1 | 74.8 | > 1000 | 55.6 | 53.6 | 57.6 | > 1000 |
Table C) _RFSMOK3 (Current Smoker)
| System | Male % Yes | Male LL | Male UL | Male Yes | Female % Yes | Female LL | Female UL | Female Yes |
| B System | 13.7 | 11.9 | 15.8 | < 500 | 9.3 | 8.1 | 10.6 | < 500 |
| Vitalnet | 13.7 | 11.7 | 15.7 | < 500 | 9.3 | 8.0 | 10.5 | < 500 |
| WEAT | 13.7 | 11.8 | 15.7 | < 500 | 9.3 | 8.0 | 10.5 | < 500 |
Table D) _FV5SRV (Five a Day)
| System | Male % No | Male LL | Male UL | Male No | Female % No | Female LL | Female UL | Female No |
| B System | 85.1 | 83.2 | 86.9 | > 1000 | 70.9 | 68.7 | 73.0 | > 1000 |
| Vitalnet | 85.1 | 83.3 | 87.0 | > 1000 | 70.9 | 68.7 | 73.0 | > 1000 |
| WEAT | 85.1 | 83.3 | 87.0 | > 1000 | 70.9 | 68.7 | 73.0 | > 1000 |
Table E) _CHOLCHK (Chol. past 5 years), age 45-49
| System | Male % Yes | Male LL | Male UL | Male Yes | Female % Yes | Female LL | Female UL | Female Yes |
| B System | 73.3 | 65.3 | 80.0 | < 500 | 78.9 | 72.9 | 83.9 | < 500 |
| Vitalnet | 73.3 | 65.8 | 80.8 | < 500 | 78.9 | 73.4 | 84.5 | < 500 |
| WEAT | 73.3 | 65.9 | 80.7 | < 500 | 78.9 | 73.4 | 84.4 | < 500 |
Table F) _CHOLCHK (Chol. past 5 years), age 18-24
| System | Male % Yes | Male LL | Male UL | Male Yes | Female % Yes | Female LL | Female UL | Female Yes |
| B System | 28.7 | 21.5 | 37.3 | < 100 | 32.8 | 25.7 | 40.8 | < 100 |
| Vitalnet | 28.7 | 20.5 | 36.9 | < 100 | 32.8 | 25.0 | 40.6 | < 100 |
| WEAT | 28.7 | 20.8 | 36.7 | < 100 | 32.8 | 25.2 | 40.4 | < 100 |
References consulted concerning TSL and JR methods include the following:
- Analysis of Health Surveys, Korn and Graubard (1999)
- Analysis of Survey Data, Chambers and Skinner (2003)
- Analyzing Complex Survey Data, Lee and Forthofer (2005)
- Introduction to Variance Estimation, Wolter (2007)
- Pitfalls of Using Standard Statistical Packages for Sample Survey Data, Brogan (1998)
- Practical Methods for Design and Analysis of Complex Surveys, Lehtonen and Pahkinen (1995)
- Sampling Error Estimation for Survey Data, Brogan (2005)
Abbreviations used in the tables and text are:
- CI = confidence interval
- LL = lower confidence limit
- UL = upper confidence limit
- TSL = Taylor series linearization
- JR = jackknife replication
This information last updated: Jan 12, 2009