Vitalnet Auto-Validation


Purpose - "Auto-Validation" is a quality assurance method to help verify that Vitalnet results are correct. It automatically compares thousands of historically correct results with those currently produced by the software. This page gives more detailed information concerning "Auto-Validation", beyond that given in the Main Auto-Validation Page. You can use use the results of Auto-Validation without reading this page. The information on this page is for those who want to understand the technical details.

Background / History - Historically, Vitalnet has been accurate and correct. There has always been an emphasis on ensuring correct results. Validation methods used have included (1) "Cross-Checking", (2) "Reality Check Tables", and (3) "Worksheet Validation". These validation methods continue to help ensure that data are correctly imported, that results are correct, and that something working last year still works correctly.

Method #1: "Cross-Checking" - This fundamental method "cross-checks" different Vitalnet tables. For example, switching rows and columns should give the same results in each cell, just with the axes reversed. Also, a shared cell between two different tables should agree (eg, black female deaths, year 2000, breast cancer, Anderson County, age 40-49). Also, results should agree with any published results, or with those produced by other methods, eg bash, SPSS, SAS, etc. There are many ways tables can be cross-checked to verify correctness.

Method #2: "Reality Check Tables" - This method uses a series of "Reality Check Tables". An example would be year columns, with each table set to a different "row variable". Another example would be a leading cause table, with year columns, and a few years. A new series of "Reality Check Tables" is compared with a series produced in years past, to check that the results match. The results may also be compared with results produced by other methods (eg, SPSS). This method does a reasonably good job finding obvious errors (eg, way too many or too few counts). However, the method does a poor job detecting small errors, and relies on error-prone visual scanning.

Method #3: "Worksheet Validation" - In 2015, Jeff Swanson, at Texas DSHS, devised a "Worksheet Validation" system. The system is within an Excel spreadsheet. Each worksheet tab contains three areas: the left area contains tables from a previously installed VitalPro module, the middle area contains tables produced by the new VitalPro module, and the right area does comparisons. Each cell in the right area corresponds to a data cell, blank if the left and middle numbers match, and 'X' if a discrepancy. "Worksheet Validation" works well. However, it requires manual steps to import new tables, and manual scanning to find discrepancies.

Method #4: "Auto-Validation" - This method, developed in 2016, better ensures correct Vitalnet results. It builds on ideas from "Reality Check Tables", "Worksheet Validation", and BRFSS data validation. "Auto-Validation" almost totally automates the process, by directly using Vitalnet output without manual steps, and by producing automated reports. Auto-Validation automatically compares two large sets of results: 1) previously verified "Gold Standard" results and 2) results currently produced by the program. Based on the comparison, Auto-Validation produces a report web page, such as this example.

"Gold Standard Results" (GS) - Eventually, as the result of "Cross Checking" and daily use, we can label some results "Gold Standards". For the purposes of Auto-Validation, each "Gold Standard" result usually has both rows and columns, to provide more numbers for comparison. It usually uses multi-tables (a table series), also providing more numbers to compare. If the "Gold Standard" includes rates, the population numbers for the years compared must be the same, to prevent a false discrepancy from occurring. For examples of "Gold Standard Results", refer to the example results page .

Directory Structure - "Auto-Validation" is file based, solely uses text files, and resides on a Unix computer. For each area / data combination (eg, Texas birth data), there is a "Check_Dir", accessible via the web. For example, the Check_Dir for Texas ICD-9 UC deaths is: /vn/avdir/txu1/eg1/2018-11-14/

Getting technical, in each Check_Dir, there are four INPUT subdirectories (subdirs):

subdir contents explanation
av-exp "expected" accepted "Gold Standard" (GS) results
av-obs "observed" results Vitalnet currently makes
av-par "parameters" parameters for "expected" and "observed" results
av-fix "fixup" explains if an discrepancy is accounted for

Again getting technical, there are two OUTPUT subdirs in each Check_Dir:

subdir contents explanation
av-dif "difference" observed vs expected comparison
av-adj "adjusted" adjusts av-dif output for lines such as "ID"

Expected Results - The av-exp (expected) subdir holds a series of "Gold Standard" files, accepted as correct. For each "Gold Standard" (GS), there are three files (txt, csv, htm), same content (such as counts and rates) in each file, just different format. For example:

GS expected txt, csv, and htm files
#1 txu1-eg1-001.txt txu1-eg1-001.csv txu1-eg1-001.htm
#2 txu1-eg1-002.txt txu1-eg1-002.csv txu1-eg1-002.htm

All three files are exactly as produced by Vitalnet. The txt file is the "Gold Standard" file used to make comparisons, but is not ideal for viewing. The csv and htm files may be imported into a spreadsheet, for convenient viewing. The htm file may also be viewed in a browser.

Observed Results - The av-obs (observed) subdir holds a series of files made by the current Vitalnet, using the same analysis parameters as the expected "Gold Standard" files. The files are named as in the av-exp subdir. For example:

GS observed txt, csv, and htm files
#1 txu1-eg1-001.txt txu1-eg1-001.csv txu1-eg1-001.htm
#2 txu1-eg1-002.txt txu1-eg1-002.csv txu1-eg1-002.htm

Parameter Helper Files - The av-par (parameter) input subdir serves a "helper" role. For each "Gold Standard" result, there is one file, such as txu1-eg1-001.txt file name. The file holds the multi-line parameters for the output, showing the input parameters used to generate one or more tables, as in this example.

Fixup Helper Files - The av-fix (fixup) input subdir also serves a "helper" role. Rarely, the "Gold Standard" is incorrect, as demonstrated by an independent method. In that case, the av-fix subdir would temporarily (until new Gold Standard created) hold a file, explaining why the observed result is correct, as in this example.

Differences - The av-dif (difference) output subdir holds the difference between "expected" and "observed". The Unix diff command compares "expected" with "observed" to produce the "difference". For each "Gold Standard", there is a single file with the difference, as in this example.

Adjusted Differences - For each "Gold Standard", the av-adj (adjusted) output subdir holds a single file with an "adjusted" form of the "av-dif" file. "False-positive" lines that are not data differences are removed from the adjusted file. For example, the unique ID will be different, so that is not a discrepancy, and hence needs to be adjusted for. If there are no lines in the adjusted file, there are no discrepancies, as in this example.

Generating Differences - For each "Gold Standard" result, a shell script compares the av-exp txt file with the av-obs txt file, writing diff output to a file in the av-dif subdir. Next, the script adjusts the av-dif file and writes it to the av-adj subdir. Finally, with input from the av-par and av-fix files, the shell script creates a report page, such as this example. Please refer to the Main Auto-Validation Page for how to interpret the report page.
 
Send corrections or suggestions