Expert Health Data Programming, Inc.

Jenks Natural Breaks Compared with Other Methods


 
Purpose: To help Vitalnet users understand how methods for setting map ranges differ, this page compares "Natural Breaks" (NB) with two alternative range algorithms: "Equal Counts" (EC, also called quantiles) and "Equal Intervals" (EI).
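As a rough illustration of the two alternative methods, the sketch below computes range upper bounds for "Equal Intervals" and "Equal Counts". The function names and the sample counts are hypothetical, and this is not Vitalnet's implementation; in particular, real software needs a rule for tied values that this sketch ignores.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k ranges that each span the same width of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]


def equal_count_breaks(values, k):
    """Upper bounds of k ranges that each hold about the same number of areas."""
    ordered = sorted(values)
    n = len(ordered)
    bounds = []
    for i in range(1, k + 1):
        end = (i * n + k - 1) // k  # ceil(i * n / k): areas in the first i ranges
        bounds.append(ordered[end - 1])
    return bounds


# Hypothetical skewed counts: seven small areas and one very large one.
counts = [1, 2, 3, 4, 5, 6, 7, 100]
print(equal_interval_breaks(counts, 4))  # [25.75, 50.5, 75.25, 100]
print(equal_count_breaks(counts, 4))     # [2, 4, 6, 100]
```

With these made-up counts, "Equal Intervals" puts seven of the eight areas in the lowest range, while "Equal Counts" forces the top range to stretch from 7 to 100.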
 
Method and Results: A series of commonly made maps was used for the comparisons, each with three to five ranges. Although the selection is somewhat arbitrary, these are typical maps and serve as a useful basis for comparing the three methods. The maps use the "Diverging Green-Yellow-Red" color palette; the "Sequential Grey" palette produced similar findings. GVF (goodness of variance fit), the distribution of the underlying data, and visual inspection were used to assess the maps. The statistics (GVF and others) for each map are available in the page's HTML source. There is nothing "magical" about GVF scores, and other variance measures could be used, but GVF is a reasonable variance indicator.
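For readers unfamiliar with GVF, here is a minimal sketch of the usual definition: one minus the ratio of the squared deviations within ranges to the squared deviations of all values from the overall mean. The page does not show how Vitalnet computes GVF, so the function name, the argument layout, and the sample data are assumptions.

```python
def gvf(values, upper_bounds):
    """Goodness of variance fit: 1 - (within-range SS / total SS).

    Assumes upper_bounds is ascending, its last entry is at least max(values),
    and the values are not all identical.
    """
    mean = sum(values) / len(values)
    total_ss = sum((v - mean) ** 2 for v in values)

    # Assign each value to the first range whose upper bound covers it.
    ranges = [[] for _ in upper_bounds]
    for v in values:
        for i, bound in enumerate(upper_bounds):
            if v <= bound:
                ranges[i].append(v)
                break

    within_ss = 0.0
    for members in ranges:
        if members:
            m = sum(members) / len(members)
            within_ss += sum((v - m) ** 2 for v in members)
    return 1 - within_ss / total_ss


# Hypothetical skewed counts: seven small areas and one very large one.
counts = [1, 2, 3, 4, 5, 6, 7, 100]
print(round(gvf(counts, [2, 4, 6, 100]), 3))  # ranges of two areas each
print(round(gvf(counts, [7, 100]), 3))        # the large area in its own range
```

A GVF near 1 means the ranges group similar values together; a GVF near 0 means the ranges explain little of the variation.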
 
Map Comparisons: Each group of examples below is followed by our comments (in parentheses). Our conclusions about natural breaks appear at the end.
 


TX 2005 Deaths, 4 Ranges
 

Equal Counts: GVF = 0.192; 1 s
Equal Intervals: GVF = 0.880; 1 s
Natural Breaks: GVF = 0.956; 1 s
[Map images: EC, EI, NB]

 
(Count data. "Equal Counts" very low GVF. "Natural Breaks" is best map.)
 


TX 2005 Age Adjusted Death Rate, 3 Ranges
 

Equal Counts: GVF = 0.578; 1 s
Equal Intervals: GVF = 0.636; 1 s
Natural Breaks: GVF = 0.655; 1 s
[Map images: EC, EI, NB]

 
TX 2005 Age Adjusted Death Rate, 4 Ranges
 

Equal Counts: GVF = 0.637; 1 s
Equal Intervals: GVF = 0.650; 1 s
Natural Breaks: GVF = 0.827; 1 s
[Map images: EC, EI, NB]

 
TX 2005 Age Adjusted Death Rate, 5 Ranges
 

Equal Counts: GVF = 0.678; 1 s
Equal Intervals: GVF = 0.729; 1 s
Natural Breaks: GVF = 0.899; 8 s
[Map images: EC, EI, NB]

 
("Natural Breaks" best by far. Extreme rate in Kenedy County distorts map.)
 


TX 2005 Age Adjusted Death Rate, 3 Ranges, Suppress if ≤ 10 Deaths
 

Equal Counts: GVF = 0.749; 1 s
Equal Intervals: GVF = 0.663; 1 s
Natural Breaks: GVF = 0.766; 1 s
[Map images: EC, EI, NB]

 
TX 2005 Age Adjusted Death Rate, 4 Ranges, Suppress if ≤ 10 Deaths
 

Equal Counts: GVF = 0.811; 1 s
Equal Intervals: GVF = 0.814; 1 s
Natural Breaks: GVF = 0.858; 1 s
[Map images: EC, EI, NB]

 
TX 2005 Age Adjusted Death Rate, 5 Ranges, Suppress if ≤ 10 Deaths
 

Equal Counts: GVF = 0.851; 1 s
Equal Intervals: GVF = 0.884; 1 s
Natural Breaks: GVF = 0.904; 7 s
[Map images: EC, EI, NB]

 
("Natural Breaks" still best. Suppressing Kenedy County improves map.)
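The suppression rule used in these maps ("Suppress if ≤ 10 Deaths") can be sketched as a simple filter applied before the ranges are computed. The function name, area labels, and numbers below are hypothetical and are not the Texas data.

```python
def suppress(rates, events, threshold):
    """Drop areas whose rate rests on `threshold` or fewer events."""
    return {area: rate for area, rate in rates.items() if events[area] > threshold}


# Hypothetical areas: "D" has a very high rate computed from only 3 deaths.
rates = {"A": 780.0, "B": 810.0, "C": 845.0, "D": 2400.0}
deaths = {"A": 950, "B": 120, "C": 64, "D": 3}

kept = suppress(rates, deaths, threshold=10)
print(sorted(kept))  # ['A', 'B', 'C']
```

Once the unstable extreme value is removed, the remaining rates span a much narrower interval, so the ranges are no longer stretched to accommodate one area.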
 


TX 2005 Age Adjusted Death Rate, Cancer, 4 Ranges
 

Equal Counts: GVF = 0.373; 1 s
Equal Intervals: GVF = 0.697; 1 s
Natural Breaks: GVF = 0.826; 1 s
[Map images: EC, EI, NB]

 
TX 2005 Age Adjusted Death Rate, Cancer, 4 Ranges, Suppress if ≤ 5 Deaths
 

Equal Counts: GVF = 0.600; 1 s
Equal Intervals: GVF = 0.654; 1 s
Natural Breaks: GVF = 0.840; 1 s
[Map images: EC, EI, NB]

 
("Natural Breaks" best by far, with or without suppression.)
 


TX 2005 Age Adjusted Death Rate, Diabetes, 4 Ranges
 

Equal Counts: GVF = 0.776; 1 s
Equal Intervals: GVF = 0.749; 1 s
Natural Breaks: GVF = 0.863; 1 s
[Map images: EC, EI, NB]

 
TX 2005 Age Adjusted Death Rate, Diabetes, 4 Ranges, Suppress if ≤ 5 Deaths
 

Equal Counts: GVF = 0.786; 1 s
Equal Intervals: GVF = 0.862; 1 s
Natural Breaks: GVF = 0.891; 1 s
[Map images: EC, EI, NB]

 
("Natural Breaks" best, with or without suppression.)
 


CA 2000 Births, 4 Ranges
 

Equal Counts: GVF = 0.331; 1 s
Equal Intervals: GVF = 0.900; 1 s
Natural Breaks: GVF = 0.974; 1 s
[Map images: EC, EI, NB]

 
CA 2000 Births, Age 15-19, 4 Ranges
 

Equal Counts: GVF = 0.329; 1 s
Equal Intervals: GVF = 0.811; 1 s
Natural Breaks: GVF = 0.985; 1 s
[Map images: EC, EI, NB]

 
(Count data. "Equal Counts" very low GVF. "Natural Breaks" is best map.)
 


CA 2000 Births per 1,000 Females, Age 15-19, 3 Ranges
 

Equal Counts: GVF = 0.858; 1 s
Equal Intervals: GVF = 0.897; 1 s
Natural Breaks: GVF = 0.897; 1 s
[Map images: EC, EI, NB]

 
CA 2000 Births per 1,000 Females, Age 15-19, 4 Ranges
 

Equal Counts: GVF = 0.916; 1 s
Equal Intervals: GVF = 0.921; 1 s
Natural Breaks: GVF = 0.937; 1 s
[Map images: EC, EI, NB]

 
CA 2000 Births per 1,000 Females, Age 15-19, 5 Ranges
 

Equal Counts: GVF = 0.952; 1 s
Equal Intervals: GVF = 0.961; 1 s
Natural Breaks: GVF = 0.963; 1 s
[Map images: EC, EI, NB]

 
(Very uniform data distribution. All three methods work very well.)
 


CA 2000 Cesarean Rate, 3 Ranges
 

Equal Counts: GVF = 0.705; 1 s
Equal Intervals: GVF = 0.702; 1 s
Natural Breaks: GVF = 0.770; 1 s
[Map images: EC, EI, NB]

 
CA 2000 Cesarean Rate, 4 Ranges
 

Equal Counts: GVF = 0.792; 1 s
Equal Intervals: GVF = 0.805; 1 s
Natural Breaks: GVF = 0.851; 1 s
[Map images: EC, EI, NB]

 
CA 2000 Cesarean Rate, 5 Ranges
 

Equal Counts: GVF = 0.834; 1 s
Equal Intervals: GVF = 0.898; 1 s
Natural Breaks: GVF = 0.901; 1 s
[Map images: EC, EI, NB]

 
(Relatively uniform data distribution. "Natural Breaks" is best map.)
 


US 2005 % Obese, 18+, 4 Ranges
 

Equal Counts: GVF = 0.915; 1 s
Equal Intervals: GVF = 0.915; 1 s
Natural Breaks: GVF = 0.925; 1 s
[Map images: EC, EI, NB]

 
(Very uniform data distribution. All three methods work very well.)
 


US 2005 % Current Smoker, 18+, 4 Ranges
 

Equal Counts: GVF = 0.817; 1 s
Equal Intervals: GVF = 0.873; 1 s
Natural Breaks: GVF = 0.883; 1 s
[Map images: EC, EI, NB]

 
(Relatively uniform data distribution. "Equal Counts" has the lowest GVF score.)
 


US 2005 % Satisfied with Life, 18+, 4 Ranges
 

Equal Counts: GVF = 0.898; 1 s
Equal Intervals: GVF = 0.901; 1 s
Natural Breaks: GVF = 0.914; 1 s
[Map images: EC, EI, NB]

 
(Very uniform data distribution. All three methods work very well.)
 


US 2005 % Leisure Time Exercise, 18+, 4 Ranges
 

Equal Counts: GVF = 0.791; 1 s
Equal Intervals: GVF = 0.854; 1 s
Natural Breaks: GVF = 0.906; 1 s
[Map images: EC, EI, NB]

 
(Extreme value in Puerto Rico. "Natural Breaks" works best.)
 


Iowa 2005 Cancer Cases, 4 Ranges
 

Equal Counts: GVF = 0.386; 1 s
Equal Intervals: GVF = 0.863; 1 s
Natural Breaks: GVF = 0.942; 1 s
[Map images: EC, EI, NB]

 
(Count data. "Equal Counts" very low GVF. "Natural Breaks" is best map.)
 


Iowa 2005 Cancer Incidence Rate, 3 Ranges
 

Equal Counts: GVF = 0.737; 1 s
Equal Intervals: GVF = 0.787; 1 s
Natural Breaks: GVF = 0.813; 1 s
[Map images: EC, EI, NB]

 
Iowa 2005 Cancer Incidence Rate, 4 Ranges
 

Equal Counts: GVF = 0.829; 1 s
Equal Intervals: GVF = 0.824; 1 s
Natural Breaks: GVF = 0.878; 1 s
[Map images: EC, EI, NB]

 
Iowa 2005 Cancer Incidence Rate, 5 Ranges
 

Equal Counts: GVF = 0.876; 1 s
Equal Intervals: GVF = 0.909; 1 s
Natural Breaks: GVF = 0.921; 1 s
[Map images: EC, EI, NB]

 
("Natural Breaks" is best map. But not big differences.)
 


Conclusions: Our overall conclusions, based on the maps:
 
(1) Use "Natural Breaks" with count data maps. Count data (e.g., births, deaths) are typically highly skewed, with a few areas having very high counts. As a result, "Equal Counts" maps display count data poorly and have extremely low GVF scores. For the same reason, "Equal Intervals" maps of count data also have lower GVF scores than "Natural Breaks" maps. So "Natural Breaks" is almost always the best choice for mapping count data.
 
(2) Map appearance can differ greatly, EC vs EI vs NB. Most maps have a non-uniform data distribution. In those cases, the three methods produce quite different-looking maps.
 
(3) Sometimes the appearance is the same, EC vs EI vs NB. If the data are distributed roughly linearly from low to high, the three methods produce about the same results. CA births per 1,000 females, age 15-19, is a good example.
 
(4) Cell suppression helps greatly when the number of events is low. A rate based on a small number of events is unstable, and can produce some extremely high rates (with wide confidence intervals). If suppressing a few areas eliminates extreme values, it usually increases the GVF score and, by removing "noise", produces a map that is easier to interpret.
 
(5) Overall, "Natural Breaks" is better for rate maps. Both "Equal Counts" and "Equal Intervals" can easily do poorly when there is a non-linear distribution of values, with resulting low GVF scores and potentially misleading maps. A good example is "TX 2005 Age Adjusted Death Rate, Cancer".
 
(6) "Equal Counts" can be cosmetically appealing. "Equal Counts" uses each color equally. So it usually produces a more "colorful" map, perhaps for use in an atlas with more visual than epidemiological purpose. But more "colorful" does not translate to more "meaningful".
 
(7) Look at the underlying data. It is a good idea to make all three map types, and look at the actual data distribution, before publishing a map. Normally, it is best to let the numbers "speak for themselves". That is a good argument for normally using "Natural Breaks", which produced the highest GVF score (or tied for it) in every example above.
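Why "Natural Breaks" scores so well on GVF can be seen from a small sketch. An exact natural-breaks search picks the contiguous ranges with the smallest total squared deviation from the range means, and because the total squared deviation of the data is fixed, that is the same as maximizing GVF. The dynamic-programming version below is one common way to do this; the page does not describe Vitalnet's algorithm, so the function name, the approach, and the sample data are assumptions.

```python
def natural_breaks(values, k):
    """Upper bounds of the k contiguous ranges with minimum within-range squared deviation.

    Assumes len(values) >= k.
    """
    x = sorted(values)
    n = len(x)
    # Prefix sums give the squared deviation of any slice x[i:j] in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations of x[i:j] from its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total for splitting the first j values into c ranges.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i

    # Walk back through the stored cut points to recover each range's upper bound.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(x[j - 1])
        j = cut[c][j]
    return bounds[::-1]


# Hypothetical values with three obvious clusters.
print(natural_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))  # [3, 12, 51]
```

This exhaustive search grows quickly with the number of areas and ranges, which is one reason practical implementations often use faster or approximate variants.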
 
Let us know if (1) you have thoughts or suggestions about natural breaks, the map examples, or other methods, or (2) you are interested in working together on further research on this topic, for publication in the peer-reviewed literature.
 