# Jenks Natural Breaks Compared with Other Methods

Purpose: To help Vitalnet users understand how methods for setting map ranges differ, this page compares "Natural Breaks" (NB) with two alternative range algorithms: "Equal Counts" (EC, also called quantiles) and "Equal Intervals" (EI).
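As a rough illustration of how the two alternative methods set ranges, here is a minimal sketch. The function names and the sample values are hypothetical, and the handling of ties and rounding is an assumption; Vitalnet's own implementation may differ.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k ranges of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]


def equal_count_breaks(values, k):
    """Upper bounds of k ranges holding (as nearly as possible) equal numbers of areas.

    Assumes len(values) >= k; ties between areas are not treated specially.
    """
    ordered = sorted(values)
    n = len(ordered)
    # The i-th range ends at the value whose rank is i/k of the way through the list.
    return [ordered[(n * i) // k - 1] for i in range(1, k + 1)]


# Hypothetical skewed data: eleven small values and one very large one.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]
print(equal_count_breaks(values, 4))     # [3, 6, 9, 100]
print(equal_interval_breaks(values, 4))  # [25.75, 50.5, 75.25, 100]
```

With skewed values like these, "Equal Counts" puts the extreme value in the same range as ordinary ones, while "Equal Intervals" leaves the two middle ranges empty.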

Method and Results: The comparisons use a series of commonly made maps with three to five ranges. The selection is somewhat arbitrary, but these are typical maps and serve as a useful basis for comparing the three methods. The maps use the "Diverging Green-Yellow-Red" color palette; the "Sequential Grey" palette produced similar findings. The maps were assessed using GVF (goodness of variance fit), the distribution of the underlying data, and visual inspection. The statistics (GVF and others) for each map can be found in the HTML source of this page. There is nothing "magical" about GVF scores, and other variance measures could possibly be used, but GVF is a reasonable variance indicator.

Map Comparisons: The examples below are followed by our comments (in parentheses). Our conclusions about natural breaks are at the very end.
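For readers unfamiliar with GVF, the sketch below uses the usual definition, GVF = 1 - SDCM / SDAM, where SDAM is the sum of squared deviations of all values from the overall mean and SDCM is the sum of squared deviations of the values from the mean of their own range. This definition, the function names, and the sample values are assumptions for illustration; the exact calculation used for the maps on this page is not shown here.

```python
def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def gvf(values, upper_bounds):
    """Goodness of variance fit for ranges given as ascending inclusive upper bounds.

    Assumes the last bound is >= max(values) and the values are not all equal
    (otherwise SDAM is zero and the ratio is undefined).
    """
    classes = [[] for _ in upper_bounds]
    for v in values:
        for i, bound in enumerate(upper_bounds):
            if v <= bound:
                classes[i].append(v)
                break
    sdam = sum_sq_dev(values)
    sdcm = sum(sum_sq_dev(c) for c in classes if c)
    return 1 - sdcm / sdam


# Hypothetical values forming two obvious clusters.
values = [1, 2, 3, 10, 11, 12]
print(round(gvf(values, [3, 12]), 3))  # 0.968, the break falls between the clusters
print(round(gvf(values, [2, 12]), 3))  # 0.598, the break splits a cluster
```

A GVF near 1 means the ranges group similar values together; a GVF near 0 means the ranges explain little of the variation in the data.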

TX 2005 Deaths, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.192; 1 s | GVF = 0.880; 1 s | GVF = 0.956; 1 s |

(Count data. "Equal Counts" very low GVF. "Natural Breaks" is best map.)

TX 2005 Age Adjusted Death Rate, 3 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.578; 1 s | GVF = 0.636; 1 s | GVF = 0.655; 1 s |

TX 2005 Age Adjusted Death Rate, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.637; 1 s | GVF = 0.650; 1 s | GVF = 0.827; 1 s |

TX 2005 Age Adjusted Death Rate, 5 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.678; 1 s | GVF = 0.729; 1 s | GVF = 0.899; 8 s |

("Natural Breaks" best by far. Extreme rate in Kenedy County distorts map.)

TX 2005 Age Adjusted Death Rate, 3 Ranges, Suppress if ≤ 10 Deaths

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.749; 1 s | GVF = 0.663; 1 s | GVF = 0.766; 1 s |

TX 2005 Age Adjusted Death Rate, 4 Ranges, Suppress if ≤ 10 Deaths

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.811; 1 s | GVF = 0.814; 1 s | GVF = 0.858; 1 s |

TX 2005 Age Adjusted Death Rate, 5 Ranges, Suppress if ≤ 10 Deaths

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.851; 1 s | GVF = 0.884; 1 s | GVF = 0.904; 7 s |

("Natural Breaks" still best. Suppressing Kenedy County improves map.)
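The "Suppress if ≤ 10 Deaths" rule used above can be sketched as a simple filter applied before the ranges are computed. The rows below are invented for illustration only (they are not Texas data), and the function name is hypothetical.

```python
# Hypothetical (area, deaths, age-adjusted rate) rows, invented for illustration.
areas = [
    ("Area A", 2, 2400.0),   # very few deaths, so the rate is extreme and unstable
    ("Area B", 150, 810.0),
    ("Area C", 95, 790.0),
    ("Area D", 10, 1200.0),  # exactly at the threshold, so it is also suppressed
    ("Area E", 480, 835.0),
]


def suppress_small_counts(rows, threshold):
    """Split rows into (kept, suppressed) using a 'suppress if <= threshold' rule."""
    kept = [r for r in rows if r[1] > threshold]
    suppressed = [r for r in rows if r[1] <= threshold]
    return kept, suppressed


kept, suppressed = suppress_small_counts(areas, threshold=10)
print([name for name, _, _ in kept])        # ['Area B', 'Area C', 'Area E']
print(max(rate for _, _, rate in kept))     # 835.0
```

In this made-up example, suppression removes the two unstable rates, so the remaining values span a much narrower interval and the ranges are no longer stretched by an outlier.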

TX 2005 Age Adjusted Death Rate, Cancer, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.373; 1 s | GVF = 0.697; 1 s | GVF = 0.826; 1 s |

TX 2005 Age Adjusted Death Rate, Cancer, 4 Ranges, Suppress if ≤ 5 Deaths

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.600; 1 s | GVF = 0.654; 1 s | GVF = 0.840; 1 s |

("Natural Breaks" best by far, with or without suppression.)

TX 2005 Age Adjusted Death Rate, Diabetes, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.776; 1 s | GVF = 0.749; 1 s | GVF = 0.863; 1 s |

TX 2005 Age Adjusted Death Rate, Diabetes, 4 Ranges, Suppress if ≤ 5 Deaths

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.786; 1 s | GVF = 0.862; 1 s | GVF = 0.891; 1 s |

("Natural Breaks" best, with or without suppression.)

CA 2000 Births, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.331; 1 s | GVF = 0.900; 1 s | GVF = 0.974; 1 s |

CA 2000 Births, Age 15-19, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.329; 1 s | GVF = 0.811; 1 s | GVF = 0.985; 1 s |

(Count data. "Equal Counts" very low GVF. "Natural Breaks" is best map.)

CA 2000 Births per 1,000 Females, Age 15-19, 3 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.858; 1 s | GVF = 0.897; 1 s | GVF = 0.897; 1 s |

CA 2000 Births per 1,000 Females, Age 15-19, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.916; 1 s | GVF = 0.921; 1 s | GVF = 0.937; 1 s |

CA 2000 Births per 1,000 Females, Age 15-19, 5 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.952; 1 s | GVF = 0.961; 1 s | GVF = 0.963; 1 s |

(Very uniform data distribution. All three methods work very well.)

CA 2000 Cesarean Rate, 3 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.705; 1 s | GVF = 0.702; 1 s | GVF = 0.770; 1 s |

CA 2000 Cesarean Rate, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.792; 1 s | GVF = 0.805; 1 s | GVF = 0.851; 1 s |

CA 2000 Cesarean Rate, 5 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.834; 1 s | GVF = 0.898; 1 s | GVF = 0.901; 1 s |

(Relatively uniform data distribution. "Natural Breaks" is best map.)

US 2005 % Obese, 18+, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.915; 1 s | GVF = 0.915; 1 s | GVF = 0.925; 1 s |

(Very uniform data distribution. All three methods work very well.)

US 2005 % Current Smoker, 18+, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.817; 1 s | GVF = 0.873; 1 s | GVF = 0.883; 1 s |

(Relatively uniform data distribution. "Equal Counts" has the lowest GVF score.)

US 2005 % Satisfied with Life, 18+, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.898; 1 s | GVF = 0.901; 1 s | GVF = 0.914; 1 s |

(Very uniform data distribution. All three methods work very well.)

US 2005 % Leisure Time Exercise, 18+, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.791; 1 s | GVF = 0.854; 1 s | GVF = 0.906; 1 s |

(Extreme value in Puerto Rico. "Natural Breaks" works best.)

Iowa 2005 Cancer Cases, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.386; 1 s | GVF = 0.863; 1 s | GVF = 0.942; 1 s |

(Count data. "Equal Counts" very low GVF. "Natural Breaks" is best map.)

Iowa 2005 Cancer Incidence Rate, 3 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.737; 1 s | GVF = 0.787; 1 s | GVF = 0.813; 1 s |

Iowa 2005 Cancer Incidence Rate, 4 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.829; 1 s | GVF = 0.824; 1 s | GVF = 0.878; 1 s |

Iowa 2005 Cancer Incidence Rate, 5 Ranges

| Equal Counts | Equal Intervals | Natural Breaks |
| --- | --- | --- |
| GVF = 0.876; 1 s | GVF = 0.909; 1 s | GVF = 0.921; 1 s |

("Natural Breaks" is best map. But the differences are not big.)

Conclusions: Our overall conclusions, based on the maps:

(1) Use "Natural Breaks" with count data maps. Count data (e.g., births, deaths) are typically greatly skewed (a few areas with very high counts). So "Equal Counts" maps display count data poorly, and have extremely low GVF scores (0.192 to 0.386 in the examples above). For the same reason, "Equal Intervals" maps with count data also score lower on GVF (0.811 to 0.900) than "Natural Breaks" maps (0.942 to 0.985). So "Natural Breaks" is almost always the best choice for mapping count data.

(2) Map appearance can differ greatly, EC vs EI vs NB. Most maps have a non-uniform data distribution. In those cases, the three methods produce maps that look quite different.

(3) Sometimes the appearance is the same, EC vs EI vs NB. If the data are spread roughly evenly (close to linearly) from low to high, the three methods will produce about the same results. "CA 2000 Births per 1,000 Females, Age 15-19" is a good example.

(4) Cell suppression helps greatly if the number of events is low. A rate based upon a small number of events is unstable, and will result in some extremely high rates (with wide confidence intervals). If suppressing a few areas eliminates extreme values, it usually increases the GVF score, and produces a map that is easier to interpret by removing "noise".

(5) Overall, "Natural Breaks" is better for rate maps. Both "Equal Counts" and "Equal Intervals" can easily do poorly when there is a non-linear distribution of values, with resulting low GVF scores and potentially misleading maps. A good example is "TX 2005 Age Adjusted Death Rate, Cancer".

(6) "Equal Counts" can be cosmetically appealing. "Equal Counts" uses each color equally. So it usually produces a more "colorful" map, perhaps for use in an atlas with more visual than epidemiological purpose. But more "colorful" does not translate to more "meaningful".

(7) Look at the underlying data. It is a good idea to make all three map types, and look at the actual data distribution, before publishing a map. It is usually best to let the numbers "speak for themselves". That is a good argument for using "Natural Breaks" by default: by design it seeks the ranges with the least variance within each range, and it had the best GVF score in every example above (tied once with "Equal Intervals").
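To show why natural breaks and GVF are so closely linked, here is a minimal sketch of an exact one-dimensional optimal classification (the Fisher-Jenks dynamic programming approach). It finds the ranges that minimize the total within-range sum of squared deviations, which is the same as maximizing GVF because the overall variation of the data is fixed. This is a swapped-in textbook technique, not necessarily the algorithm Vitalnet uses; the function name and sample values are hypothetical.

```python
def natural_breaks(values, k):
    """Upper bounds of k ranges minimizing total within-range squared deviation.

    Assumes 1 <= k <= len(values). Runs in O(k * n^2) time.
    """
    x = sorted(values)
    n = len(x)
    # Prefix sums give the squared deviation of any slice x[i:j] in O(1).
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        ps[i + 1] = ps[i] + v
        ps2[i + 1] = ps2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations of x[i:j] from its own mean (j > i)."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: least total deviation when x[:j] is split into c ranges.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last range is x[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the recorded split points to recover the upper bounds.
    bounds = []
    j = n
    for c in range(k, 0, -1):
        bounds.append(x[j - 1])
        j = split[c][j]
    return bounds[::-1]


# Hypothetical values: two clusters plus one extreme value.
print(natural_breaks([1, 2, 3, 10, 11, 12, 50], 3))  # [3, 12, 50]
```

Because the search is exhaustive over all contiguous splits of the sorted values, no other set of k ranges, including those from "Equal Counts" or "Equal Intervals", can score a higher GVF on the same data under this definition.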

Let us know if (1) you have thoughts or suggestions about natural breaks, the map examples, or other methods, or (2) you are interested in working together on further research on this topic, for publication in the peer-reviewed literature.