# Jenks Natural Breaks vs Alternative Methods

Purpose: To help Vitalnet users better understand the differences between methods for setting map ranges, this page compares "Natural Breaks" (NB) with two alternative range algorithms: "Equal Counts" (EC, i.e., quantiles) and "Equal Intervals" (EI).

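The two alternative methods are simple to state. The Python sketch below shows one way "Equal Intervals" and "Equal Counts" might compute range boundaries. It is an illustration under stated assumptions, not Vitalnet's implementation: the function names are hypothetical, each range is described by its inclusive upper bound, and ties at a boundary and the rounding of displayed range labels are ignored.

```python
import math
from typing import List


def equal_interval_breaks(values: List[float], k: int) -> List[float]:
    """Upper bounds of k ranges of equal width spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The last bound is set to hi exactly so rounding cannot leave the maximum out.
    return [lo + width * i for i in range(1, k)] + [hi]


def equal_count_breaks(values: List[float], k: int) -> List[float]:
    """Upper bounds of k ranges (quantiles) holding roughly equal numbers of values."""
    xs = sorted(values)
    n = len(xs)
    # Range i ends at the value whose rank is ceil(i * n / k).
    return [xs[math.ceil(i * n / k) - 1] for i in range(1, k + 1)]


def classify(values: List[float], breaks: List[float]) -> List[int]:
    """Index of the first range whose upper bound is >= each value."""
    return [next(i for i, b in enumerate(breaks) if v <= b) for v in values]


if __name__ == "__main__":
    # Hypothetical, right-skewed values (not taken from the maps on this page).
    data = [1, 2, 2, 3, 4, 5, 7, 9, 30, 100]
    for breaks in (equal_interval_breaks(data, 4), equal_count_breaks(data, 4)):
        ranges = classify(data, breaks)
        # Prints the upper bounds, then how many values land in each range:
        # [25.75, 50.5, 75.25, 100] [8, 1, 0, 1]   (Equal Intervals)
        # [2, 4, 9, 100] [3, 2, 3, 2]              (Equal Counts)
        print(breaks, [ranges.count(i) for i in range(4)])
```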
Method and Results: A series of commonly made maps, each using three to five ranges, was used for the comparisons. While the selection is somewhat arbitrary, these are typical maps, and they serve as a useful way to compare the three methods. The maps use the "Diverging Green-Yellow-Red" color palette; the "Sequential Grey" palette produced similar findings. The maps were assessed using GVF (goodness of variance fit), the distribution of the underlying data, and visual inspection. A higher GVF is better: it indicates that the method did a good job of grouping the geographic areas (counties or states) into ranges. The statistics (GVF and others) for each map can be viewed in the HTML source of this page. Of course, there is nothing "magical" about GVF scores, and other variance measures could be used, but GVF is a reasonable indicator of how well the ranges fit the data.

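GVF is commonly defined as (SDAM - SDCM) / SDAM, where SDAM is the sum of squared deviations of all values from their overall mean and SDCM is the sum of squared deviations of each value from the mean of its own range. The Python sketch below computes it for values that have already been grouped into ranges. It illustrates that standard definition and is not the code Vitalnet uses; the function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def gvf(classes: Sequence[Sequence[float]]) -> float:
    """Goodness of variance fit for values already grouped into ranges."""
    values = [v for c in classes for v in c]
    overall_mean = fmean(values)
    # SDAM: squared deviations from the mean of all values.
    sdam = sum((v - overall_mean) ** 2 for v in values)
    # SDCM: squared deviations from each range's own mean (empty ranges add nothing).
    sdcm = 0.0
    for c in classes:
        if c:
            m = fmean(c)
            sdcm += sum((v - m) ** 2 for v in c)
    if sdam == 0:
        return 1.0  # all values identical: any grouping fits perfectly
    return (sdam - sdcm) / sdam


if __name__ == "__main__":
    # Two tight, well-separated groups give a GVF close to 1 (64/65, about 0.985).
    print(gvf([[1, 2], [9, 10]]))
    # A single range explains none of the variance, so GVF is 0.
    print(gvf([[1, 2, 9, 10]]))
```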
Map Comparisons: Most examples below include our comments. Our conclusions and thoughts about natural breaks are at the very end.

## Map Series #1: TX 2005 Deaths, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.192 | 1 s |
| Equal Intervals | 0.880 | 1 s |
| Natural Breaks | 0.956 | 1 s |

Comment: Count data. "Equal Counts" has a very low GVF. "Natural Breaks" produces the best map.

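Why do count maps punish "Equal Counts" so badly? The sketch below uses made-up counts for 14 areas (not Vitalnet data) with one very large value, groups them both ways, and scores each grouping with GVF, using the standard (SDAM - SDCM) / SDAM definition. Because equal counts must put the same number of areas in each range, its top range mixes a moderate count with the extreme ones, leaving a large within-range variance: here it scores roughly 0.40, while equal intervals scores about 0.97 only by placing 13 of the 14 areas in its lowest range. The helper names are hypothetical.

```python
import math
from statistics import fmean
from typing import List


def gvf(classes: List[List[float]]) -> float:
    """Goodness of variance fit: (SDAM - SDCM) / SDAM (assumes values are not all equal)."""
    values = [v for c in classes for v in c]
    overall = fmean(values)
    sdam = sum((v - overall) ** 2 for v in values)
    sdcm = sum(sum((v - fmean(c)) ** 2 for v in c) for c in classes if c)
    return (sdam - sdcm) / sdam


def equal_count_classes(values: List[float], k: int) -> List[List[float]]:
    """Split the sorted values into k runs of (nearly) equal length."""
    xs = sorted(values)
    cuts = [math.ceil(i * len(xs) / k) for i in range(k + 1)]
    return [xs[cuts[i]:cuts[i + 1]] for i in range(k)]


def equal_interval_classes(values: List[float], k: int) -> List[List[float]]:
    """Group values into k equal-width ranges between min and max (assumes max > min)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    classes: List[List[float]] = [[] for _ in range(k)]
    for v in sorted(values):
        # min() puts the maximum value in the top range rather than past it.
        classes[min(int((v - lo) / width), k - 1)].append(v)
    return classes


# Hypothetical, right-skewed counts for 14 areas.
counts = [2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 40, 150, 900]
for name, split in [("Equal Counts", equal_count_classes),
                    ("Equal Intervals", equal_interval_classes)]:
    groups = split(counts, 4)
    print(f"{name}: GVF = {gvf(groups):.3f}, range sizes = {[len(g) for g in groups]}")
```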
## Map Series #2: TX 2005 Age Adjusted Death Rate, 3 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.578 | 1 s |
| Equal Intervals | 0.636 | 1 s |
| Natural Breaks | 0.655 | 1 s |

Comment: All three methods work about the same.

## Map Series #3: TX 2005 Age Adjusted Death Rate, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.637 | 1 s |
| Equal Intervals | 0.650 | 1 s |
| Natural Breaks | 0.827 | 1 s |

## Map Series #4: TX 2005 Age Adjusted Death Rate, 5 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.678 | 1 s |
| Equal Intervals | 0.729 | 1 s |
| Natural Breaks | 0.899 | 8 s |

Comment: "Natural Breaks" is best by far. An extreme rate in Kenedy County distorts the map.

## Map Series #5: TX 2005 Age Adjusted Death Rate, 3 Ranges, Suppress if ≤ 10 Deaths

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.749 | 1 s |
| Equal Intervals | 0.663 | 1 s |
| Natural Breaks | 0.766 | 1 s |

## Map Series #6: TX 2005 Age Adjusted Death Rate, 4 Ranges, Suppress if ≤ 10 Deaths

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.811 | 1 s |
| Equal Intervals | 0.814 | 1 s |
| Natural Breaks | 0.858 | 1 s |

## Map Series #7: TX 2005 Age Adjusted Death Rate, 5 Ranges, Suppress if ≤ 10 Deaths

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.851 | 1 s |
| Equal Intervals | 0.884 | 1 s |
| Natural Breaks | 0.904 | 7 s |

Comment: "Natural Breaks" is still best. Suppressing Kenedy County improves the map.

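The "Suppress if ≤ 10 Deaths" maps above leave out areas whose rates rest on ten or fewer deaths before the ranges are set. A minimal Python sketch of that filtering step follows, together with the usual Poisson approximation that explains why such rates are unstable: for an event count D, the relative standard error (RSE) of the rate is about 1/√D, so a rate built on 10 deaths has an RSE of roughly 32%. The function names, the threshold rule as coded, and the sample areas are hypothetical illustrations, not Vitalnet's code.

```python
import math
from typing import Dict, Optional, Tuple


def relative_standard_error(events: int) -> float:
    """Approximate RSE (%) of a rate, assuming the event count is Poisson."""
    if events <= 0:
        return math.inf  # no events: the rate has no usable precision
    # Poisson: SE(count) = sqrt(count), so SE(rate) / rate = 1 / sqrt(count).
    return 100.0 / math.sqrt(events)


def suppress_small_counts(
    areas: Dict[str, Tuple[int, float]], max_suppressed: int
) -> Dict[str, Optional[float]]:
    """Keep each area's rate only if it rests on more than max_suppressed events."""
    return {
        name: (rate if events > max_suppressed else None)
        for name, (events, rate) in areas.items()
    }


if __name__ == "__main__":
    # Hypothetical areas: (number of deaths, rate).
    areas = {"Area A": (3, 1850.0), "Area B": (10, 910.0), "Area C": (240, 780.0)}
    for name, (events, _) in areas.items():
        print(name, f"RSE = {relative_standard_error(events):.1f}%")
    # "Suppress if <= 10 deaths": Areas A and B are dropped before ranges are set.
    print(suppress_small_counts(areas, 10))
```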
## Map Series #8: TX 2005 Age Adjusted Death Rate, Cancer, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.373 | 1 s |
| Equal Intervals | 0.697 | 1 s |
| Natural Breaks | 0.826 | 1 s |

## Map Series #9: TX 2005 Age Adjusted Death Rate, Cancer, 4 Ranges, Suppress if ≤ 5 Deaths

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.600 | 1 s |
| Equal Intervals | 0.654 | 1 s |
| Natural Breaks | 0.840 | 1 s |

Comment: "Natural Breaks" is best by far, with or without suppression.

## Map Series #10: TX 2005 Age Adjusted Death Rate, Diabetes, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.776 | 1 s |
| Equal Intervals | 0.749 | 1 s |
| Natural Breaks | 0.863 | 1 s |

## Map Series #11: TX 2005 Age Adjusted Death Rate, Diabetes, 4 Ranges, Suppress if ≤ 5 Deaths

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.786 | 1 s |
| Equal Intervals | 0.862 | 1 s |
| Natural Breaks | 0.891 | 1 s |

Comment: "Natural Breaks" is best, with or without suppression.

## Map Series #12: CA 2000 Births, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.331 | 1 s |
| Equal Intervals | 0.900 | 1 s |
| Natural Breaks | 0.974 | 1 s |

## Map Series #13: CA 2000 Births, Age 15-19, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.329 | 1 s |
| Equal Intervals | 0.811 | 1 s |
| Natural Breaks | 0.985 | 1 s |

Comment: Count data. "Equal Counts" has a very low GVF. "Natural Breaks" produces the best map.

## Map Series #14: CA 2000 Births per 1,000 Females, Age 15-19, 3 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.858 | 1 s |
| Equal Intervals | 0.897 | 1 s |
| Natural Breaks | 0.897 | 1 s |

## Map Series #15: CA 2000 Births per 1,000 Females, Age 15-19, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.916 | 1 s |
| Equal Intervals | 0.921 | 1 s |
| Natural Breaks | 0.937 | 1 s |

## Map Series #16: CA 2000 Births per 1,000 Females, Age 15-19, 5 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.952 | 1 s |
| Equal Intervals | 0.961 | 1 s |
| Natural Breaks | 0.963 | 1 s |

Comment: Very uniform data distribution. All three methods work very well.

## Map Series #17: CA 2000 Cesarean Rate, 3 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.705 | 1 s |
| Equal Intervals | 0.702 | 1 s |
| Natural Breaks | 0.770 | 1 s |

## Map Series #18: CA 2000 Cesarean Rate, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.792 | 1 s |
| Equal Intervals | 0.805 | 1 s |
| Natural Breaks | 0.851 | 1 s |

## Map Series #19: CA 2000 Cesarean Rate, 5 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.834 | 1 s |
| Equal Intervals | 0.898 | 1 s |
| Natural Breaks | 0.901 | 1 s |

Comment: Relatively uniform data distribution. "Natural Breaks" produces the best map.

## Map Series #20: US 2005 % Obese, 18+, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.915 | 1 s |
| Equal Intervals | 0.915 | 1 s |
| Natural Breaks | 0.925 | 1 s |

Comment: Very uniform data distribution. All three methods work very well.

## Map Series #21: US 2005 % Current Smoker, 18+, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.817 | 1 s |
| Equal Intervals | 0.873 | 1 s |
| Natural Breaks | 0.883 | 1 s |

Comment: Relatively uniform data distribution. "Equal Counts" has a noticeably lower GVF score than the other two methods.

## Map Series #22: US 2005 % Satisfied with Life, 18+, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.898 | 1 s |
| Equal Intervals | 0.901 | 1 s |
| Natural Breaks | 0.914 | 1 s |

Comment: Very uniform data distribution. All three methods work very well.

## Map Series #23: US 2005 % Leisure Time Exercise, 18+, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.791 | 1 s |
| Equal Intervals | 0.854 | 1 s |
| Natural Breaks | 0.906 | 1 s |

Comment: Extreme value in Puerto Rico. "Natural Breaks" works best.

## Map Series #24: Iowa 2005 Cancer Cases, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.386 | 1 s |
| Equal Intervals | 0.863 | 1 s |
| Natural Breaks | 0.942 | 1 s |

Comment: Count data. "Equal Counts" has a very low GVF. "Natural Breaks" produces the best map.

## Map Series #25: Iowa 2005 Cancer Incidence Rate, 3 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.737 | 1 s |
| Equal Intervals | 0.787 | 1 s |
| Natural Breaks | 0.813 | 1 s |

## Map Series #26: Iowa 2005 Cancer Incidence Rate, 4 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.829 | 1 s |
| Equal Intervals | 0.824 | 1 s |
| Natural Breaks | 0.878 | 1 s |

## Map Series #27: Iowa 2005 Cancer Incidence Rate, 5 Ranges

| Method | GVF (higher is better) | Time |
| --- | --- | --- |
| Equal Counts | 0.876 | 1 s |
| Equal Intervals | 0.909 | 1 s |
| Natural Breaks | 0.921 | 1 s |

Comment: "Natural Breaks" produces the best map, but the differences are small.

Conclusions: Our overall conclusions, based on the maps:

(1) Use "Natural Breaks" with count data maps. Count data (e.g., births, deaths) are typically highly skewed, with a few areas having very high counts. As a result, "Equal Counts" maps display count data poorly and have extremely low GVF scores. For the same reason, "Equal Intervals" maps of count data also score noticeably lower than "Natural Breaks". So "Natural Breaks" is almost always the best choice for mapping count data.

(2) Map appearance can differ greatly, EC vs EI vs NB. Most maps have a non-uniform data distribution, and in those cases the three methods produce maps that look quite different.

(3) Sometimes the appearance is the same, EC vs EI vs NB. If the data are roughly linearly distributed from low to high, the three methods produce about the same results. "CA 2000 Births per 1,000 Females, Age 15-19" (Map Series #14) is a good example.

(4) Cell suppression helps greatly when the number of events is low. A rate based on a small number of events is unstable and can produce some extremely high rates (with wide confidence intervals). If suppressing a few areas eliminates extreme values, it usually increases the GVF score and produces a map that is easier to interpret, because it removes "noise".

(5) Overall, "Natural Breaks" is better for rate maps. Both "Equal Counts" and "Equal Intervals" can easily do poorly when values are non-linearly distributed, resulting in low GVF scores and potentially misleading maps. A good example is "TX 2005 Age Adjusted Death Rate, Cancer" (Map Series #8).

(6) "Equal Counts" can be cosmetically appealing. Because "Equal Counts" uses each color equally, it usually produces a more "colorful" map, perhaps suited to an atlas with a more visual than epidemiological purpose. But more "colorful" does not mean more "meaningful".

(7) Look at the underlying data. Before publishing a map, it is a good idea to make all three map types and look at the actual data distribution. Normally, it is best to let the numbers "speak for themselves". That is a good argument for normally using "Natural Breaks", which produced the highest (or tied-highest) GVF score in every example above.

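Conclusion (7) has a mathematical basis, provided "Natural Breaks" is computed exactly. Jenks natural breaks, in its optimal form (often credited to Fisher), chooses the boundaries that minimize SDCM, the sum of squared deviations of each value from the mean of its range. Because GVF = (SDAM - SDCM) / SDAM, and SDAM (the squared deviations of all values from their overall mean) depends only on the data, minimizing SDCM for a fixed number of ranges is the same as maximizing GVF, so no other set of ranges can score higher. The Python sketch below finds those boundaries by dynamic programming in O(k·n²) time. It assumes Vitalnet uses this exact formulation, which this page does not state (a heuristic implementation could occasionally fall short), and the function name is hypothetical.

```python
from typing import List


def natural_breaks(values: List[float], k: int) -> List[float]:
    """Upper bounds of the k ranges that minimize total within-range squared deviation."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums give the squared deviation of any run xs[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations of xs[i:j] from its own mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: smallest SDCM when the first j sorted values form m ranges.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # start[m][j]: index where the m-th range begins in that best solution.
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # candidate last range: xs[i:j]
                c = cost[m - 1][i] + ssd(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    start[m][j] = i

    # Walk back through the stored starts to recover each range's upper bound.
    breaks = []
    j = n
    for m in range(k, 0, -1):
        breaks.append(xs[j - 1])
        j = start[m][j]
    return breaks[::-1]


if __name__ == "__main__":
    # Hypothetical values with three obvious clusters.
    print(natural_breaks([1, 2, 3, 10, 11, 12, 50], 3))  # [3, 12, 50]
```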
Let us know if (1) you have thoughts or suggestions about natural breaks, the map examples, or other methods, or (2) you are interested in working together on further research on this topic for publication in the peer-reviewed literature.