The Hughes Diversity Index

Ed Hughes recently served as a school board member, and board president, here in Madison.  He is a lawyer, and a terrific champion of public schools.  In a recent blog post, Hughes describes what he calls the “Diversity Dividend”: the academic, social, and civic benefits students derive from attending schools with high racial diversity, along with better preparation for work life.

Regrettably, diversity often ends up penalizing school districts.  Parents evaluating the potential of a given school or community examine test scores as a convenient and easy measure of academic strength.  But almost invariably, students of color score lower, on average, than white students.  Consequently, the published overall average score for a given district is usually lowered as the student body becomes more diverse.

This tendency can be quite diabolical.  As an example, Hughes points to the ACT scores for two local high schools, one in Madison and one in the adjacent town of Middleton.  Madison’s score for every racial group was higher than Middleton’s.  But because of the racial composition, the overall average for Middleton was higher.  (Middleton had a higher percentage of white students, so even though they scored lower than Madison’s white students, the larger percentage helped raise Middleton’s overall score above Madison’s.)  There’s a way to report the numbers that addresses this incongruity; we’ll look at that next time.  For now, suffice it to say, you have to dig a little deeper than just one simple reported figure if you want to get a more accurate read on what the numbers have to say.
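To see how a school can lead in every group and still trail overall, here is a minimal Python sketch.  The scores and enrollment shares are made-up illustrative numbers, not the actual Madison or Middleton figures, and the names `school_a`, `school_b`, and `overall_average` are hypothetical.

```python
# Hypothetical numbers only: these are NOT the real ACT scores or
# enrollment shares for any school. They just illustrate the effect.

def overall_average(scores, shares):
    """Weighted average of group scores, weighted by enrollment share (fractions summing to 1)."""
    return sum(scores[group] * shares[group] for group in scores)

# School A scores higher in BOTH groups but is more evenly mixed.
school_a_scores = {"white": 25, "students_of_color": 20}
school_a_shares = {"white": 0.5, "students_of_color": 0.5}

# School B scores lower in both groups but has a much larger white share.
school_b_scores = {"white": 24, "students_of_color": 19}
school_b_shares = {"white": 0.9, "students_of_color": 0.1}

overall_a = overall_average(school_a_scores, school_a_shares)
overall_b = overall_average(school_b_scores, school_b_shares)

print(f"School A overall: {overall_a:.1f}")
print(f"School B overall: {overall_b:.1f}")
```

With these assumed inputs, School B posts the higher overall average even though School A is ahead within each group, which is the same pattern Hughes describes.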

Since this is a blog that focuses on numbers, I want to describe the technique Hughes employs to measure the level of diversity.  As a starting point, he references what is called the Herfindahl-Hirschman Index, or “HHI”.  It’s an approach used in antitrust law to measure the concentration of a market for products or services, i.e., to measure the relative lack of competitiveness.  It’s calculated by taking the market share of each participant, squaring the percentages, and adding them up.  So if a product market has four companies with shares of 50 (percent), 30, 15, and 5, the HHI would be: (50 x 50) + (30 x 30) + (15 x 15) + (5 x 5) = 2,500 + 900 + 225 + 25 = 3,650.  Another market where four firms each have a 25 share would have an HHI of 2,500 (= 625 + 625 + 625 + 625).  In its extreme form, a market with only one firm possessing a 100 share would have an HHI of 10,000 (= 100 x 100).  Clearly, the lower the score, the better: one wants more competitiveness, not less.  Similarly, to obtain the “diversity dividend”, we want to see higher diversity, not lower.  But how to measure it?
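For anyone who wants to check the arithmetic, the HHI calculation can be sketched in a few lines of Python.  The function name `hhi` is just a label chosen for this illustration; shares are entered as whole percentages, as in the examples above.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index: the sum of the squared percentage shares."""
    return sum(share * share for share in shares)

print(hhi([50, 30, 15, 5]))    # 3650
print(hhi([25, 25, 25, 25]))   # 2500
print(hhi([100]))              # 10000
```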

Hughes takes the HHI formula, turns it on its head, and comes up with a simple and brilliant technique for measuring diversity.  First, Hughes takes the highest possible score of 10,000, and subtracts from it the sum of the squares of the racial share percentages.  Here in Wisconsin, racial mix is usually reported across five categories:  Asian, Black, Hispanic, White, & all other.  (Actually, racial mix here also includes American Indian, Pacific Isle, and Two or More Races; I’ve combined these typically very small categories into “all other”, to ensure share totals add up to 100.)  For Wisconsin, with a racial mix of 4, 9, 11, 72, & 4, that equates to:  10,000 – (16+81+121+5,184+16) = 10,000 – 5,418 = 4,582.

Next, he scales up the score so that perfect diversity adds up to 10,000.  With five categories, a perfect distribution would have all five with a 20 share; the sum of their squares is 2,000 (5 x 400), which when subtracted from 10,000 leaves 8,000.  Scaling that score up to 10,000 requires multiplying it by 1.25.  Stated more broadly, the adjustment entails multiplying by n/(n-1), where n is the number of categories you have.  (If you have five categories, the factor is 5/4 or 1.25; 3 categories would be 3/2 or 1.5; etc.)  The scaled up Wisconsin score is: 4,582 x 1.25 = 5,727.5.  To make the score less cumbersome, Hughes’ third and final step is to divide the result by 100.  So the Wisconsin score ends up simply as 57.  (The racial mixes here have been rounded; using the unrounded shares, the final score for Wisconsin also comes out at 57.)
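The three steps (subtract from 10,000, scale by n/(n-1), divide by 100) can be sketched in Python as follows.  The function name `hughes_diversity_index` is my own label for illustration, and the sketch assumes the shares are percentages that add up to 100.

```python
def hughes_diversity_index(shares):
    """Diversity score from percentage shares, following the three steps described above.

    Assumes `shares` are percentages summing to 100, one entry per category.
    """
    n = len(shares)
    if n < 2:
        raise ValueError("need at least two categories to scale by n/(n-1)")
    # Step 1: subtract the sum of squared shares from the maximum of 10,000.
    unscaled = 10_000 - sum(share * share for share in shares)
    # Step 2: scale so that a perfectly even mix reaches 10,000.
    scaled = unscaled * n / (n - 1)
    # Step 3: divide by 100 to get a score on a 0-to-100 scale.
    return scaled / 100

# Rounded Wisconsin mix: Asian, Black, Hispanic, White, all other.
print(hughes_diversity_index([4, 9, 11, 72, 4]))

# A perfectly even five-way mix reaches the top of the scale.
print(hughes_diversity_index([20, 20, 20, 20, 20]))   # 100.0
```

A mix concentrated entirely in one of the five categories, such as `[100, 0, 0, 0, 0]`, sits at the other end of the scale with a score of 0.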

And that’s it.  I think what I’ll call here the “Hughes Diversity Index” is brilliant.  It’s easy to calculate, and so easy to comprehend and use for comparative purposes.  It can obviously be applied to any setting where you want to capture the relative mix, be it public schools, universities, corporations, communities, countries, whatever.  By the way, for the US population overall, the index is 69, while for the US student population – using averages from 4th grade and 8th grade NAEP test takers – the score is about 81.  The higher student score reflects expectations of ever-growing diversity in this country.  The Madison school district’s score was 90, the highest of all of Wisconsin’s 424 school districts.  How does your community, school district, or company fare?

The Retention Rate Formula (Part 3)

Over the last two weeks, we’ve been looking at the formula used to calculate retention.  Unless an organization is able to separately track new and existing customers, the problem with the retention rate formula becomes one of determining how to treat new customers.  We found that the most commonly used formula effectively treats new customers as though they all arrive on the last day of the year.

YearEnd Formula:            Retention Rate (RR)  =  (End – New) / Start

Another formula some firms use effectively treats new customers as though they all arrive at the very beginning of the year.

NewYear Formula:          RR  =  End / (Start + New)

So what does a more accurate and “reasonable” formula look like?  Well, if you think about it, for most businesses, new customers will be coming in throughout the year.  Some will sign up on the equivalent of January 1, some will arrive December 31, and all the others will arrive on the various days in between.  On average, they’ll arrive around midyear.  So, to measure retention properly, we want a formula that treats new customers as though they arrive midyear.

A “MidYear” formula is quite easy to construct, if we just re-state our traditional formulas a little differently.  The “NewYear” formula can be revised to show that there are zero new customers being subtracted from the ending customers in the numerator, while the denominator re-states the new customers as “1 New”, meaning that it picks up all 100% of the new customers.

NewYear Formula Restated:      RR  =  (End – 0 New) / (Start + 1 New)

Similarly, the “YearEnd” formula can be restated to show all new customers subtracted from the ending in the numerator, while there are zero new customers added to the denominator.

YearEnd Formula Restated:        RR  =  (End – 1 New) / (Start + 0 New)

Well, a “midyear formula” should average the treatment of new customers in these two formulas, right?  In the numerator, we’ll subtract the average of 0 and 1, or ½, of the new customers.  And in the denominator, we’ll add the average of 0 and 1, or ½, of the new customers to the beginning customers.

MidYear Formula:             RR  =  (End – ½ New) / (Start + ½  New)

And voila, by treating new customers as arriving midyear, on average, we have a retention rate formula that reflects the “true” retention rate remarkably well.  Our example was for all of 2017, but it can be applied anytime during the year.  Though new customers will seldom come in on average at exactly midyear, they will usually come in very close to that.
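One way to sanity-check that claim is a small simulation, sketched here in Python.  Everything in it is an assumption made for illustration, not something from the formulas themselves: a “true” annual retention rate of 80%, attrition that is spread evenly through the year and is the same for new and existing customers, and new customers who arrive in equal batches in the middle of each month.  The function name `simulate_year` is hypothetical.

```python
def simulate_year(start, new, true_retention, periods=12):
    """Ending customer count under assumed constant attrition.

    Assumptions (illustrative only): every customer faces the same
    per-period survival rate, and new customers arrive in equal batches
    at the midpoint of each period.
    """
    per_period_survival = true_retention ** (1 / periods)
    surviving_existing = start * true_retention
    surviving_new = 0.0
    for k in range(periods):
        periods_remaining = periods - k - 0.5   # batch arrives mid-period
        surviving_new += (new / periods) * per_period_survival ** periods_remaining
    return surviving_existing + surviving_new

start, new, true_rr = 100, 20, 0.80
end = simulate_year(start, new, true_rr)

year_end = (end - new) / start                    # YearEnd formula
new_year = end / (start + new)                    # NewYear formula
mid_year = (end - new / 2) / (start + new / 2)    # MidYear formula

for name, value in [("YearEnd", year_end), ("MidYear", mid_year), ("NewYear", new_year)]:
    print(f"{name}: {value:.1%}  (true rate: {true_rr:.0%})")
```

Under these assumptions, the MidYear result lands closest to the 80% built into the simulation, while YearEnd comes in low and NewYear comes in high.  Different attrition patterns or arrival timing would shift the numbers, so treat this as a rough check rather than a proof.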

We can now try out our new formula on the simple example we used previously.

MidYear Formula:            RR  =  (End – ½ New) / (Start + ½  New)

Example:                             RR = (100 – ½ x 20) / (100 + ½ x 20)

RR = (100 – 10) / (100 + 10)

RR = 90 / 110 = 81.8%

We subtract half the new customers, or 10, from the numerator, and add half of the new customers, or 10, to the denominator.  We arrive at a retention rate of 81.8%.  As one might expect, the result is roughly halfway between the 80% we got using the “YearEnd” formula, and the 83.3% obtained using the “NewYear” formula.  It’s a little more involved, but if you want to use the “right” formula for calculating the retention rate where the customer counts include new customers, the MidYear formula is the right one to use.
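For readers who would rather let a computer do the arithmetic, here is a short Python sketch of all three formulas applied to the same example.  The function names are my own labels for illustration.

```python
def year_end_rr(start, end, new):
    """YearEnd formula: treats new customers as arriving on the last day."""
    return (end - new) / start

def new_year_rr(start, end, new):
    """NewYear formula: treats new customers as arriving on the first day."""
    return end / (start + new)

def mid_year_rr(start, end, new):
    """MidYear formula: treats new customers as arriving midyear, on average."""
    return (end - new / 2) / (start + new / 2)

# The running example: start with 100, end with 100, add 20 during the year.
start, end, new = 100, 100, 20

print(f"YearEnd: {year_end_rr(start, end, new):.1%}")   # 80.0%
print(f"MidYear: {mid_year_rr(start, end, new):.1%}")   # 81.8%
print(f"NewYear: {new_year_rr(start, end, new):.1%}")   # 83.3%
```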

The Retention Rate Formula (Part 2)

Last week, we looked at the typical formula used for measuring the retention rate.  That formula is easy to use.  But it’s not that accurate.  The most commonly used retention rate formula is:

Formula #1:        Retention Rate (RR)  =  (End – New) / Start

Some firms use a different formula.  Instead of subtracting the new customers from the ending account total, they include new customers with the starting base.

Formula #2:        RR  =  End / (Start + New)

Again, we’ll demonstrate the formula under the same scenario previously used: we start and end the year with 100 customers, and add 20 new customers during the intervening 12 months.

Example:             RR = 100 / (100 + 20)    =    100 / 120    =    83.3%

You end with 100 customers; you started with 100 customers, and added 20 along the way.  Of the 120 total customers, you’ve “retained” 100 of them, for a retention rate of 100/120, or 83.3% – using Formula #2.  One attribute of this approach that management would appreciate is that you end up with a higher figure.  This formula, like #1, is also easy to calculate.  But which of these, if either, is more appropriate?

The central challenge with any retention rate formula is how to treat new customers, or more precisely, how to account for when they arrive.  Let’s walk through a couple of examples, wearing an economist’s hat.

To help clarify a given problem, economists love to make assumptions.  So let’s start by assuming that it’s the end of 2017, and we’re looking back and measuring the retention rate for 2017.  Let’s further assume that all the new customers came in on the 2nd day of the year, January 2.

If all the new customers came in on the 2nd day of the year, then by the end of the year they have had essentially an entire year in which to leave, just as the existing customers have.  In this instance, it is quite reasonable and appropriate to treat the new customers as though they were existing; after all, they have had virtually the same amount of time in which to leave as the existing customers did.  So the formula for measuring retention for 2017, under this scenario, is to take our ending customers and divide them by the starting plus new.  Let’s call this the “NewYear” formula, since all the new customers arrive at the start of the new year.

NewYear Formula:          RR  =  End / (Start + New)

Next, let’s run a different scenario where we assume that all the new customers arrive on the 2nd to last day of the year, December 30.  Are we going to want to include them in our measurement of retention?  Absolutely not; they haven’t had any time to leave yet.  So here we’ll want to subtract our new customers from the count of ending customers, while our baseline will be our starting customers.  Let’s call this the “YearEnd” formula, since all the new customers arrive at year end.

YearEnd Formula:            RR  =  (End – New) / Start

Of course, our “YearEnd” and “NewYear” formulas are identical to the commonly used “Formula #1” and “Formula #2”.  The primary formula, “Formula #1”, is identical to the YearEnd formula; it treats all new customers as though they came in at the very end of the year.  Our second formula, “Formula #2”, is identical to the NewYear formula; it treats all new customers as though they came in at the start of the year.  Clearly, neither of these formulas rests on a reasonable assumption.

Next week we’ll determine what the retention rate formula “should” be.

The Retention Rate Formula (Part 1)

The retention rate is one of the most important metrics for any organization that has customers paying on some kind of regular or subscription basis.  From wireless telecom service to insurance coverage to banking products and more, the ability to retain customers over time is critical for success.  The typical measure of retention is one that calculates the percentage of existing customers that remain over a one year period.  You start the year with 100 customers; one year later, 80 of them are still on the books: the retention rate is 80%.

That’s simple enough.  The tricky part with any retention rate formula, however, is how to deal with new customers who come in during the year.  Ideally you separate them out, and indeed some companies do separately track retention on existing customers versus new.  But there are some definite challenges with that approach: the minutiae of tracking all the different types and timings of attrition can often lead to numbers that “don’t add up”; there is the complexity of providing two different measures of retention, leading to the inevitable request for a third measure that combines existing & new customers; and finally, there is the question of whether the time and cost are worth the effort.

Be that as it may, let’s presume your measure of retention is for new & existing customers combined; what formula do you use to measure retention?  Recently, I did a Google search on measuring retention, and by far the most common formula I came across was the following:

Formula #1:        Retention Rate (RR) = (End – New) / Start

Very simply, one subtracts new customers from the count of ending customers, and divides that amount by the starting customer count.  In the equation, the starting point is 1 year prior to the end point, and new customers are the count of sales during those intervening 12 months.

To demonstrate this retention formula, as well as two other formulas to follow, let’s use a simple example where we start and end the year with 100 customers, and during the year bring in 20 new customers.

Formula #1:        RR = (End – New) / Start

Example:             RR = (100 – 20) / 100

RR = 80 / 100 = 80%

In this example, we start the year with 100 customers, see 20 new customers come in during the year, and end with 100 customers.  If we subtract out the new customers, we see that we’ve retained 80 customers, giving us a retention rate of 80 over 100, or 80%.  Nice.  Simple.  But unfortunately, not very accurate.  Next week, we’ll see why.