Making Apples of Student Test Scores

Last time we examined how school districts and communities are penalized for having a well-diversified student body.  The penalty arises because students of color, on average, score much lower on standardized tests, and those tests are often a key ingredient in measuring the “quality” of a given district or school.

We noted the rather extreme example of a comparison of two local high schools, one here in Madison, Wisconsin, and one in the neighboring town of Middleton.  ACT scores for the Madison school were higher (than Middleton’s) for whites, higher for blacks, higher for Hispanics, higher for Asians, and higher for all other races.  Yet Middleton’s overall score was higher.  How could that be?  Because Middleton has a higher percentage of students who are white, and because whites score higher on average, its overall average is higher than Madison’s.

Think about that.  You’re moving to the Madison area; your choice of location leans heavily on test scores (perhaps not the best measure, but it is nice to have an easy number to use for comparison); you look at these two schools… and you opt for Middleton because it has a higher score.  But if your child is white, wouldn’t you want the Madison school, since its white students score higher?  If he or she were black, wouldn’t you opt for Madison?  Or if they were Hispanic, or Asian, or…?  Our world, and certainly America, is becoming increasingly diverse.  By choosing Madison, your child gets greater exposure to that diversity and will hopefully be better prepared for the working world, and the social and cultural world, as a result.  But that overall test score may easily steer you away.  How can we fix the overall score so that it reflects more accurately how these two schools compare?  Quite easily.

To demonstrate how to make the appropriate comparison of results, I will refer now to recent (2015) NAEP results for 8th grade math.  The NAEP (National Assessment of Educational Progress) is a nationwide exam that represents the best available comparative tool for measuring student performance across the U.S.  On the left-hand side of the table below, the 8th grade math scores are shown.  For each racial group, Texas scores higher than Wisconsin.  Yet Wisconsin’s overall score is higher.  When we look at the racial distribution at the right, we see the primary driver: Wisconsin has a much higher percentage of white students than Texas (76% versus 31%).  And as whites generally score higher, the overall score for Wisconsin is higher.

8th Grade Math NAEP Scores

Race              Wisconsin  Texas  WI Dist  TX Dist  U.S. Dist
White                   296    298      76%      31%        52%
Black                   249    267       9%      11%        15%
Hispanic                271    277      10%      52%        24%
Asian                   295    312       3%       4%         6%
All Other               283    290       2%       2%         3%
Original Overall        289    284     100%     100%       100%

The simple fix for the distortion is to treat both states as though they have the same racial distribution.  The most appropriate distribution to apply is that of the U.S. overall.  Essentially, the weighted average score for every state is calculated using that state’s actual scores by race, with the U.S. racial distribution applied as the weights.  Thus, the Wisconsin white score of 296 is weighted at only 52%, not 76%.  Meanwhile, the Texas white score of 298 is also weighted at 52%, instead of just 31%.

By applying the nationwide racial distribution to both states, the overall results change quite dramatically.  The Wisconsin overall score drops from 289 to 283, while Texas jumps from 284 to 288.  This result seems very reasonable, given how well each race in Texas performs compared to Wisconsin.  Not surprisingly, the national rankings of the states change accordingly.  Wisconsin’s original ranking of 6th falls to 19th, while Texas soars from 22nd to 3rd.
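For readers who want to check the arithmetic, the reweighting can be sketched in a few lines of Python.  This is only an illustration: the function name weighted_score and the dictionary layout are my own choices here, and the inputs are the rounded scores and percentages from the table above, so the results can differ by a point or so from figures computed on unrounded data.

```python
# Scores and distributions as printed in the table (rounded values).
scores = {
    "Wisconsin": {"White": 296, "Black": 249, "Hispanic": 271, "Asian": 295, "All Other": 283},
    "Texas": {"White": 298, "Black": 267, "Hispanic": 277, "Asian": 312, "All Other": 290},
}
state_dist = {
    "Wisconsin": {"White": 0.76, "Black": 0.09, "Hispanic": 0.10, "Asian": 0.03, "All Other": 0.02},
    "Texas": {"White": 0.31, "Black": 0.11, "Hispanic": 0.52, "Asian": 0.04, "All Other": 0.02},
}
us_dist = {"White": 0.52, "Black": 0.15, "Hispanic": 0.24, "Asian": 0.06, "All Other": 0.03}


def weighted_score(group_scores, weights):
    """Weighted average of group scores; weights are normalized to sum to 1."""
    total_weight = sum(weights.values())
    return sum(group_scores[g] * weights[g] for g in weights) / total_weight


for state in scores:
    original = weighted_score(scores[state], state_dist[state])  # state's own mix
    adjusted = weighted_score(scores[state], us_dist)            # common U.S. mix
    print(f"{state}: original {original:.1f}, adjusted {adjusted:.1f}")
```

With the state's own distribution as weights, the sketch reproduces the original overall scores to within rounding.  With the U.S. distribution as weights, Texas comes out ahead of Wisconsin, which is the reversal described above.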

                  Wisconsin  Texas
Adjusted Overall        283    288
One final point to address: sometimes a state’s count for a given race may be so small that no statistically valid average can be reported.  For example, in Wisconsin that was the case for the category “2 or more races”, which represented 1.3% of the 8th grade math test takers.  When only one category is missing, one can apply algebra to “back into” the approximate score.  But if more than one category is missing, or if results are already rounded, a different approach may be more appropriate.  Although inevitably inaccurate to some slight degree, you may do just as well to take the ratio of the state’s overall score to the U.S. overall score, and apply it to the U.S. average score for the given category.  Thus, for Wisconsin’s “2 or more races”:

Estimate  =  (WI overall score / U.S. overall score) x U.S. average score for the category

          =  (289 / 282) x 285

          =  292
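Both ways of filling in a missing category can be sketched in Python.  The function names ratio_estimate and back_into_missing are hypothetical, chosen only for this illustration.  The first applies the ratio approach shown above; the second is one way to do the “back into” algebra, assuming exactly one category is missing and its share of test takers is known.

```python
def ratio_estimate(state_overall, us_overall, us_category_score):
    """Scale the U.S. category score by the state-to-U.S. overall ratio."""
    return state_overall / us_overall * us_category_score


def back_into_missing(overall, known_scores, weights, missing):
    """Solve overall = sum(weight * score) for the single missing category.

    known_scores holds the reported categories; weights covers every
    category, including the missing one.
    """
    known_total = sum(known_scores[g] * weights[g] for g in known_scores)
    return (overall - known_total) / weights[missing]


# Wisconsin's "2 or more races" estimate, using the figures in the text.
print(round(ratio_estimate(289, 282, 285)))  # prints 292
```

Note that back_into_missing divides by the missing category’s weight.  When that weight is as small as 1.3%, any rounding in the overall score is magnified many times over, which is why the ratio approach can be the safer choice when the published figures are already rounded.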