Why Slate’s Article on Testing is Wrong


Research psychologists Christopher Chabris and David Hambrick recently wrote an article at Slate touting the benefits of using SAT scores for better college admissions, and critiquing colleges that have gone testless. Their argument attacks a strawman of anti-test positions, conflates empirical evidence with policy interventions, and completely ignores the very social, racial, and political questions that make standardized tests so contentious.  Here are four reasons to rethink their conclusions.

• • •

1. The SAT is redundant.

The crux of Chabris and Hambrick’s argument is that SAT scores predict college GPA: a true but misleading claim. In 2001, Cal announced it would stop using SAT I scores in admissions decisions, a policy shift driven by an internal study of their predictive utility (Geiser & Studley, 2001). The key finding behind Cal’s decision was that while SAT scores do predict college GPA on their own, they offer little added benefit once a student’s high school GPA is already known. Moreover, even that small benefit disappears almost entirely when other factors, like parents’ education and income, are also known. In other words, SAT scores are redundant. Even the College Board’s own touted numbers, presumably those most favorable to the SAT, show that high school GPA alone predicts 32% of the variance in college GPA, whereas GPA and SAT scores combined predict 40%. That gain is not insignificant, but it’s hardly a silver bullet for vastly improved admissions.
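To see what “redundant” means statistically, here’s a toy simulation, with parameters invented purely for illustration (not taken from Geiser & Studley or the College Board): when SAT scores and high school GPA are mostly noisy measures of the same underlying ability, adding SAT to a regression that already includes GPA barely moves the variance explained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical model: SAT and high school GPA both reflect one
# shared "ability" factor plus independent noise.
ability = rng.normal(size=n)
hs_gpa = 0.8 * ability + rng.normal(scale=0.6, size=n)
sat = 0.8 * ability + rng.normal(scale=0.6, size=n)
college_gpa = 0.7 * ability + rng.normal(scale=0.7, size=n)

def r_squared(predictors, y):
    """R^2 of an ordinary least-squares fit of y on the predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_gpa = r_squared([hs_gpa], college_gpa)
r2_both = r_squared([hs_gpa, sat], college_gpa)
print(f"HS GPA alone: R^2 = {r2_gpa:.2f}")
print(f"GPA + SAT:    R^2 = {r2_both:.2f}")
```

With these arbitrary parameters the two numbers happen to land near 0.32 and 0.39, echoing the College Board figures above; the point is only that a second noisy measure of the same underlying trait adds little on top of the first.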

Chabris and Hambrick want you to think that people who don’t believe SATs predict college success are willfully ignorant. But many are referring to the redundancy of test scores: even if useful on their own, standardized tests add little to the “complex portrait” (high school performance, family background, etc.) of an applicant.


2. Admitting high-GPA students is not the only goal of college admissions.

The authors argue that SATs should be used in admissions because they predict college GPA, and that admissions officers who ignore test scores are “disregarding the data” and engaging in “wishful thinking”; they’re the head-in-sand anti-vaxxers of testing. But this assumes that the only goal of admissions is to bring in high-GPA students. In fact, colleges weigh many factors, including the racial, economic, and academic diversity of the student body. A previous Cal admissions policy, for example, listed ten admissions principles, only one of which was purely academic potential. Thomas Rochon, a former administrator turned college president, points in this op-ed to work on stereotype threat, arguing that SAT requirements depress applications from students of color, who either perform worse on the tests or avoid them altogether out of concern for fulfilling negative stereotypes. (If “diversity” sounds too liberal, consider that Charles Murray, author of The Bell Curve, calls for SATs to be abolished as “totems” of wealth and privilege.)

There’s still debate about whether stereotype threat is a real effect, but Rochon found that when his school dropped its test requirement, it received more applications, the incoming class was more racially diverse, and the average freshman GPA didn’t change. A more diverse student body is both a goal and a common outcome of going test-optional, and it’s not just liberal fantasy: diversity itself may improve educational outcomes. If SATs are as crucial and valuable as Chabris and Hambrick say, shouldn’t the hundreds of colleges that forgo them have suffered consequences? And yet I’m not aware of any college that dropped SAT requirements and then reinstated them, because there’s more to admissions than GPA.


3. Predictably bad consequences.

Almost since their inception, the central point of contention for standardized tests has been persistent racial disparities in test scores. Because of those disparities, and regardless of whether SATs predict college GPA or measure a general cognitive ability, a predictable consequence of (re)emphasizing test scores in admissions would be admitting more white (and upper-class) students. And because parents’ education and income are strongly correlated with educational achievement, those inequalities would only grow, generation by generation.

“Grit-based” educational interventions have been critiqued because they treat the symptom (educational shortfalls) rather than the disease (poverty and racism), and may end up perpetuating racial and class divides rather than solving the problem (all while blaming poor kids for lacking mental toughness). The same logic applies to standardized testing, especially if we’re wrong about our ability to statistically “control for” racial and socioeconomic bias in the tests. Consequences matter: the consequences of inaction might compel us to act on climate change even amid uncertainty, and the consequences of premature action should make us skeptical of leaning on standardized tests for admissions decisions. On its own, that could read as just an alarmist argument. But consider a reframing of the issue: at their core, test requirements trade a marginal improvement in admissions accuracy for growing racial disparities. Non-rhetorical question: is that worth it?


4. Empirical evidence alone is not a good argument: context matters.

Chabris and Hambrick reduce the use of standardized test scores in admissions to a simple question: do SAT scores predict college success? Their narrow approach inevitably produces a narrow conclusion that ignores the historical and social contexts bound up in the use of standardized tests.

Eighteenth-century German scientists wanted to maximize lumber yields, so they clear-cut forests and replanted the trees in evenly spaced rows. Yields increased, until the altered ecosystem collapsed and the forests died. DDT nearly eliminated mosquitoes and malaria, but also, unexpectedly, nearly eliminated birds. Highway bypasses eased traffic congestion only until drivers adjusted their behavior and filled the new capacity.

I’m cherry-picking, but the common thread is that all of these interventions were based on sound empirical evidence that ignored important contextual information. Forest scientists missed an ecological context; DDT sprayers missed a chemical one (DDT accumulates rather than degrades); civil engineers missed a psychological one. It’s hardly a deep philosophical treatise, but even Jurassic Park warns of scientists focusing on the can of technology and not the should (see also this webcomic). It’s easy to focus on the trees (data) and miss the forest (broader contexts).

With testing, the context isn’t hidden or unknowable—it’s the entire reason that testing is contentious. It’s bizarre and disingenuous to act as though the only relevant question is what the testing data show. Test scores might predict college success, but testing also has a long history as a front for scientific racism: Galton’s eugenics, schools setting minimum ACT scores eight points higher than the average score of black applicants (a sort of educational poll tax), SAT verbal minimums used to restrict enrollment of foreign students, and the policy suggestions of The Bell Curve. Questions about the causes, consequences, and implications of racial and socioeconomic differences in test performance are so long-standing and ubiquitous that they’ve yielded contentious laws, bestselling books, and multiple bitterly contested Supreme Court cases stretching back decades.

Moreover, those concerns aren’t consigned to the proverbial dustbin of history. Testing proponents argue, for example, that racial and socioeconomic disparities and test bias can be statistically “controlled for”, but it’s reasonable to question whether such deep and interrelated structural effects can be so easily cleaved apart. Patrick Sharkey’s work shows how race and economics are intertwined: black families making $100,000 a year tend to live in neighborhoods similar (in terms of crime and poverty) to those of white families making $30,000. Rothstein (2003) applies different statistical corrections to SAT/GPA data and finds that the predictive benefit of the SAT for college performance owes almost entirely to socioeconomics rather than to what the test itself measures, suggesting that the vaunted statistical “controls” aren’t, as the saying goes, an “exact science”. More simply, racism and poverty have profound systemic effects: simply being poor negatively impacts cognitive abilities; stereotypes may lower test scores (conversely, cash rewards can improve IQ test scores by up to 20 points); and self-control and willpower might be limited resources.

We also have to consider how the test is used and received in reality. Chabris and Hambrick are right to clarify that a test score is not a measure of worth, but it’s precisely because people see the test as important that the authors need to stress this point at all. If tests weren’t treated as so consequential, stereotype threat probably wouldn’t be an issue, the test-prep industry wouldn’t be so lucrative, and people wouldn’t remember their SAT scores for decades. Even test developers have been stuck between the rock of “ideal” and the hard place of “reality”: Alfred Binet wanted IQ tests to help teachers identify their students’ correctable deficits; Henry Chauncey helped develop the SAT and thought it measured a purely scholastic skill. Neither could control how the tests were received and viewed, any more than Chabris and Hambrick can. There’s no magic wand: knowing that people do see the tests as a value judgment, we have to ask whether the benefits outweigh that cost.

My larger point isn’t even, necessarily, that Chabris and Hambrick are wrong; it’s that the manner in which they reach their conclusion is so narrow and provincial that it’s almost glib. It’s neither virtuous nor possible to adhere to some kind of “valueless” “objectivity”, an illusion of scientific remove, and cleave the narrow question of whether the test is predictive from the political, social, and academic considerations of its use. It’s especially mystifying to watch them ignore all those considerations when the entire history of testing is bound up in them. Were their goal simply to discuss “what psychology can tell us about testing”, that might be fine. But instead they step from psychology data, and only psychology data, to a broad policy recommendation: not only can psychology answer all the relevant questions about testing, it already has; sociologists, educators, humanists, historians, and whoever else be damned.

• • •

In short, here’s why we should rethink the “more testing” conclusion: SATs may not tell us much that grades don’t; admissions is about more than GPA; re-emphasizing test scores would widen racial and class divides; and psychological data can’t speak to the important social and racial considerations bound up in the history of testing.

I still think standardized tests could be used well, though. One of the original goals of the SAT’s designers was to let underprivileged students demonstrate unrecognized abilities. That’s laudable, and it’s part of why many schools go test-optional rather than test-trashbin: high scores from kids with poor grades, bad teachers, or difficult backgrounds can be hugely beneficial to them. Cal uses a sliding scale on which higher test scores offset lower GPAs for admissions eligibility, an approach that beats blanket testing because it targets the potential benefits of tests to kids at the bottom without further privileging those at the top.

(This was cross-posted at Medium and revised from an earlier version.)

