hasselhoff, beanbag tossing, and testing

Any list of the most perplexing things about American culture has to include standardized testing, the fact that Knight Rider was both made and remade, and David Hasselhoff, just in general.


Standardized test scores determine school funding, teacher jobs, student progress, the mental wherewithal of NFL draft picks, the trustworthiness of underpaid entry-level job applicants, and whether convicted murderers are mentally capable enough to be murdered by the state (that’s what Flowers for Algernon was about, right?). They’re more ubiquitous than even the Hoff—maybe because once you have a hammer, everything looks like a nail. Though most discussion of America’s yen for testing focuses on the historical and political components of it (e.g., Adam Curtis’s The Trap), I’m going to focus on the cognitive aspect, in part because of a recently published study on what standardized tests really test.

• • •

A big reason that academic standardized tests carry so much weight is that they are good predictors of future academic and job performance. If you do well on the test, you’re likely to do well in school or at work. And the underlying assumption is that the tests are predictive because they are measuring some underlying cognitive skill that is relevant to work or school. When schools or federal laws aim to increase test scores, in theory they are really working to improve that cognitive ability, which would lead to better test scores and better outcomes for the kids taking the tests.

While some schools have developed programs that improve test performance, a recent MIT study raises questions about the real benefits of those programs. Much like the supernatural neo-noir Baywatch Nights, things aren’t always as they seem: the take-home message of the study was that at one school, following an intervention program, test scores increased but students didn’t improve at any other skill or ability. That would be expected if the untouched skills were unrelated ones, like lifeguarding or blacksmithing. But in fact the students were stagnant on exactly the cognitive skills we’d expect to improve, like memory capacity and processing speed. In short, they got better at the test, but seemingly nothing else, including what the test ostensibly measures. So what are these test-score-centric educational interventions accomplishing?


For an answer, let’s talk about beanbag tossing. In one of my favorite psychology experiments, a group of eight-year-olds practiced tossing beanbags at a target (Kerr & Booth, 1978). Half the kids practiced nothing but two- and four-foot tosses, while the other half practiced only three-foot tosses. After they practiced for a while, all of the kids were tested on three-foot shots. The result is one of those findings that seems simultaneously predictable and unexpected: the first group, who had never even taken a three-foot shot before, were more accurate at three-foot shots than those who’d practiced nothing but three-foot shots (this is vital information for dedicated cornhole players).

There are different ways of knowing things. The kids who practiced tossing at various distances were learning an abstract, flexible, and generalizable approach to tossing a beanbag at a target. In contrast, the single-distance kids were learning a specific, contextual rule for tossing a beanbag in just one particular situation.


Now take that distinction between “abstract” and “specific” knowledge and go back to standardized tests. When kids are drilled by rote example in test-prep and “teaching to the test,” they’re likely learning test-specific knowledge. They’re becoming adept test-takers, full of hints and tricks and rules for answering questions even if they don’t “understand” them; they are like rule-applying automatons. Like the three-foot-only beanbag tossers, they’ve learned specific and inflexible skills, not generalizable and transferable knowledge.*

*This is similar to philosopher John Searle’s “Chinese Room” thought experiment, in which a person given a large enough set of rules could respond to questions in a language they didn’t “understand” at all. An outside observer, seeing only the appropriate responses, concludes the person “gets it,” because those responses are indistinguishable from a native speaker’s. Students who learn test-taking tricks may give a similar outward appearance of deeper understanding by applying specific rules.
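Searle’s room can be caricatured in a few lines of code; this is a purely illustrative sketch, and the phrases and rulebook entries are hypothetical stand-ins, not anything from Searle:

```python
# A responder that "answers" questions by pure rule lookup, with no
# understanding of their content -- Searle's rule-follower in miniature.
# The rulebook entries are hypothetical stand-ins for illustration.

RULEBOOK = {
    "你好吗?": "我很好，谢谢。",          # "How are you?" -> "I'm fine, thanks."
    "今天天气好吗?": "是的，天气很好。",  # "Nice weather today?" -> "Yes, very nice."
}

def room_respond(question: str) -> str:
    """Return the scripted response for a question, or a scripted fallback.

    The function never parses, translates, or 'understands' the input;
    it only matches symbols against a table -- much like a student
    applying memorized tricks to recognized question formats.
    """
    return RULEBOOK.get(question, "对不起，我不明白。")  # scripted "sorry" fallback

print(room_respond("你好吗?"))  # looks fluent from outside the room
```

From outside, the responses are indistinguishable from fluency; inside, there is only symbol matching, which is the worry about students trained on question formats rather than content.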

This explanation suggests that with familiarity and instruction, test scores can become decoupled from the cognitive ability the test aims to measure. If educational interventions teach test-specific, inflexible knowledge, intentionally or not, then the tests will come to measure only rote memory and recall rather than a general cognitive ability. And that’s the rub: the benefit of the test, its predictive utility, almost certainly depends on the test measuring a general cognitive ability. What this study suggests is that increasing test-prep, when it means learning test-specific rules, will make the tests less and less useful over time. In fact, it’s entirely possible that a single-minded focus on teaching to the test may cause tests to fall out of favor as they become less useful as predictive tools, which sets up a sort of Schrödinger’s-cat paradox: if we gave up on the tests and stopped prepping, they’d become useful again, at which point we’d probably start up with test prep and be back at square one.

• • •

That metrics like standardized test scores can be deceptive isn’t a new observation. In The Trap, Curtis talks about how the Reagan/Thatcher/neoliberal obsession with metrics, quotas, and numbers often fails because people end up fixing the numbers rather than the “problem” the numbers are supposed to measure. Hospital wait times may once have been a good way to measure how quickly patients were treated, but they lost meaning when hospitals began using greeters to meet, but not treat, patients, thus reducing wait times with no tangible benefit. In most of Curtis’s cases, people game the system intentionally, like the hospital employing the greeter to improve its scores or the politician who touts those “reduced wait times” to get elected. But educational interventions really do improve test scores, which means that test-prep may be gaming the system inadvertently: raising scores, but nothing else.

If that’s how test-prep is going wrong, then the beanbag study—and the difference between abstract and specific knowledge—suggests a different approach to testing and instruction. The important part about the beanbags was that the three-foot-only tossers got better with practice, but the multi-distance tossers improved even more. “Teaching to the test” raises scores and makes absolute sense for teachers facing low budgets and high stakes. But gearing interventions towards general, abstract problem-solving skills, rather than test-specific ones, would likely lead to even greater test-score gains than current approaches. And almost certainly, that new knowledge would transfer to other tasks.

Except that’s the smug and oversimplified take on a complicated issue that one gets from political ads, Freakonomics, and “very special episodes” of Family Ties. It’s not clear how to teach and improve abstract reasoning skills; in fact, it’s not even clear how to define them. Ask a brain scientist and they’ll spin semantic circles while laboring to avoid a variation on Potter Stewart’s “I know it when I see it” pornography edict. You’re apt to get a more precise answer from David Hasselhoff. If the question is “what’s the cognitive analog of two- and four-foot beanbag tosses?”, I’m not sure anyone knows. It’s like the philosopher’s stone of learning.


Brain training games promise it, but don’t yet deliver. Some of those games require problem-solving at first, but they aren’t “endless”—eventually you figure out the pattern, “crack the code,” and then you’re simply doing rote repetition again (maybe a solution is to leverage the sheer number of brain training companies out there, and use a new game each day). And like with beanbag tossing, improving game-specific skills won’t translate to general benefits. Evidence of improvement on the practiced “game” is both abundant and meaningless, unless solving Sudoku is actually your job. In contrast, evidence that brain training improves abilities outside the practiced game is almost nonexistent. For right now, most brain training is cognitive snake oil: specific improvements masquerading as general gains.

Maybe the best current alternative is chess. In contrast to puzzle books and training games, chess offers essentially endlessly novel problem-solving. The problem is never static and rarely the same. Ben Franklin was onto its uniqueness centuries ago, and it’s different enough from any other game or task that cognitive scientists have written entire books about it and continue to study how it influences memory and problem-solving abilities. Chess can buffer, at least for a time, against the cognitive declines of aging (so can multilingualism, which is not surprising, since language is also novel and generative). Some evidence suggests that chess is an effective educational intervention for improving math skills. Maybe everyone should just be playing chess instead of learning the rubric for an achievement test.

