hasselhoff, beanbag tossing, and testing

Any list of the most perplexing things about American culture has to include the fact that Knight Rider was both made and remade, David Hasselhoff in general, and standardized testing. Standardized tests are ubiquitous: scores determine school funding, teacher jobs, student progress, the mental wherewithal of NFL draft picks, the trustworthiness of underpaid entry-level job applicants, and whether convicted murderers are mentally capable enough to be murdered by the state. But what do standardized tests really test?


The basis of standardized testing is prediction: test scores are good predictors of future academic and job performance. Presumably, tests are good predictors because they measure some underlying cognitive ability that is relevant to work or school: you do well on the test and at school because you are good at skill X, and both tests and school require skill X. And because test scores correlate with future outcomes, schools (and governments) have spent time and money developing programs meant to improve test scores, on the assumption that raising test scores should improve academic and job performance.

Schools have successfully implemented programs that improve test performance, but a recent MIT study should make us question our reliance on standardized tests. In that study, researchers found that after an intervention program at one school, standardized test scores went up, but students didn’t improve at other skills or abilities. And the missing gains weren’t just in unrelated skills like blacksmithing or shoe-tying: students did not improve on exactly the kind of cognitive skills we expect standardized tests to measure, like memory capacity or processing speed. If test scores go up but no other skills improve, what exactly are these interventions accomplishing, and what does that mean for our assumptions about standardized testing?

The answer to this question requires a side trip to beanbag tossing. In one of my favorite psychology experiments, a group of eight-year-olds practiced tossing beanbags at a target (Kerr & Booth, 1978). Half the kids practiced nothing but two- and four-foot tosses, while the other half practiced only three-foot tosses. After practicing, all the kids were tested on three-foot shots only. The results are somehow both predictable and unexpected: the kids in the first group, who had never taken a single three-foot shot before the final test, were more accurate than the kids who’d practiced nothing but three-foot shots.

The beanbag tossing study is an important analogy for different ways of knowing things. Kids who practiced tossing at multiple distances were learning an abstract, flexible, generalizable approach to tossing beanbags. The single-distance kids were learning a specific, context-dependent rule. The same distinction may apply to standardized tests: students drilled by rote example in test prep could be learning mostly test-specific knowledge, becoming adept test-takers full of hints and tricks, little more than rule-applying automatons. Like the three-foot-only beanbag tossers, they’ve learned specific and inflexible skills, not generalizable and transferable knowledge.*

*This is similar to philosopher John Searle’s “Chinese Room” thought experiment. Searle described how a person given a large enough set of rules could respond to questions in a language they didn’t “understand” at all. An outside observer who sees nothing but the appropriate responses would think the person “gets it,” because those responses are indistinguishable from a native speaker’s. Students who learn test-taking tricks may give a similar outward appearance of deeper understanding by applying specific rules.

If interventions teach inflexible test-specific knowledge rather than broad and transferable knowledge, then tests will begin to measure memorization rather than a general cognitive ability. Test scores, in other words, may become decoupled from the very cognitive ability the test is meant to measure. But the supposed benefit of testing, its predictive utility, almost certainly depends on measuring cognitive ability! This implies that intervention programs could raise test scores while also making the tests less useful as predictive tools. If test scores go up but nothing else does, tests will mean less, just as hospital wait times don’t tell you much once a hospital employs a greeter to meet, but not treat, incoming patients (see Adam Curtis’s documentary The Trap).

•     •     •

If that’s how test-prep goes wrong, the beanbag study, and the distinction between abstract and specific knowledge, suggests a new approach. In the beanbag study, three-foot-tossers did get better, but the multi-distance throwers improved even more. “Teaching to the test”-style interventions are necessary for teachers with low budgets and high stakes, but gearing interventions towards general, abstract problem-solving skills rather than test-specific ones would lead to greater, and more transferable, gains.

Unfortunately, that’s a smug and hand-waving answer. It’s not clear how to teach and improve abstract reasoning skills; in fact, it’s not even clear how to define them. Ask a brain scientist and they’ll spin semantic circles; like Potter Stewart with pornography, they might know it when they see it, but a definition is a different beast. What is the cognitive analog to practicing both 2- and 4-foot beanbag tosses? I don’t know. No one knows. It’s the philosopher’s stone of learning.


Brain training games promise it, but don’t deliver. Like current test interventions, brain training games improve performance on the task being practiced, and, also like current test interventions, those gains don’t transfer outside the game. Evidence of improvement on the practiced “game” is both abundant and meaningless, unless solving Sudoku is actually your job. By contrast, there is little to no evidence that brain training improves abilities outside the practiced game. For right now, brain training is cognitive snake oil: specific improvements masquerading as general gains.

Maybe the best alternative is chess. Unlike puzzle books and training games, chess offers a nearly endless supply of novel problems, an environment where you can’t simply learn a fixed set of rules that wins the game or solves the problem. Centuries ago, Ben Franklin wrote about what made chess unique; cognitive scientists have written entire books about it and continue to study how it influences memory and problem-solving abilities. Chess can buffer, at least for a time, against the cognitive declines of aging (so too does multilingualism, which is not surprising, since language is also novel and generative). Some evidence suggests that chess is an effective educational intervention for improving math skills. Maybe everyone should just be playing chess instead of learning the rubric for an achievement test.

