cock of the walk

[image: The Drunkard’s Walk cover]

Two take-home messages from The Drunkard’s Walk: randomness rules everything around me, and humans are bad at reasoning intuitively about probability and statistics. Let’s take the second one first.

In Bayesian analysis, one combines known information (the prior) with new evidence to estimate the probability of the thing one actually cares about (the posterior). Bayesian math is straightforward, but it’s also unintuitive. For one thing, it’s easy to mix up the two conditional probabilities involved. Consider someone who suspects their spouse is having an affair because they keep “staying late at the office,” reasoning that people who have affairs frequently use that excuse. But that’s backwards: they’re estimating the likelihood of staying late at the office, assuming an affair. The known information is that the spouse is staying late, so the correct formulation is to estimate the probability of an affair, given that they’re staying late. Those two probabilities can be very different.
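Bayes’ theorem makes the relationship between those two probabilities explicit (this formulation is mine, not the book’s, but the theorem itself is standard):

```latex
% P(affair | late) is what we want; P(late | affair) is the enticing wrong one.
P(\text{affair} \mid \text{late}) =
  \frac{P(\text{late} \mid \text{affair}) \, P(\text{affair})}{P(\text{late})}
```

Even if P(late | affair) is high, P(affair | late) can be tiny when the prior P(affair) is low and plenty of faithful people also work late.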

To further illustrate how unintuitive probability estimates can be, consider another example. I’ve developed a nearly foolproof test to detect hipsters: I fire some gamma rays at a person and read the radiation signature (it’s the same technology NASA uses to identify dead pulsars in deep space; here, it determines whether the subject is infected with Hipsterichia coli bacteria). The prototype version of my device self-destructed when I tried to take a reading on some knob riding a penny-farthing…

[image: penny-farthing]

But the new version is stable. Assume that 1 in 1000 people are hipsters (the “base rate”), and that my test comes back positive 99% of the time when the person is a hipster but also has a false positive rate of 1%. Tragically, my nearly foolproof test reveals that you are, in fact, a hipster. “I’m sorry,” I say, mustering as much sympathy as I can. “The test doesn’t lie.” You break out in a cold sweat, heart racing. “But I hate hipsters…how can this have happened?” I nod sympathetically. “Yes, that’s often the first sign…”

Knowing you tested positive, what’s the actual likelihood you’re a hipster? In a hypothetical population of 100,000 people, we’d expect 100 to be afflicted, and if we tested the entire population, 99 of those 100 should be correctly identified as hipsters. But with a false positive rate of 1%, another 999 people (1% of the remaining 99,900) will also test positive. So, that’s a problem: 1098 positive tests, and only 99 (~9%) are from actual hipsters. If all we know is that you tested positive, you still have only a 9% chance of actually being a hipster (if the base rate of hipsterism were 1/100 instead of 1/1000, a positive result would be accurate 50% of the time).
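If you’d rather let a computer do the counting, here’s a quick sketch (mine, not the book’s) that reproduces those numbers with Bayes’ rule:

```python
def posterior(base_rate, sensitivity, false_positive_rate):
    """P(hipster | positive test), by Bayes' rule."""
    true_pos = sensitivity * base_rate                 # hipsters who test positive
    false_pos = false_positive_rate * (1 - base_rate)  # non-hipsters who test positive
    return true_pos / (true_pos + false_pos)

print(posterior(1 / 1000, 0.99, 0.01))  # ~0.09: only ~9% of positives are real hipsters
print(posterior(1 / 100, 0.99, 0.01))   # ~0.50: ten times the base rate, 50% accurate
```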

Back to the drawing board for me. Maybe if I use even more gamma rays…

[image: hulk_bear]

Earlier, I mentioned that part of humans’ problem with Bayesian estimation is that it’s easy to mix up what you know and what you’re estimating. The base rate problem can be similar: we know the test is positive and we want to know the likelihood of infection (i.e., 9%). But the converse formulation is just so enticing: what’s the likelihood of a positive test given an infection (i.e., 99%)? In a way, it’s like we can be probabilistically dyslexic.

I’m not sure knowledge and training help that much. I’ve seen variants on the “base rate” problem a dozen times, but I don’t find myself making better estimates so much as I just account for my bad intuitions and scale my estimates back. And even people who should know these things sometimes get it wrong: Mlodinow discusses receiving an HIV diagnosis from his doctor and being told the test was 99% accurate, even though, for someone in his low-risk group, false positives vastly outnumbered true positives. Famously, studies of the base rate problem show that even doctors and statisticians often screw it up.

With low base rates, false positives can outnumber true positives by orders of magnitude, even for tests that seem sensitive. It’s a major reason we don’t, and shouldn’t, blanket-test for rare diseases, even when early detection is important (it’s also why airport security measures are largely misguided: what’s the base rate on terrorism?). But this also clarifies why “risk factors” matter so much: they add to the known information, raising the prior, so a positive result means a lot more when it’s combined with risk factors (beard, flannel shirt, silly glasses…).
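Here’s a sketch of that risk-factor logic (the evidence probabilities are invented, and I’m treating the clues as independent, which is generous): each clue updates the prior before the test result is applied:

```python
def update(prior, p_given_hipster, p_given_not):
    """One Bayesian update: P(hipster | one new piece of evidence)."""
    num = p_given_hipster * prior
    return num / (num + p_given_not * (1 - prior))

prior = 1 / 1000                       # population base rate
prior = update(prior, 0.8, 0.1)        # flannel shirt: common in hipsters, less so otherwise
prior = update(prior, 0.7, 0.05)       # silly glasses
posterior = update(prior, 0.99, 0.01)  # and now the gamma-ray test comes back positive
print(posterior)  # ~0.92: the same test, far more convincing with a higher prior
```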

• • •

Despite their contributions to mathematics, the ancient Greeks never had a theory of probability. They gambled, though, tossing a sort of proto-dice called astragali (made from animal heel bones). One argument for why they never developed theories of probability, despite their other mathematical chops, is that they believed in divine will; probability is of little interest if all events are under divine control. I’m curious whether the dice mattered. Astragali were irregularly shaped and not equally likely to land on each side; perhaps those unequal weightings made random effects harder to see.

Either way, it took more than a millennium of dice rolling before the first true study of probability was published in the 1600s (about the same time calculus was discovered; weird, right?). Not surprisingly, gambling spurred that innovation: the first study came from trying to determine how to fairly split the pot of an interrupted game. Even without knowing the laws of probability and statistics, casinos and insurance companies existed, and were presumably profitable, for decades beforehand, which is pretty impressive. Have you seen a craps table?

[image: craps table]
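That pot-splitting puzzle, by the way, is the classic “problem of points” that Pascal and Fermat corresponded over, and its answer is recursive: split the pot according to each player’s chance of winning from the current score. A minimal sketch for a fair-coin game (my illustration, not the book’s):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_win(a_needs, b_needs):
    """Probability player A wins a fair game, given points each player still needs."""
    if a_needs == 0:
        return 1.0
    if b_needs == 0:
        return 0.0
    # Each round, either player wins the point with probability 1/2.
    return 0.5 * (p_win(a_needs - 1, b_needs) + p_win(a_needs, b_needs - 1))

# First-to-3 game interrupted at 2 points to 1: A needs 1 more, B needs 2.
print(f"A's fair share of the pot: {p_win(1, 2):.0%}")  # 75%
```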

One reason it may have taken so long to come up with rigorous rules of probability is that the human brain doesn’t seem to like randomness. If pigeons are given food at random times, they will develop elaborate behavioral routines, seeming to “believe” that whatever action they performed immediately before being fed actually caused the food to arrive. So they repeat it, and when it doesn’t work they continue to add more behavioral tics until their actions are dominated by stereotyped, repetitious behavior. (The converse can also be true: animals randomly given punishments exhibit learned helplessness, in which they stop doing anything at all, presumably in an attempt to avoid punishment.)

In other words: we assume things happen for a reason. This stock price went up because of the CEO’s great idea, this advertising campaign worked because of subliminal sex messages, that baseball player got a hit because they are “clutch.” It’s convenient to presume causes after the fact (it’s also profitable; the whole punditry industry is built on causal attributions in the absence of evidence), but probably far more events than we think result from unpredictable, random processes.

For example, normal accident theory is based on the idea that in complex systems, hundreds, thousands, or hundreds of thousands of small, essentially random factors contribute to the operation of the whole. Occasionally, just by chance, those small factors align in such a way as to cause an unpredictable outcome, whether an accident, an error, or a success. There are probably a few horror writers with talents comparable to Stephen King’s, but maybe just the right editor saw King’s manuscript, maybe just the right reviewer read it, or maybe the prop master on Mork & Mindy put Carrie on the coffee table for an important scene. Stephen King is talented and hard-working and prolific, but like anyone who has found success, he was also assuredly the beneficiary of a few lucky dice rolls. Maybe an equally talented writer had a better version of Carrie in the works before coming down with malaria, and but for that their careers might have been swapped. The thing is: it just sucks to think of these things as random. It might be nice for misfortune to be random, but who wants their success to rest even partly on luck?
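To see how much spread pure luck can create, here’s a toy simulation (entirely mine, with made-up numbers): a thousand writers of identical talent, each subject to a handful of multiplicative lucky or unlucky breaks:

```python
import random

random.seed(42)

def career(talent=1.0, n_breaks=8):
    """A career outcome: identical talent, multiplied through random breaks."""
    outcome = talent
    for _ in range(n_breaks):
        # Mostly neutral events, occasionally a bad break or a big one.
        outcome *= random.choice([0.5, 1.0, 1.0, 2.0])
    return outcome

results = sorted(career() for _ in range(1000))
print(f"median career:   {results[500]:.2f}")
print(f"luckiest career: {results[-1]:.2f}")  # same talent, wildly different outcome
```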

Take sports. Our intuition might be that a “hot” basketball shooter is more likely to make their next shot and a “cold” shooter is more likely to miss. This is the “hot hand” fallacy, a cousin of the gambler’s fallacy, and it’s just not true: like flipping a coin, a player’s next shot is independent of their last shot, no matter how much it “feels” like a coin that came up heads 10 times in a row is due for a tails. How is it that we can know things are random, but have trouble believing it?
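Part of the answer may be that genuinely random sequences are streakier than we expect. A quick sketch (mine, fair coin) of the longest run of identical outcomes in 100 flips:

```python
import random

random.seed(1)

def longest_run(n_flips=100):
    """Longest streak of identical outcomes in a sequence of fair coin flips."""
    flips = [random.random() < 0.5 for _ in range(n_flips)]
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

runs = sorted(longest_run() for _ in range(10_000))
print(f"median longest streak in 100 flips: {runs[5000]}")  # typically around 7
```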

Other notes:

• I’m intrigued by the psychological aspect of a basketball player being “in the zone” when they’re shooting well, or a baseball player “seeing the ball well” when they’re hitting well. If each shot or at-bat is independent, why does it feel like previous performance matters? I think there are two possibilities:

1. The feeling of being “in the zone” is entirely based on outcome. We feel “in the zone” because we made a few shots or “off” because we missed a few in a row.

2. Shooting a basketball is the complex interaction of dozens of muscles, motor pathways, and perceptual pathways. We might imagine that the brain, when performing a well-practiced act, is capable of picking up on even minute deviations from the “ideal” action pattern. If so, then feeling “in the zone” could happen when you repeatedly get feedback from your muscles suggesting the action was performed precisely, even if the outcome was not perfect.

I suspect it’s really a combination of the two, but I would like to test whether an athlete can predict the outcome of an action (e.g., shooting a basketball) either a) before taking the shot or b) after taking the shot but in the absence of feedback about its outcome. In other words, what happens if someone shoots and, at the exact moment the ball leaves their hands, the lights go out? Could they tell you whether they made the shot?

• The French mathematician Henri Poincaré suspected he was being cheated by his baker, so he weighed several weeks’ worth of bread loaves and discovered they averaged some 50 grams less than their advertised 1000-gram weight. He called the police, who excoriated the baker. Poincaré kept weighing the bread, though, and now found he was getting a surfeit of heavy loaves. That was unlikely to be random; presumably the baker had continued on his merry way selling undersized bread, but made sure to give Poincaré the largest loaf he had. Poincaré once again called the cops, who once again paid a visit to the offending baker. Similar deviations from randomness are still used for fraud detection today, and the approach works in part because people are really bad at both identifying randomness and generating it. Wonder what happened to that bread guy, though.
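A little simulation (the weights are invented; the logic is Poincaré’s) shows how the cheating leaves a statistical fingerprint:

```python
import random
import statistics

random.seed(7)

# Hypothetical baker: aims for 950 g loaves (50 g under the advertised 1000 g),
# with normal day-to-day variation.
def days_batch(n=10):
    return [random.gauss(950, 35) for _ in range(n)]

# Before the complaint, Poincaré gets an arbitrary loaf; after, the heaviest one.
before = [days_batch()[0] for _ in range(365)]
after = [max(days_batch()) for _ in range(365)]

print(f"before: mean {statistics.mean(before):.0f} g (clearly light)")
print(f"after:  mean {statistics.mean(after):.0f} g (looks honest at a glance)")
# But the "after" weights follow the distribution of a batch maximum: skewed,
# not the symmetric bell curve that honest baking would produce.
```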
