Finding “Lost” Research

On “ghost papers” and our knowledge surplus

If an academic paper is never read, does it make a sound? A recent review finds that some 90% of published academic papers are never cited in future papers, as many as 50% of articles are never read by anybody but reviewers, and the average paper is read by just three people. These ratios probably also apply to internet content. The review itself is about using statistics like these to measure the reach and prestige of both journals (impact factor) and researchers (h-index). Of course it’s impossible to know exactly how many people read a paper, but the magnitude is shocking: a lot of papers aren’t even reaching academic audiences. They are “ghost papers,” abandoned and floating on the high seas, never heard from again. What are the consequences of ghost papers for the future of research?

• • •

I’m fond of this comic that imagines the whole of human knowledge as a big circle. Researchers build expertise, acquire background knowledge, and study in their chosen field trying to reach the edge of that circle. Once there, their novel research makes a tiny bump, growing the circle and pushing human knowledge forward, if only a tiny bit.

Following that analogy, every increase in human knowledge—every bump on the circle—means it takes more background knowledge, more time, more effort for the next person to get to that edge and make their contribution. Human knowledge is growing at an accelerating rate: the number of total published articles in most disciplines now doubles every 15-20 years. As a practical matter, a grad student looking to reach the same relative expertise as their advisor had coming out of graduate school would need to read anywhere from two to five times as many articles as their advisor did, and do so in the same amount of time while being expected to publish even more. It’s the intellectual equivalent of drinking from a fire hose.
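For the curious, here is a rough back-of-the-envelope sketch of where a figure like “two to five times” could come from. It assumes the literature grows by steady doubling, and it assumes the advisor finished graduate school 20 to 35 years before the student; that gap is my own illustrative guess, not a number from the review. The function name `growth_factor` is made up for this example.

```python
def growth_factor(years_elapsed: float, doubling_time: float) -> float:
    """Return how many times larger a literature gets after `years_elapsed`
    years, assuming it doubles every `doubling_time` years."""
    return 2.0 ** (years_elapsed / doubling_time)


# Doubling times come from the 15-20 year range quoted above.
# The 20- and 35-year advisor/student gaps are assumed for illustration.
for doubling_time in (15, 20):
    for years_elapsed in (20, 35):
        factor = growth_factor(years_elapsed, doubling_time)
        print(
            f"doubling every {doubling_time} yrs, "
            f"{years_elapsed} yrs later: {factor:.1f}x as many articles"
        )
```

Under those assumptions the multiplier runs from about 2 (a 20-year gap with a 20-year doubling time) to about 5 (a 35-year gap with a 15-year doubling time), which lines up with the range above.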

Since that’s not practical, one strategy is to narrow the area of focus to more manageable slices of the circle. A psychologist a hundred years ago might have studied “memory,” a psychologist fifty years ago might have restricted themselves to “short-term memory,” and a psychologist now might study only “visual short-term memory.” The title of a study from fifty years ago carried a lot of information even for a novice: The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information (Miller, 1956) was a seminal work on the limits of short-term memory. Then look at a recent title and realize just how much more contextual knowledge is needed to make sense of it: Negative Emotional Experiences during Navigation Enhance Parahippocampal Activity during Recall of Place Information.

And even if someone does read all those papers, they still have to remember the information, synthesize it, understand how pieces connect, and access it later. The brain isn’t an infinite warehouse and time is limited, and meanwhile that fire hose of information keeps spraying. We have to figure out how to overcome limitations of brain and time to get that information and reach the edge of the circle. That so many papers are going unread—even if some are just poor quality—suggests we aren’t doing a good job of it. Mountains of useful information, as well as effort, are simply being lost in the ether.

We’re overproducing knowledge, even as that circle gets bigger and expertise gets narrower. What can we do to keep up?

• • •

I suspect academic research will change to manage information oversaturation. It might have to; it’s hard to see those statistics and not think we’re approaching a breaking point between our ability to produce knowledge and our ability to manage it (individually, anyway; I’m sure “big data” is happy to have all those numbers). I think some of those changes will be cultural: publication rates might slow as tenure boards emphasize publication quality rather than quantity (this is already happening, albeit slowly). “Open science” might help otherwise “lost” papers find an audience by making them more accessible. Shorter papers, such as those mandated by prestigious journals like Science or Nature, may become more common. Video or other formats could change how “articles” are disseminated (the Journal of Visualized Experiments publishes text articles with accompanying videos, but the videos are intended for the general public, rather than the research community).

We’ll probably also see changes in how individual scientists organize, track, catalog, archive, and absorb information. The Open Science Framework already offers an open-source and open-ended management system where scientists can hold and share data, experiment files, stimuli, and analysis protocols. “Personal” databases may become more common, with simple ways to tag, organize, and relate different studies, tailored to the researcher’s interests. One cool possibility is the use of machine-learning networks trained to automatically collect (and perhaps even organize and summarize) relevant publications on a topic. That’s already happening in a limited sense: brainSCANr finds related terms in huge bodies of neuroscience papers, presumably allowing us to identify previously unrecognized links between topics and concepts.

Probably we will not end up with the dreadful virtual-reality filing system from Disclosure, but no matter the form of the change, we’ll see—and need—new ways to control that flood of information and accommodate the human (and brain) constraints on reaching the edge of the circle.

