Social science has a complicated, infinitely tricky replication crisis

"For scientists, getting research published in the journal Nature is a huge deal. It carries weight, prestige, and the promise of career advancement—as do the pages of its competitor, Science. Both have a reputation for publishing innovative, exciting, and high-quality work with a broad appeal. That reputation means that papers from these journals make up a substantial portion of day-to-day science news.

"But the prestige of these journals doesn’t exempt them from problems that have been plaguing science for decades. In fact, because they publish such exciting and innovative work, there's a risk that they're even more likely to publish thrilling but unreliable papers. They may also be contributing to a scientific record that shows only the "yes" answers to big questions but neglects to mention the important-but-boring "no" results.

"Colin Camerer, a behavioral economist at the California Institute of Technology, recently led a team of researchers in trying to repeat 21 social science studies from Science and Nature, successfully replicating 13 of them. The results, published yesterday (in Nature, naturally), may also hint at how our focus on positive results is biasing the literature. They also paint a complicated picture of the replication crisis in social science and illustrate how infinitely tricky the project of replication is.


"How reliable is the scientific record?

"Psychology’s reliability crisis erupted in 2011 with a wave of successive shocks: the publication of a paper purporting to show pre-cognition; a fraud scandal; and a recognition of p-hacking, where researchers exercised too much liberty in how they chose to analyze data to make almost any result look real. Scientist began to wonder whether the publication record was bloated with unreliable findings.

"The crisis is far from being limited to psychology; many of the problems plague fields from economics to biomedical research. But psychology has been a sustained and particularly loud voice in the conversation, with projects like the Center for Open Science aiming to understand the scope of the problem and trying to fix it.

"In 2015, the Center published its first batch of results from a huge psychology replication project. Out of 100 attempted replications, only around a third were successful. At the time, the replicators were cautious about their conclusions, pointing out that a failed replication could mean that the original result was an unreliable false positive—but it could also mean that there were unnoticed differences in the experiments or that the failed replication was a false negative.

"In fact, the bias toward publishing positive results makes false negatives a significant risk in replications.

"...Replicating a study sounds simple, but it isn’t

"Camerer and his colleagues wanted to test the reliability of social results published in Nature and Science. They went looking for studies published between 2010 and 2015 that would be easy to replicate: those that used research subjects that were easy to access (like undergraduate students) and tested a clear experimental hypothesis. They found 21 papers that fit their criteria.

"But Camerer and colleagues didn’t just want to look at each study on its own; they wanted to find out if they could say anything general about the reliability of this kind of work. They wanted to do science on the science, or meta-science. That meant that they needed to try to be consistent in how they did each replication. With wildly varying studies, that’s difficult, and it meant making some blanket decisions so that every paper got similar treatment.

"The team decided to focus just on the first experiment in each paper and try to replicate that. A single experiment can produce multiple results, so if the replication shows that some are the same and some are different, how do you decide whether it has been successful? The researchers decided to focus just on the result that the original study considered the most important and compare that with the replication.

"They involved the original authors in the replication of their work so they could be sure that the replications were as close to the original studies as possible and that everyone agreed on how they were going to analyze the data. They also made sure they had big enough sample sizes to find much smaller effects than those reported in the original papers, making it less likely that they’d get false negatives.

"...No simple lesson, just more homework

"All of these arguments highlight that meta-science is much like any other scientific discipline: really, really difficult. Like any other field, it demands that researchers use their experience to make choice after choice, when none of the options seem necessarily better or worse than the others. Even though there are no obvious right or wrong answers, it’s clear that other choices could have produced different results.

"Like other disciplines, meta-science has resource constraints. In a perfect world, it would have been great to replicate every one of the experiments in every one of the original papers. But just replicating the first study in each required substantial amounts of carefully coordinated work from dozens of scientists. Large panels of participants don’t pay for themselves, either.

"The small sample size of this paper, ironically, also limits its conclusions. It's only 21 papers; you can’t generalize from that to all of science, all of social science, or even all of the social science published in prestigious journals. The selection process also limits broad conclusions: these papers were carefully selected from only two journals, so they probably don't represent the literature as a whole.

"For the same reason, this doesn't tell us much about the reliability of high-profile, prestigious journals, either.

"More generally, a less than perfect replication rate doesn't mean we can't trust any results from social science or that science is irreparably broken. Even the unreplicated papers in this study might not actually be wrong—there are good reasons why a true result might fail to replicate, and researchers are just starting to figure out the sample size problem.

"Still, there are systemic problems that lead to publication of unreplicable results, and they're big, real, and thorny. Researchers who hold their cards close their chest, refusing to publish their materials and data, are one part of the problem. Poor training in statistics is another. And journals publishing only the sexiest, flashiest findings, while turning their noses up at sensible incremental work, help to skew the scientific record.

"There's reason for optimism, though: more and more researchers are adopting practices that are designed to reduce the risk of unreliable results."
