One of the bedrock assumptions of science is that for a study's results to be valid, other researchers should be able to reproduce the study and reach the same conclusions. The ability to successfully reproduce a study and find the same results is, as much as anything, how we know that its findings are true, rather than a one-off result.
This seems obvious, but in practice, a lot more work goes into original studies designed to create interesting conclusions than into the rather less interesting work of reproducing studies that have already been done to see whether their results hold up.
That's why efforts like the Reproducibility Project, which attempted to retest findings from 100 studies in three top-tier psychology journals, are so important. As it turns out, findings from the majority of the studies the project attempted to redo could not be reproduced. The New York Times reports on the new study's findings:
Now, a painstaking yearslong effort to reproduce 100 studies published in three leading psychology journals has found that more than half of the findings did not hold up when retested. The analysis was done by research psychologists, many of whom volunteered their time to double-check what they considered important work. Their conclusions, reported Thursday in the journal Science, have confirmed the worst fears of scientists who have long worried that the field needed a strong correction.
This is a serious problem for psychology, and for social science more broadly. And it's one that, as the Times points out, observers in and around the field have been increasingly worried about for some time.
Why is psychology research (and, it seems likely, social science research generally) so stuffed with dubious results? Let me suggest three likely reasons:
A bias towards research that is not only new but interesting: An interesting, counterintuitive finding that appears to come from good, solid scientific investigation gets a researcher more media coverage, more attention, and more fame both inside and outside the field. A boring and obvious result, or no result, on the other hand, even if investigated honestly and rigorously, usually does little for a researcher's reputation. The career path for academic researchers, especially in social science, is paved with interesting but hard-to-replicate findings. (In a clever way, the Reproducibility Project gets around this issue by coming up with the really interesting result that lots of psychology studies have problems.)
An institutional bias against checking the work of others: This is the flip side of the first factor. Senior social science researchers often actively warn their younger colleagues, who are in many cases the best positioned to check older work, against investigating the work of established members of the field. As one psychology professor from the University of Southern California grouses to the Times, "There's no doubt replication is important, but it's often just an attack, a vigilante exercise."
This is almost exactly what happened in an incident earlier this year, when a couple of grad students began to find discrepancies in a major study about attitudes toward gay marriage. The study, which claimed to find that attitudes on gay marriage could be quickly made more positive by a 20-minute chat with someone who is gay, turned out to be built on fake data. The grad student who uncovered the fakes has said that, over the course of his investigation, he was frequently warned off his work by advisers, who told him that it wasn't in his career interest to dig too deeply.
Small, unrepresentative samples: In general, social science experiments tend to work with fairly small sample sizes, often just a few dozen people who are meant to stand in for everyone else. Researchers often have a hard time putting together truly representative samples, so they work with subjects they can access, which in a lot of cases means college students.
For example, a study on how physical distance influences emotional perception, one of the studies that the Reproducibility Project tried and failed to replicate, relied on three experiments: one using 73 undergraduates, another using 42 undergraduates, and another using 59 adults. These are obviously pretty far from the large, randomly selected study groups that researchers would ideally use. That's not to say that studies using small groups of individuals aren't valuable at all, or that the researchers behind them aren't doing rigorous work. It's just harder to know whether results generated from studies like this are widely generalizable.
By that same logic, however, we should be careful about over-interpreting the results from the Reproducibility Project's efforts. These were one-time replication attempts using, in many cases, the same kind of small subject groups as the original studies. As Marcia McNutt, the editor of the journal Science, is quoted as saying by the Times, "I caution that this study should not be regarded as the last word on reproducibility but rather a beginning."