
Cancer Research Reproducibility Study: Science Still Broken?

The results of only two out of five cancer studies could be replicated

Yanlev/Dreamstime

Experimental replication is a hallmark of the scientific method. The idea is that scientific findings can be considered accurate if any researcher using the same procedures gets the same results. However, that is not what is happening today, as I reported in my article, "Broken Science." For example, in a 2012 study in Nature, researchers reported replicating the findings of only six out of 53 (11 percent) landmark published preclinical cancer studies. In 2011, researchers at Bayer Healthcare reported that they could not replicate 43 of the 67 published preclinical studies that the company had been relying on to develop cancer and cardiovascular treatments and diagnostics.

To address this problem, the Center for Open Science launched the Reproducibility Project to assess the replicability of research in various disciplines. The first effort focused on trying to reproduce the results of 100 psychology studies. As reported in Science in 2015, just 39 of the studies were successfully replicated. The center then turned its attention to cancer biology research. The first results of attempts to replicate the findings of five different cancer studies are out, and they are complicated. The researchers were able to essentially reproduce the findings of two studies. The results of two others were not interpretable due to technical problems, and one study failed to replicate. Interestingly, other labs report being able to replicate the results of the very study that researchers working with the Center for Open Science could not reproduce.

In trying to reproduce the findings of another of the studies, which reported identifying mutations that boosted the proliferation of melanoma cells, the replicating researchers actually found that the control mice without the mutated cancer cells died faster than the mice with the mutated melanoma. They speculate that the different results might hinge on changes in cell culture or on further unidentified mutations in the cancer cells used in the replication experiment. And, of course, it may be that the replication studies are flawed rather than the original ones. Only more studies would enable researchers to figure out which is which.

"Science, the pride of modernity, our one source of objective knowledge, is in deep trouble," writes Daniel Sarewitz, a professor at Arizona State University's School for the Future of Innovation in Society, in his 2016 essay "Saving Science" in The New Atlantis. The pervasive lack of robust reproducibility confirms Sarewitz's conclusion.




  1. “There were two other guys in bed with us last night! Brenda, you whore!”

    1. “I am not a whore! I’m a slut! There’s a difference!”

      1. Do you guys remember this book?

        1. Whores are capitalists.

        2. Wouldn’t & wouldn’t.

  2. Cancer Research Reproducibility Study: Science Still Broken?

    Who cares if it works, as long as it’s getting enough money.

    1. I find this whole concept quite disturbing, though in my case, GBM is rare enough that nobody is talking much about reproducibility.

      1. “I didn’t know, Pro’L Dib. Perhaps I should have taken up oncology after all. I wrote my medical school application essay specifically concerning the Major Histocompatibility Complex (MHC) WRT how carcinogenic cells “talk to” and “recognise” each other, and aren’t “seen” by the Immune System and manage to slip through, particularly WRT the Nervous and Skeletal Systems, as cancers of these are the most notorious to TX.

        Yes, Dr. Groovy wanted to be an oncologist, originally. Perhaps if I had stayed that course, I could have helped you more.

        Godspeed.

        1. I’m doing well–no recurrence so far, and I’m responding very well to treatment. I have some motor and spacial deficits on my right side, but I’m still functioning and still working. And in therapy for the physical stuff.

          If it does recur, I’ll probably go for one of the immunological trials, which are mostly geared at getting the immune system to attack the tumor.

          17 months from dx, I think.

          1. I’m doing well–no recurrence so far, and I’m responding very well to treatment. I have some motor and spacial deficits on my right side, but I’m still functioning and still working. And in therapy for the physical stuff.

            You’re an atty; how could anyone possibly tell the difference? -*DUCKS* every atty’s face punch’-

            I’m very happy to know this, Pro’L Dib, and in my thoughts, prayers, and bestest wishes, you have.

            “O you who know what we suffer here, do not forget us in your prayers.”

            -from the Manual of Muad’Dib by the Princess Irulan

            1. I did have awake surgery, which is akin to the gom jabbar.

              Thanks for the well-wishes. I’m pretty optimistic, despite the statistics.

              1. Joking about the gom jabbar part–I thought that was kind of cool, and I didn’t feel much.

  3. “See this? These urine levels are precisely why I don’t visit the water park.”

    1. Or the back of a certain pet store.

    2. “I have discovered the cure for vagina AIDS, not gay AIDS!”

    3. “chemjeff looks exactly like you would expect”

  4. So when a study was repeated, the results were not supported. And the failure to replicate was published. I dunno, sounds to me like that’s a good example of the system working as intended. But what do I know, I’m merely a scientist.

    1. I agree. I don’t see how this is a problem. How are the people who did the initial study supposed to know their results were some kind of a statistical quirk and can’t be reproduced? All they have is the results they obtained. Isn’t the entire reason for publishing your results to find out if other people can reproduce them and thus confirm their validity?

      It seems to me the only problem here is that people are not trying to reproduce work often enough not that the results don’t always confirm the original work.

      1. A big problem is that peer review almost never involves reproducing the work and almost all checking on studies involves peer review

        1. But I don’t see how peer review could ever involve reproducing the work. The work takes months or years to do. Reproducing the work would make it impossible to get anything published. That is not what peer review does. Peer review looks at the work and sees if there are any methodological flaws to it. It shouldn’t be expected to reproduce the work. That is other people’s job to do after the work is published.

        2. That’s not a “problem with peer review.” It’s outside the scope of what peer review is supposed to be.

          I will admit to trying to replicate the work in a paper I was reviewing exactly once. I didn’t have to, I could have just sent it back with a note saying, “This is bullshit,” but I succumbed to curiosity at how they could have gotten results that seemed so wrong.

          1. Another issue with reproducibility is that there may be situations where, say, a cancer trial really does work for some people but not all. And the people conducting the study, as well as those studying the study, might not know about the really working part. What then?

              1. There’s a trial going on at Duke using modified polio viruses to treat GBM. Despite the lack of any cure, the FDA made them spend 10 years proving that the virus designed not to cause polio didn’t cause polio. Ten years.

                1. If that ends up being a cure (it’s in human trials now), maybe -thousands of humans. Bless the government and its efficiency.

                  1. If you question part of it, you question all of it.

          2. Just curious, what sort of scientist are you?

            Also, do you like lasers?

            1. Seriously, has anyone ever claimed not to love lasers? If they have, they’re lying. . .or they’re a laser victim.

            2. If that was meant for me, I’ve been all over the map. Usually skirting the line between physics and chemistry, most of my career in macromolecules and polymers, but my best-known paper was in molecular biology and toxicology.

              1. I bet we work in similar fields.

                My MO is molecular dynamics for proteins and DNA. Interesting to know there’s another physical chemist here.

                1. My MO is molecular dynamics

                  I see what you did there.

              2. Yes it was meant for you. Very impressive. I sometimes wish I had stayed in research after grad school, but I also knew that it wasn’t for me and that I would not be happy in the long run.

            3. Climatologists hate carbon dioxide lasers just because.

      2. I agree. I don’t see how this is a problem.

        Except it’s a community in-and-of-itself.

        Science is replete with researchers holding themselves to standards greater than their peers and achieving dissimilar results. It also has its share of dumb-luck discoveries. Neither one of those is an issue until you become incapable of resolving the two. Which is why we have applied sciences, markets, etc.

      3. How are the people who did the initial study supposed to know their results were some kind of a statistical quirk and can’t be reproduced?

        The claim is that null results are not generally published, and therefore there is an incentive to tune experiments until you get a positive, publishable result. If you increase the number of trials without telling anyone, the significance of a result can seem higher than it is (that’s a sort of handwavy statement, but you get the idea). To put it differently, there is a lot of cherry picking that people don’t see.

        I can’t comment directly on how much this is a trouble in other fields. I can say that in my field non-detections aren’t usually published but I *think* there is less opportunity for cherry picking because we aren’t generally setting up an experiment and running it until we get an answer. It’s observational, so you either detect something statistically significant and then try to interpret the physics behind it, or you don’t. And I think we usually understand our errors and noise sources pretty well. At least I hope so.
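
The tuning-and-peeking mechanism this commenter describes can be sketched in a short simulation. This is a hypothetical illustration in Python; the sample sizes, thresholds, and trial counts are arbitrary choices, not anything from the thread. With no true effect at all, a fixed-sample test crosses |z| > 1.96 at roughly its nominal 5 percent rate, but a lab that keeps adding batches of data and stops the moment the test looks significant (optional stopping) inflates its false-positive rate severalfold:

```python
import random
import statistics

def run_study(n=30, effect=0.0, seed=None):
    """One fixed-sample study: n draws of a true effect, returns mean/SE."""
    rng = random.Random(seed)
    data = [rng.gauss(effect, 1.0) for _ in range(n)]
    se = statistics.stdev(data) / n ** 0.5
    return statistics.mean(data) / se

def optional_stopping(max_batches=10, batch=10, seed=None):
    """Keep adding batches; stop (and 'publish') as soon as |z| > 1.96."""
    rng = random.Random(seed)
    data = []
    for _ in range(max_batches):
        data += [rng.gauss(0.0, 1.0) for _ in range(batch)]
        se = statistics.stdev(data) / len(data) ** 0.5
        if abs(statistics.mean(data) / se) > 1.96:
            return True
    return False

trials = 2000
# The true effect is zero in both arms, so every "positive" is a false positive.
honest = sum(abs(run_study(n=100, seed=i)) > 1.96 for i in range(trials))
hacked = sum(optional_stopping(seed=i) for i in range(trials))
print(f"fixed-n false-positive rate: {honest / trials:.1%}")   # near the nominal 5%
print(f"peek-and-stop positive rate: {hacked / trials:.1%}")   # several times higher
```

The second rate is what ends up in journals when the null results stay in the file drawer.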

      4. The odds of statistical quirks appearing as “significant” results should be extremely small, if they did things right. “What we should do” gets replaced by “what we can do” in the name of needing progress for career survival.

        In fields like medicine and psychology, an academic researcher can go their entire career without ever getting a result to really work; they move on to the next hot area before the previous one ever reaches the applied stage. In fact, to get funding you need to. It would be like having a career in the private sector based purely on getting investor funding for startups that never make a profit. Once the investor funding is spent, you “publish” what the company learned, wash your hands of it, and start the next one. Obviously the private sector demands more, and this is also the solution to our little problem.

    2. For any given study, the process is solid whether the subsequent study confirms or contradicts the initial results.

      When well over half of published studies can’t be confirmed by subsequent studies, then there is a systemic problem of publishing too early or publishing stuff that is too weak and shouldn’t be published at all. When billions of dollars of research are involved, this is a major problem.

      1. The problem with research funding is that it is harder to tell what proposals have merit and what don’t than people think. Also, we throw huge amounts of money at basic scientific research and especially cancer research. It never occurs to anyone that there isn’t an endless supply of promising research ideas and at some point you run out of promising ideas to fund and end up funding junk.

        More importantly, since it is often hard to tell what is promising and what isn’t and we have a lot of money, it makes sense to fund a lot of studies that turn out to be bunk. You fund the less promising ones on the hope that a few of them or even one of them turn out to actually be good ideas.

        So I really am not surprised that most studies turn out to not be reproducible and not profitable avenues of research. Most things turn out to be dead ends. If it was easy to figure out what research was going to lead to a solution, we would have long solved the problem.

        1. “If you are not failing, you are not trying”

          This is a fact of life in research and development.

          If companies are marketing products to people based on studies that can’t be reproduced, I am OK with buyer beware.

          If corporations are investing IRAD dollars based upon studies that can’t be reproduced, that’s the stock holder’s problem.

          If governments are setting public policy based upon studies that can’t be reproduced, we are all getting fucked. I am not OK with this.

          1. If corporations are investing IRAD dollars based upon studies that can’t be reproduced, that’s the stock holder’s problem

            You have to do the study to find out if it can be replicated. And that requires money.

        2. Also, with cancer, there’s the desperation factor. If the cancer is quickly spreading, like glioblastoma, or usually detected too late, like pancreatic cancer, there’s a lot of research that’s throwing spaghetti against the wall.

        3. It never occurs to anyone that there isn’t an endless supply of promising research ideas and at some point you run out of promising ideas to fund and end up funding junk.

          Something that crossed my mind with this article is that I wonder if the issue is one of “the low hanging fruit has already been picked”. The easy stuff has been done; maybe we’re just left with the stuff that is simply hard to do correctly consistently.

        4. You fund the less promising ones on the hope that a few of them or even one of them turn out to actually be good ideas.

          IMO that is the whole point of basic research. You don’t know which discoveries will be truly revolutionary and which will be mundane, even if all your experiments are perfect. You have to do it all knowing that your ROI will be near zero on lots of projects but could be enormous on some small number.

          So I really am not surprised that most studies turn out to not be reproducible and not profitable avenues of research. Most things turn out to be dead ends.

          Again, I think the bigger issue here is that many of these studies seemed like they were very well done and lots of people assumed the results were correct without really checking. What I’ve always found interesting is that the research that built on these difficult to reproduce results never saw an indication that the foundations were suspect. I’m not sure what to make of that.

    3. I dunno, sounds to me like that’s a good example of the system working as intended.

      Would be nice if that was all there was to it. The problem is the public is led to believe that peer review is a stamp of truth, and scientists are far too silent about the shortcomings of peer review. Just read any press release on a published peer-reviewed paper.

      Another problem is pal review, where the scientists in a small field all know each other and it is fairly easy to figure out who reviewed your paper. Wouldn’t want the guy whose paper you rejected to retaliate when you submit your next paper.

      Got to mention publish or perish. Not a good incentive for the advancement of science. A fairly good incentive to publish rubbish.

      Then there is Mann’s hockey stick: dubious statistics shrouded in a word salad. I suspect a lot of scientists read it, didn’t understand it, and were unwilling to admit they weren’t smart enough to comprehend it and question its validity. It took Steve McIntyre, a proficient statistician, to prove that Mann’s algorithm generated a hockey stick from trendless red noise.

      Peer review is a rather recent phenomenon. Einstein published over 300 papers; only one was peer reviewed. Claude Shannon’s 55-page “A Mathematical Theory of Communication” was not peer reviewed. In fact, it most likely would be rejected today due to length.

      I see little evidence that peer review has done anything to improve the quality of scientific publications.
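
The red-noise claim above can be illustrated with a deliberately simplified sketch. To be clear, this is not Mann's actual decentered-PCA procedure; it demonstrates the related "screening" effect that McIntyre's critique turns on: if you sign-align trendless AR(1) series according to how they behave in a recent calibration window and then average them, a hockey-stick blade appears by construction. All parameters below are arbitrary illustrative choices:

```python
import random

def red_noise(length, rng, rho=0.9):
    """AR(1) 'red noise': x[t] = rho * x[t-1] + white noise. No trend at all."""
    x, prev = [], 0.0
    for _ in range(length):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

rng = random.Random(42)
length, calib = 200, 50  # the last 50 steps stand in for the "instrumental era"
proxies = [red_noise(length, rng) for _ in range(1000)]

# Sign-align each series so its calibration-era mean sits above its long-term
# mean (PCA loadings can flip signs the same way), then average everything.
selected = []
for p in proxies:
    full_mean = sum(p) / length
    calib_mean = sum(p[-calib:]) / calib
    selected.append(p if calib_mean > full_mean else [-v for v in p])

composite = [sum(p[t] for p in selected) / len(selected) for t in range(length)]
early = sum(composite[:length - calib]) / (length - calib)
late = sum(composite[-calib:]) / calib
print(f"shaft mean {early:+.2f}, blade mean {late:+.2f}")  # blade > shaft, from pure noise
```

The composite's recent segment is guaranteed to sit above its long shaft, even though every input series was trendless noise.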

      1. Peer review is far too much like government to come anywhere close to providing the value it claims. Basically it’s just gatekeepers and bureaucrats. It’s one thing to ask trusted colleagues for an independent review; it’s an entirely different thing when it becomes a government regulation. In the first case the incentive is intellectual honesty; in the latter, the incentive is cash flow.

  5. “Science, the pride of modernity, our one source of objective knowledge, is in deep trouble”

    I have been saying that for 20 years. Unfortunately, I didn’t know just how right I was. I thought it was primarily the politically sensitive areas instead of the all-encompassing rot that it is.

    Get government money out of science. That is the only cure.

    1. From Eisenhower’s Farewell (Military Industrial Complex) speech:

      Today, the solitary inventor, tinkering in his shop, has been overshadowed by task forces of scientists in laboratories and testing fields. In the same fashion, the free university, historically the fountainhead of free ideas and scientific discovery, has experienced a revolution in the conduct of research. Partly because of the huge costs involved, a government contract becomes virtually a substitute for intellectual curiosity.

    2. The challenge is that basic research almost has to be funded philanthropically. Most research will not result in extremely valuable discoveries and even when it does, profiting from that is difficult because basic research (as it is currently practiced) requires the free exchange of ideas without patents or IP.

      If the research in question is relatively low cost then you can find philanthropists who can fund it. But a lot (not all, but a lot) of the inexpensive questions have been answered. Take LIGO, which last year detected gravitational waves. It cost over $1 billion. I don’t see how that can be funded privately. Maybe government funding allowed the budget to balloon, but even if it could have been done for $500 million, that’s a lot. Only 8 grants by the Gates Foundation have met or exceeded that, for example. The Large Hadron Collider costs $1 billion per year to operate.

      Maybe this is a sign that we should be doing less basic research. But it’s not clear to me how you fund current levels of basic research privately.

      1. Get more than one donor?

  6. Dang…a sciency nut punch.

  7. I wonder how many landmark AGW studies would be reproducible.

    1. They don’t need reproducibility, because they’ve got consensus.

      1. Yeah, BP, what’s wrong with you?

        1. Too much to go into. Here, anyway.

          But ‘consensus’ really says a lot about modern science, doesn’t it? Experiments don’t matter, it’s all about who will succeed in the vote.

          Kind of a shame there’s no electoral college in academia.

          1. I was kidding, of course. You’re totally right: 0.00.

          2. Consensus: 80% of your peers reviewed your work and approved. What they approved and why is immaterial.

    2. They don’t really run experiments. It’s just interpretation of data. So it’s about as sciency as history.

      1. They don’t really run experiments. It’s just interpretation of data.

        They call their climate model runs an experiment and the output from the climate model is their data.

        1. A simulation is not an experiment. An experiment requires mapping from the natural world to a model. A simulation is a mapping within a model.

  8. “Eureka! I’ve finally found the dishwashing liquid that stained my gloves!”

    1. Yeah, you were soaking in it.

      1. Madge, I thought you was dead.

        1. Madge is eternal life personified. According to Ronal’d Bejlij, until your cells (DNA) and digital presence are no longer, and therefore, no longer reproducible, are we truly dead?

          1. Theseus has a ship for you.

  9. RE: Cancer Research Reproducibility Study: Science Still Broken?
    The results of only two out of five cancer studies could be replicated

    Not to worry here folks.
    I’m sure Trump the Grump will launch a new Cancer Research bureaucracy that will find a cure for cancer in the next ten or twelve centuries with a minimum budget of $50 trillion a year.
    A small price indeed for finding the cure.

    1. I think Uncle Joe Biden’s already on it.

      1. He even got a Medal of Freedom for curing cancer just last week.

  10. Cancer Research Reproducibility Study: Science Still Broken?

    Mrs. Casual quoted a headline last night about a study that showed how caffeine helps prevent cancer.

    I don’t know about anyone else, but I’m personally getting tired of people discovering that all the stuff that I consume copious amounts of every day helps prevent cancer, heart disease, dementia, Alzheimers, etc.

    1. Ditto. As always, toxicity is directly proportional to the base dose required of a given substance v. the inverse of dose dependency for toxic levels specific to the individual.

      Fancy way of saying, “Lung cancer from smoking got my father at 50, but Gammy smoked Pall Malls till 106 and never even needed O2!” Or, “Gotta watch my sugar, or the Da ‘Beetus ‘ill get me!” v. “I ate nothing but refined sugars for b-fast, lunch, and dinner, and I can’t seem to gain a pound!”

  11. Can anyone summarize that Daniel Sarewitz article that Bailey linked? It’s long as fuck and I got tired of all the DOD fellating. I have a bad feeling though, knowing Bailey, that the conclusion will be that we need “the right science policy with the right top men.”

  12. “Science, the pride of modernity, our one source of objective knowledge, is in deep trouble,” – an exaggeration that is itself part of a PR campaign. Taxpayer-fund-chasing is the problem, and plenty of scientists know it. Some take the stance that, if chattering-class non-scientists believe the BS, then so what? Let NPR listeners believe whatever they want, the matronly dupes. And some scientists know it but play the game anyway. That’s the case with climate types. The level of cynicism and consensus worship is sickening.

    But the “deep trouble” isn’t with science or scientists – it’s with the crowding-out effect of spending scarce resources on producing crap. And when last did you hear of a profit-minded business tolerating for long the irreproducibility described above? Science is not broken. The funding model is broken.

  13. Not mentioned enough: it’s not clear we have a “reproducing scientific results” crisis so much as a “no one interprets statistical tests correctly” crisis.

    If we’re estimating a true effect B, we do so by a study that gives us b-hat, or our measure of B. Even if we’ve measured b-hat really well, because of sampling error we can’t assume we’d get the same b-hat if we replicated the study.

    According to frequentist theory, the only way to get B from b-hat is to replicate the study a huge number of times, and estimate a lot of b-hats. As the number of replications approaches infinity, the central tendency of the b-hat distribution should converge to B.

    But no one has the time or the money to do all those replications, so we make assumptions. First, we don’t know the form of the b-hat distribution, but let’s say we do, and that it’s an easy distribution to work with mathematically, like a normal curve. Second, we don’t know B, but what if it were something really interesting, like zero? So, when we calculate a p-value, we’re basically estimating the probability of observing our data if our assumptions are all true.

    The obvious implication: it does NOT follow that p less than 0.05 means that B is different from zero. A “significant result” means one of three things: B really is different from zero, one or more of our assumptions is wrong, or we’ve just observed a b-hat from the tails of the b-hat distribution. And there’s no way to figure out which from one or two study results.
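
A minimal simulation makes this commenter's point concrete. It assumes a known-variance normal model; the effect size, sample size, and thresholds are illustrative choices, not anything from the thread. Even when B truly is nonzero, the b-hats that happen to clear p < 0.05 systematically overshoot B (the "winner's curse"), and an exact replication of a significant study succeeds only at the study's power, not 95 percent of the time:

```python
import random
import statistics

def one_study(true_B, n, rng):
    """Return (b_hat, significant?) for one simulated unit-variance study."""
    data = [rng.gauss(true_B, 1.0) for _ in range(n)]
    b_hat = statistics.mean(data)
    z = b_hat / (1.0 / n ** 0.5)  # standard error treated as known, for simplicity
    return b_hat, abs(z) > 1.96

rng = random.Random(0)
true_B, n = 0.3, 50  # modest true effect, modest sample: roughly 55% power
originals = [one_study(true_B, n, rng) for _ in range(20000)]
sig = [b_hat for b_hat, significant in originals if significant]

# Among the "significant" originals, b_hat overstates the true B:
# only the lucky, inflated draws cleared the 1.96 threshold.
print(f"true B = {true_B}, mean significant b_hat = {statistics.mean(sig):.2f}")

# Exact replications of those significant studies succeed at roughly
# the study's power, not 95% of the time.
replications = sum(one_study(true_B, n, rng)[1] for _ in sig)
print(f"replication rate = {replications / len(sig):.1%}")
```

So a pile of unreplicated-but-significant results is exactly what this sampling logic predicts, even before any misconduct enters the picture.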

  14. Then you have the Milgram studies which academia spends millions of dollars annually trying desperately to not replicate.

  15. Speaking as a scientist who not only does cancer research but does the type of research that is being called into question here, I want to make the point that it is not uncommon for different groups to have trouble reproducing results, especially if they’re not experienced with the protein or pathway they’re investigating. It took me 9 months to get a cell signaling assay working consistently for a pathway that had been reported by multiple groups. I am currently working with a protein whose behavior I have shown reproducibly, for years, over dozens of preps, and that my competitors cannot get to work. I think it’s a little dangerous to do these fly-by experiments, claim you can’t get them to work, and then disparage the work as not reproducible. This is especially the case for “landmark” results that will now be put to the test by other labs that actually work in the field and will want to put these results in context with their own research.

    1. ^This. I spent about a year working out a protocol involving a particularly sensitive cell line. All of our experiments used positive and negative controls, all double blinded. So it was solid. Problem is that another lab didn’t go through the same pains, and couldn’t get any significant results- their controls showed exactly the same problems that we started with. So instead of cleaning up their methods so that negative and positive controls showed validity, they used a far less sensitive cell line, got null results, and claimed our work was incorrect.

      For Bailey, this would be “failure to replicate.” For me, it would be “failure to run your experiments carefully.”
