Reupping: Why reproducibility initiatives are misguided

I’m reposting this two-year-old piece, because it’s worth reminding ourselves why exact replication has, with minor exceptions, never been an important part of science:

In my latest Pacific Standard column, I take a look at the recent hand-wringing over the reproducibility of published science. A lot of people are worried that poorly done, non-reproducible science is ending up in the peer-reviewed literature.

Many of these worries are misguided. Yes, as researchers, editors, and reviewers we should do a better job of filtering out bad statistical practices and poor experimental designs; we should also make sure that data, methods, and code are thoroughly described and freely shared. To the extent that sloppy science is causing a pervasive reproducibility problem, then we absolutely need to fix it.

But I’m worried that the recent reproducibility initiatives are going beyond merely sloppy science, and instead are imposing a standard on research that is not particularly useful and completely ahistorical. When you see a hot new result published in Nature, should you expect other experts in the field to be able to reproduce it exactly?

Not always. To explain why, I’ll hand the mic over to Chris Drummond, a computer scientist and research officer at Canada’s National Research Council:

“Replicability is not Reproducibility: Nor is it Good Science” (PDF)

At various times, there have been discussions arising from the inability to replicate the experimental results published in a paper… There seems to be a widespread view that we need to do something to address this problem, as it is essential to the advancement of our field. The most compelling argument would seem to be that reproducibility of experimental results is the hallmark of science…I want to challenge this view by separating the notion of reproducibility, a generally desirable property, from replicability, its poor cousin. I claim there are important differences between the two. Reproducibility requires changes; replicability avoids them. Although reproducibility is desirable, I contend that the impoverished version, replicability, is one not worth having.

Drummond goes on to explain:

A critical point of reproducing an experimental result is that irrelevant things are intentionally not replicated. One might say, one should replicate the result not the experiment…The sharing of all the artifacts from people’s experiments is not a trivial activity.

In practice, most of us implicitly make Drummond’s distinction between replication and reproduction: we avoid exact replication when it isn’t absolutely necessary, but we are concerned about reproducing the general phenomena in our particular system.

And sometimes well-done research won’t be very reproducible, because it’s on the cutting edge, and we may not understand all of the relevant variables yet. You see this over and over in the history of science – the early days of genetics and the initial discoveries of high energy rays come to mind here. Scientists should do careful work and clearly publish their results. If another lab comes up with a different result, that’s not necessarily a sign of fraud or poor science. It’s often how science makes progress.

And here are two more pieces I wrote on the subject:

The Cancer Reproducibility Project is Incredibly Naive, Probably Useless, and Potentially Damaging

Sloppiness vs Reproducibility

 


Filed under: Curiosities of Nature Tagged: reproducibility, scientific method

Sloppiness vs Reproducibility

I’m not a big fan of reproducibility projects. Shoddy papers shouldn’t be tolerated, but the truth is that sometimes rigorously done research isn’t reproducible — and when that happens, science gets interesting. It should go without saying that a peer-reviewed paper isn’t a guarantee of truth. If done properly, a paper is a record of a rigorous attempt to discover something about the world, no more, no less. What we believe about nature should reflect the accumulated evidence of many researchers and many papers, and that means the scientific literature should reflect our latest tentative, bleeding-edge thinking, even at the risk of being wrong. It’s counterproductive to hold up publication until some other lab reproduces your result, or to retract papers that don’t hold up, unless they had clear methodological flaws or artifacts that should have been caught in review.

Two recent articles capture what I think is the right attitude on reproducibility. First, as David Allison and his colleagues write, as a community of researchers, editors, and reviewers, we’re not doing as well as we should be when it comes to meeting high standards for best statistical and other methodological practices:

 In the course of assembling weekly lists of articles in our field, we began noticing more peer-reviewed articles containing what we call substantial or invalidating errors. These involve factual mistakes or veer substantially from clearly accepted procedures in ways that, if corrected, might alter a paper’s conclusions.

There is no excuse for this kind of sloppiness.

On the other hand, here is Columbia’s Stuart Firestein:

The failure to replicate a part or even the whole of an experiment is not sufficient for indictment of the initial inquiry or its researchers. Failure is part of science. Without failures there would be no great discoveries.

So yes, let’s clean up science by rooting out obvious “invalidating practices” that all too often plague papers in journals at all tiers. But let’s not be naive about how science works, and what the scientific literature is supposed to be. To paraphrase what I wrote recently, if some of our studies don’t turn out to be wrong, then we’re not pushing hard enough at the boundaries of our knowledge.


Filed under: Curiosities of Nature Tagged: scientific method

How to advance science by failure

Stuart Firestein has a provocative piece in Nautilus on the role of failing well in science:

As your career moves on and you have to obtain grant support you naturally highlight the successes and propose experiments that will continue this successful line of work with its high likelihood of producing results. The experiments in the drawer get trotted out less frequently and eventually the drawer just sticks shut. The lab becomes a kind of machine, a hopper—money in, papers out.

My hope of course is that things won’t be this way for long. It wasn’t this way in the past, and there is nothing at all about science and its proper pursuit that requires a high success rate or the likelihood of success, or the promise of any result. Indeed, in my view these things are an impediment to the best science, although I admit that they will get you along day to day. It seems to me we have simply switched the priorities. We have made the easy stuff—running experiments to fill in bits of the puzzle—the standard for judgment and relegated the creative, new ideas to that stuck drawer. But there is a cost to this. I mean a real monetary cost because it is wasteful to have everyone hunting in the same ever-shrinking territory…

How will this change? It will happen when we cease, or at least reduce, our devotion to facts and collections of them, when we decide that science education is not a memorization marathon, when we—scientists and nonscientists—recognize that science is not a body of infallible work, of immutable laws and facts. When we once again recognize that science is a dynamic and difficult process and that most of what there is to know is still unknown.


Filed under: Items of Interest Tagged: Linkonomicon, scientific method

The hard…is what makes it great

There are a lot of things to love in this piece from Christie Aschwanden about why retractions, studies that don’t hold up to reproduction, and even sub-fraudulent “p-hacking” do not mean that science is broken; they mean that science is, simply, very hard. Among those things are the great visuals from Ritchie King – including a fun “p-hacking” demonstration tool.

For me, the real take home message goes beyond the “science is hard” catchphrase. Science isn’t just hard in the way implied by Tom Hanks’ Jimmy Duggan character in A League of Their Own:

It’s supposed to be hard. If it wasn’t hard, everyone would do it. The hard… is what makes it great.

The point is not, as some rhetoric would have it, that “science is hard” celebrates success over a monumentally difficult task.

As Aschwanden explains, science is hard because it is messy and complicated and requires a communal effort from members of a species that is only dubiously social outside of relatively narrow local groups.

If we’re going to rely on science as a means for reaching the truth — and it’s still the best tool we have — it’s important that we understand and respect just how difficult it is to get a rigorous result.

There are things like sampling variance and mistakes and uncontrollable environmental variables and resource limits and the fabled “orthologous methods” that inject all sorts of inconsistency and challenges into the textbook scientific method. This is why the great philosophers of science* spoke about disproof rather than proof, about independent reproducibility, about probability rather than certainty.
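To make the sampling-variance point concrete, here is a minimal sketch (hypothetical effect size and sample sizes, not drawn from any study discussed here) of how a perfectly real effect, measured carefully by several honest labs, can still produce estimates that look like they disagree:

```python
import random

random.seed(1)

def run_study(true_effect=0.4, n=20):
    """Simulate one small two-group study of a real effect.

    Returns the observed difference in group means; with only n
    subjects per group, that estimate is noisy.
    """
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(n)]
    return sum(treated) / n - sum(control) / n

# The same true effect, measured independently by five careful labs.
estimates = [run_study() for _ in range(5)]
print([round(e, 2) for e in estimates])
# Some labs will see roughly double the true effect, others nearly
# nothing, even though nobody did anything wrong.
```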

These issues do not indicate that science is broken. There simply is no other way it could work in the hands of mere humans. What may be broken is the way we perceive science. We need to understand that it is a gradual, communal effort. We need to understand that our mythos of science – the great scientist (usually, in the stories, a man) performing a great experiment and making a great discovery – is almost always a false summary, convenient and inspiring, but not a picture of why science is truly hard.

*It is also why those who dismiss the philosophy of science as a waste of time – I’m looking at you, Neil deGrasse Tyson – deserve nothing but the most vigorous of side-eyes on that point.


Filed under: Follies of the Human Condition Tagged: Christie Aschwanden, FiveThirtyEight, Linkonomicon, retractions, Ritchie King, science, scientific method, Scientific Process

The Cancer Reproducibility Project is Incredibly Naive, Probably Useless, and Potentially Damaging

I’ve always thought the Reproducibility Project represented an incredibly naive approach to the scientific method. This excellent news piece in Science sums up many of the reasons why. As Richard Young says in the piece, “I am a huge fan of reproducibility. But this mechanism is not the way to test it.” Here’s why:

1) Reproducibility in science is not achieved by having a generic contract research organization replicate a canned protocol, for good reason: cutting-edge experiments are often very difficult and require specialized skills to get running. Replication is instead achieved by other labs in the field that want to build on the results. Sometimes this is done using the same protocol as the original experiment, and sometimes by obtaining similar results in a different system using a different method.

2) For this reason, I don’t have much confidence that the results obtained by the Reproducibility Project will accurately reflect the state of reproducibility in science. A negative result could mean many things — and most likely it will reflect a failure of the contract lab and not an inherent problem with the result. Contrary to the claims of the project’s leaders, the data produced by the Project will probably not be useful to people who are serious about estimating the scope of irreproducibility in science. At its worst, it could be extremely misleading by painting an overly negative picture of the state of science. It’s already been damaging by promoting a too-naive view of how the process of successful science actually works.

3) As the Science piece points out, there is a much better, cheaper, and scientifically sensible way to achieve better reproducibility. If many papers out there are suspect because they lack proper controls, don’t use validated reagents, fail to describe methods adequately, or rely on flawed statistics, then we don’t need to spend millions of dollars and thousands of hours of effort trying to repeat experiments. We need to make sure editors and reviewers require proper controls, reagents, statistics, and full methods descriptions.

It’s worth reading the full article, but below the fold are some salient quotes:

[Richard Young] says that if the project does match his results, it will be unsurprising — the paper’s findings have already been reproduced. If it doesn’t, a lack of expertise in the replicating lab may be responsible. Either way, the project seems a waste of time, Young says. “I am a huge fan of reproducibility. But this mechanism is not the way to test it.”

“I like the concept,” says cancer geneticist Todd Golub of the Broad Institute in Cambridge, who has a paper on the group’s list. But he is “concerned about a single group using scientists without deep expertise to reproduce decades of complicated, nuanced experiments.”

Early on, Begley, who had raised some of the initial objections about irreproducible papers, became disenchanted. He says some of the papers chosen have such serious flaws, such as a lack of appropriate controls, that attempting to replicate them is “a complete waste of time.” He stepped down from the project’s advisory board last year.

Amassing all the information needed to replicate an experiment and even figure out how many animals to use proved “more complex and time-consuming than we ever imagined,” [project leader] Iorns says.

For many scientists, the biggest concern is the nature of the labs that will conduct the replications. It’s unrealistic to think contract labs or university core facilities can get the same results as a highly specialized team of academic researchers, they say. Often a graduate student has spent years perfecting a technique using novel protocols, Young says. “We brought together some of the most talented young scientists in the area of gene control and oncology to do these genomics studies. If I thought it was as simple as sending a protocol to a contract laboratory, I would certainly be conducting my research that way,” he says.

Academic labs approach replication differently. Levi Garraway of the Harvard University–affiliated Dana-Farber Cancer Institute in Boston, who also has two papers on the project’s list, says that if a study doesn’t initially hold up in another lab, they might send someone to the original lab to work side by side with the authors. But the cancer reproducibility project has no plans to visit the original lab, and any troubleshooting will be limited to making sure the same protocol is followed, Errington says. Erkki Ruoslahti of the Sanford-Burnham Medical Research Institute in San Diego, California, has a related worry: The lab replicating one of his mouse experiments will run that experiment just one time; he repeated it two or three times.

The scientists behind the cancer reproducibility project dismiss these criticisms.


Filed under: Curiosities of Nature Tagged: scientific method

Why reproducibility initiatives are misguided

In my latest Pacific Standard column, I take a look at the recent hand-wringing over the reproducibility of published science. A lot of people are worried that poorly done, non-reproducible science is ending up in the peer-reviewed literature.

Many of these worries are misguided. Yes, as researchers, editors, and reviewers we should do a better job of filtering out bad statistical practices and poor experimental designs; we should also make sure that data, methods, and code are thoroughly described and freely shared. To the extent that sloppy science is causing a pervasive reproducibility problem, then we absolutely need to fix it.

But I’m worried that the recent reproducibility initiatives are going beyond merely sloppy science, and instead are imposing a standard on research that is not particularly useful and completely ahistorical. When you see a hot new result published in Nature, should you expect other experts in the field to be able to reproduce it exactly?

Not always. To explain why, I’ll hand the mic over to Chris Drummond, a computer scientist and research officer at Canada’s National Research Council:

“Replicability is not Reproducibility: Nor is it Good Science” (PDF)

At various times, there have been discussions arising from the inability to replicate the experimental results published in a paper… There seems to be a widespread view that we need to do something to address this problem, as it is essential to the advancement of our field. The most compelling argument would seem to be that reproducibility of experimental results is the hallmark of science…I want to challenge this view by separating the notion of reproducibility, a generally desirable property, from replicability, its poor cousin. I claim there are important differences between the two. Reproducibility requires changes; replicability avoids them. Although reproducibility is desirable, I contend that the impoverished version, replicability, is one not worth having.

Drummond goes on to explain:

A critical point of reproducing an experimental result is that irrelevant things are intentionally not replicated. One might say, one should replicate the result not the experiment…The sharing of all the artifacts from people’s experiments is not a trivial activity.

In practice, most of us implicitly make Drummond’s distinction between replication and reproduction: we avoid exact replication when it isn’t absolutely necessary, but we are concerned about reproducing the general phenomena in our particular system.

And sometimes well-done research won’t be very reproducible, because it’s on the cutting edge, and we may not understand all of the relevant variables yet. You see this over and over in the history of science – the early days of genetics and the initial discoveries of high energy rays come to mind here. Scientists should do careful work and clearly publish their results. If another lab comes up with a different result, that’s not necessarily a sign of fraud or poor science. It’s often how science makes progress.


Filed under: Curiosities of Nature Tagged: scientific method

Tortoise vs Hare, but in a vacuum this time

We all know how gravity is supposed to work. Without air resistance, a feather and a bowling ball (the standardized materials for all gravitational tests) should accelerate toward the center of the Earth at the same rate, thus striking the ground at the same time. Humans have tested this. It works.
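The reason is that mass cancels out: the gravitational force is F = mg, so the acceleration a = F/m = g for any mass, and the time to fall a height h is √(2h/g). A quick sketch, with the drop height chosen arbitrarily for illustration:

```python
import math

def fall_time(height_m, g=9.81):
    """Fall time in vacuum: h = 0.5 * g * t**2, so t = sqrt(2h/g).

    Mass never enters the formula, so a feather and a bowling ball
    dropped together land together.
    """
    return math.sqrt(2.0 * height_m / g)

# Illustrative drop height (the actual chamber height isn't given here).
print(round(fall_time(10.0), 2), "seconds, for any object")
```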

Although we know this, it is so far removed from our daily experience that it is still stunning to watch it happen. This fundamental principle is nicely illustrated in this video from the BBC, which also shows how amazed a roomful of people who know exactly how the experiment will turn out can be when it works exactly as expected.

That is why we need the scientific method to rigorously test hypotheses and incrementally build our knowledge of how the universe works. Our day-to-day experience of and intuition about the world is extremely valuable, but also extremely deceptive.

For the record, the tortoise vs hare in a vacuum race I alluded to in the title would be incredibly inhumane and disappointing, in addition to having no winner – unless, UNLESS we had the tortoise and hare race in spacesuits. Why aren’t we racing animals in spacesuits?

HT: Jared Heidinger


Filed under: Curiosities of Nature Tagged: BBC, Brian Cox, gravity, Jared Heidinger, Physics, scientific method

The Science of Weezer

On the 537th episode of the WTF with Marc Maron Podcast, Marc Maron has an interesting conversation with Rivers Cuomo* of Weezer about his method for songwriting, particularly in the gap between Pinkerton (1996) and The Green Album (2001).

What I find so captivating is Cuomo’s application of a scientific mindset to “solving” his creative process in the hopes of working more efficiently and effectively. He fails, but does not conclude that his art cannot be understood by science. His problem was a classic scientific one: too many variables, too small a sample size (i.e., n = 1), and too little time. Cuomo also defies Maron’s efforts to portray his analytical quest as potentially maddening. It simply wasn’t productive enough.

I’m going to recommend the whole interview, but the segment I have described starts at about the 34:50 mark.

*Promoting Weezer’s new album Everything Will Be Alright in the End.


Filed under: Follies of the Human Condition Tagged: Linkonomicon, Marc Maron, Rivers Cuomo, scientific method, Weezer, WTF, WTF with Marc Maron Podcast

What is misconduct?

The results of a small survey of graduate students and post-docs suggest that our research trainees don’t really know what research misconduct is below the level of flat-out fabrication.

However, we were dismayed that only 54 per cent gave a three to “knowingly selecting only those data that support a hypothesis” and 42 per cent to “deleting some data to make trends clearer”. The naivety is staggering. – Tim Birkhead & Tom Montgomerie

They also note that these individuals face considerable barriers to reporting misconduct when they believe it has occurred.

I recall the mandatory ethics class we took at Washington University in St. Louis. It was worthless. We spent a great deal of time talking about “salami science,” the practice of parceling your work out into as many papers as possible, each with as little unique content as possible. This is bad behavior that games some of the systems used to evaluate researchers. It does not, however, corrupt the scientific literature with inaccurate data or results.

While I received my training in proper scientific conduct in my thesis lab, that is not a sustainable solution. The future of scientific investigation should not depend on the efforts of individual thesis mentors – they are simply too inconsistent. Ethics education is key to training in the proper practice of the scientific method, and it should be central to graduate training, including quality courses that provide real instruction in ethics and in identifying misconduct.


Filed under: Follies of the Human Condition Tagged: fraud, misconduct, science, scientific method, Times Higher Education

Modern Statistics make the absurd appear significant

This is too good not to share, from a preprint by Andrew Gelman and Eric Loken, “The garden of forking paths: Why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time”

Without modern statistics, we find it unlikely that people would take seriously a claim about the general population of women, based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology.

The paper is here (PDF).
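For a rough, self-contained illustration of the underlying problem (simplified here to explicit multiple comparisons on pure noise, which is only the most blatant version of the forking-paths issue Gelman and Loken describe), consider how readily p < 0.05 turns up when a dataset offers many possible comparisons:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two groups of 50 "subjects", 20 outcome measures each: all pure noise.
n_subjects, n_outcomes = 50, 20
group_a = rng.normal(size=(n_subjects, n_outcomes))
group_b = rng.normal(size=(n_subjects, n_outcomes))

# Test each outcome separately, as an exploratory analysis might.
pvalues = [stats.ttest_ind(group_a[:, j], group_b[:, j]).pvalue
           for j in range(n_outcomes)]

hits = [round(p, 3) for p in pvalues if p < 0.05]
print(f"{len(hits)} of {n_outcomes} pure-noise comparisons 'significant':", hits)
# With 20 shots at the 0.05 threshold, about one spurious "finding"
# is expected on average, and runs with two or three are common.
```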


Filed under: Curiosities of Nature Tagged: scientific method, statistics