I recently came across an interesting article in Science News on widespread replication failure in cancer studies. It’s interesting, though not particularly shocking, that the Replication Crisis has claimed one more field.
If you’re not familiar with the Replication Crisis, it has to do with the widespread assumption that scientific experiments described in peer-reviewed journals are reproducible, that is, that if someone else performed the experiment, they would get the same result. Reproducibility of experiments is the foundation of trust in the sciences. The theory is that once somebody has done the hard work of designing an experiment which produces a useful result, others can merely follow the experimental method to verify that the result really happens. And once an experiment has been widely reproduced, people can be very confident in the result, because so many people have seen it for themselves and we have widespread testimony of it. Or, indeed, people can perform these experiments themselves as they work their way through their scientific education.
That’s the theory.
Practice is a bit different.
The problem is that science became a well-funded profession. The consequence is that experiments became extraordinarily expensive and time-intensive to perform. The most obvious example would be particle-collision experiments in super-colliders. The Large Hadron Collider cost somewhere around $9,000,000,000 to build and requires teams of people to operate. Good luck verifying the experiments it performs for yourself.
Even when you’re working at radically smaller scales and don’t require expensive apparatus (say you want to assess the health effects of people cutting coffee out of their diet), putting together a study is enormously time-intensive. And it costs money to recruit people: you generally have to pay them for their participation, and you need someone skilled in periodically assessing whatever health metrics you want to track. Blood doesn’t draw itself and run lipid panels, after all.
OK, so amateurs don’t replicate experiments anymore. But what about other professionals?
Here we come to one of the problems introduced by “Publish Or Perish”. Academics only get status and money for achieving new results; for the most part, nobody gets a grant to redo an experiment somebody else has already done and confirm that it gives the same result. This should be a massive monkey wrench in the scientific machine, but for a long time people ignored the problem and papered over it by assuming that flawed results would eventually be exposed when other people tried to build on them and failed.
It turns out that doesn’t work, at least not nearly well enough.
The first field in which people got serious funding to actually replicate results and see whether they held up was psychology, and it turned out that most wouldn’t replicate. To be fair, in many cases this was because the experiment was not described well enough for anyone to even set up the same experiment again, though this is, to some degree, defending oneself against a charge of negligence by pleading incompetence. Of those studies which were described well enough that it was possible to try to replicate them, fewer than half replicated. They tended to fail to replicate in one of two ways:
- The effect didn’t happen often enough to be statistically significant
- The effect was statistically significant but so small as to be practically insignificant
To give a made-up example of the first, if you deprive people of coffee for a few months and one out of a few hundred sees a positive result, it may well be that you just chanced onto someone who improved for some other reason while you were trying to study coffee. To give an example of the second, you might find that everyone’s systolic blood pressure went down by one tenth of a millimeter of mercury. There’s virtually no way a result that common in the group happened by chance, but it’s utterly irrelevant to any reasonable goal a human being can have.
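If you want to see the distinction in numbers, here is a toy sketch of the two failure modes. It’s mine, not from any of the studies discussed, it assumes Python with NumPy and SciPy installed, and every figure in it is invented to mirror the coffee example above.

```python
# Toy illustration of the two ways a result can fail to matter.
# Assumes NumPy and SciPy; all numbers here are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Failure mode 1: the effect is too rare to be statistically significant.
# One "improvement" out of 300 coffee-deprived subjects, against a ~1%
# background rate of improving for unrelated reasons.
rare = stats.binomtest(k=1, n=300, p=0.01, alternative="greater")
print(f"rare effect:  p = {rare.pvalue:.2f}")  # ~0.95, nowhere near significant

# Failure mode 2: a real but trivial effect becomes statistically
# significant once the sample is enormous. Here the "treatment" lowers
# systolic blood pressure by a tenth of a millimeter of mercury.
control = rng.normal(loc=120.0, scale=10.0, size=1_000_000)
treated = rng.normal(loc=119.9, scale=10.0, size=1_000_000)
tiny = stats.ttest_ind(treated, control)
print(f"tiny effect:  p = {tiny.pvalue:.1e}, "
      f"mean change = {treated.mean() - control.mean():.2f} mmHg")
```

With a big enough sample, even a meaningless tenth-of-a-millimeter shift produces a p-value that looks impressive, which is exactly why effect size matters as much as statistical significance.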
Psychology does tend to be a particularly bad field when it comes to experimental design and execution, but other fields took note and wanted to make sure that they really were as much better than the psychologists as they assumed.
And it turned out that many fields were not.
I find it interesting, though not very surprising, that oncology turns out to be another field in which experiments are failing to replicate. After all, in a field which isn’t completely new, it’s easier to get interesting results that don’t replicate than it is to get interesting results that do.