Was the New Coronavirus Accidentally Released from a Wuhan Lab?
The Washington Post Fact Checker provides a thorough discussion of the relevant facts concerning whether the COVID-19 pandemic is the result of a lab accident in Wuhan. By the time I reached the end, I was convinced that it almost certainly is. Then I read the article's conclusion: "The balance of the scientific evidence strongly supports the conclusion that the new coronavirus emerged from nature — be it the Wuhan market or somewhere else. Too many unexpected coincidences would have had to take place for it to have escaped from a lab."
This conclusion is puzzling. The video accompanying the article acknowledges that it is a coincidence that the disease emerged in the same city where bat coronaviruses are studied. But the video and article point to a much bigger coincidence: the Wuhan Center for Disease Control & Prevention, where researchers engaged in bat research without using appropriate protective equipment, was "right around the corner" from the Wuhan Seafood Market, the location tied to several of the initial carriers of the virus. The distance was apparently less than 300 meters.
Let's do some simple math. First, let's try to make concrete the point of Michel Trottier-McDonald that we've long expected the possibility of a pandemic, and so we should have a strong prior that a pandemic that actually occurs is the result of something other than an accident. Let's assume that labs studying bats are actually very safe, despite the safety violations documented by the Fact Checker; more specifically, that the probability that a lab somewhere in the world causes a bat coronavirus pandemic in a given year is only 1 in 100,000. And let's assume that non-laboratory zoonosis was ex ante much more likely, with perhaps a 1 in 1,000 chance of producing a pandemic in a given year. So, with no other evidence, when a pandemic occurs, we can calculate P(Accident | Pandemic) = P(Accident) / (P(Accident) + P(Zoonosis)) ≈ 0.0099. Moreover, let's suppose that there are 20 labs worldwide that study coronaviruses and bats (just a wild guess; I suspect it's on the high side). So, the probability that any one lab is responsible, which we can denote P(WCDCP) when speaking of the Wuhan Center for Disease Control & Prevention, is approximately 0.0099 / 20 ≈ 0.0005. From this vantage point, the idea that the virus escaped from that lab does look like a conspiracy theory.
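The prior calculation above can be reproduced in a few lines. This is a minimal Python sketch using only the text's stated assumptions (1 in 100,000 for a lab accident, 1 in 1,000 for zoonosis, 20 labs); the function names are illustrative, and, like the text, it treats accident and zoonosis as the only two, mutually exclusive, causes of a pandemic.

```python
# Annual pandemic probabilities assumed in the text (illustrative, not measured).
P_ACCIDENT = 1 / 100_000   # some bat-coronavirus lab somewhere causes a pandemic
P_ZOONOSIS = 1 / 1_000     # non-laboratory spillover causes a pandemic
N_LABS = 20                # the author's rough guess at labs studying bat coronaviruses


def prior_lab_origin(p_accident: float, p_zoonosis: float) -> float:
    """P(Accident | Pandemic), assuming accident and zoonosis are the only,
    mutually exclusive, causes of a pandemic."""
    return p_accident / (p_accident + p_zoonosis)


def prior_single_lab(p_accident: float, p_zoonosis: float, n_labs: int) -> float:
    """Spread P(Accident | Pandemic) evenly across n_labs labs to get P(WCDCP)."""
    return prior_lab_origin(p_accident, p_zoonosis) / n_labs


print(f"P(Accident | Pandemic) = {prior_lab_origin(P_ACCIDENT, P_ZOONOSIS):.4f}")
print(f"P(WCDCP)               = {prior_single_lab(P_ACCIDENT, P_ZOONOSIS, N_LABS):.6f}")
```

Splitting the probability evenly across labs is the simplest choice; a weighted split (say, by how much bat work each lab does) would be a natural refinement.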
But we then need to account for the coincidence that the first location associated with the outbreak was within a 300-meter radius of the lab. As Some Guy comments in response to Trottier-McDonald, surely this should figure in the calculus. By the law of total probability, the probability that this neighborhood would be the first location associated with the outbreak is P(Neighborhood) = P(Neighborhood | WCDCP) * P(WCDCP) + P(Neighborhood | ~WCDCP) * P(~WCDCP). That is, the unconditional probability that a bat coronavirus pandemic would first be associated with this particular neighborhood equals the probability of that happening if the coronavirus originated in that lab, multiplied by the probability that it did originate there, plus the probability of that happening if the coronavirus did not originate in that lab, multiplied by the probability that it did not. Let's suppose that there is a 50% chance that an accident at the WCDCP would lead to the pandemic first being observed in the neighborhood. So, P(Neighborhood) = 0.5 * 0.0005 + P(Neighborhood | ~WCDCP) * 0.9995.
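The total-probability step can be written as a one-line function. In this Python sketch, P(WCDCP) = 0.0005 and P(Neighborhood | WCDCP) = 0.5 come from the text; the loop values for P(Neighborhood | ~WCDCP) are hypothetical placeholders, since that term has not yet been estimated at this point, and `total_probability` is an illustrative name.

```python
def total_probability(p_n_given_lab: float, p_lab: float, p_n_given_not_lab: float) -> float:
    """P(Neighborhood) = P(N | WCDCP) * P(WCDCP) + P(N | ~WCDCP) * P(~WCDCP)."""
    return p_n_given_lab * p_lab + p_n_given_not_lab * (1 - p_lab)


P_LAB = 0.0005        # rounded P(WCDCP) from the prior calculation
P_N_GIVEN_LAB = 0.5   # assumed chance a WCDCP leak is first noticed in the neighborhood

# Hypothetical placeholder values for P(N | ~WCDCP), to show how the sum behaves.
for p_n_given_not_lab in (1e-7, 1e-6, 1e-5, 1e-4):
    p_n = total_probability(P_N_GIVEN_LAB, P_LAB, p_n_given_not_lab)
    print(f"P(N | ~WCDCP) = {p_n_given_not_lab:.0e} -> P(Neighborhood) = {p_n:.6f}")
```

The lab term contributes a fixed 0.5 * 0.0005 = 0.00025, so whether the lab explanation dominates depends on whether the non-lab term is much smaller or much larger than that.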
What about P(Neighborhood | ~WCDCP)? As a first approximation, let's assume that the probability of a pandemic starting in any location is roughly proportional to the number of people living there. Wikipedia reports a population density of as much as 20,445 per square kilometer in the approximate area of the WCDCP, so a circle with a radius of 0.3 km would have a population of about pi * 0.3 * 0.3 * 20,445 ≈ 5,781. Because the world population is 7.8 billion, P(Neighborhood | ~WCDCP) = 5,781 / 7,800,000,000 ≈ 7.41 × 10⁻⁷. But let's suppose, to err on the side against the conspiracy theory, that, lab aside, this neighborhood was much more likely to be the source of an outbreak than other random neighborhoods on earth with equal population density (even though the relevant bat species' habitat is far from Wuhan): say, 100 times as likely, because it had a wet market that sold wild animals (though apparently not bats). Then P(Neighborhood | ~WCDCP) ≈ 7.41 × 10⁻⁵. So, we can calculate P(Neighborhood) = 0.5 * 0.0005 + 7.41 × 10⁻⁵ * 0.9995 ≈ 0.000324.
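Here is a short Python sketch of the population-based estimate. The density (the Wikipedia figure the text cites), the 0.3 km radius, the 7.8 billion world population, and the 100x wet-market multiplier are all the text's assumptions; the constant and function names are illustrative.

```python
import math

DENSITY_PER_KM2 = 20_445        # population density near the WCDCP, as cited in the text
RADIUS_KM = 0.3                 # radius of the neighborhood around the lab
WORLD_POPULATION = 7_800_000_000
MARKET_MULTIPLIER = 100         # assumed extra outbreak risk from the wet market
P_WCDCP = 0.0005                # prior that the WCDCP is the source
P_N_GIVEN_LAB = 0.5             # P(Neighborhood | WCDCP)


def neighborhood_population(density: float, radius_km: float) -> float:
    """People living in a circle of the given radius at uniform density."""
    return math.pi * radius_km ** 2 * density


pop = neighborhood_population(DENSITY_PER_KM2, RADIUS_KM)
base = pop / WORLD_POPULATION                 # P(N | ~WCDCP) if risk tracks population only
p_n_given_not_lab = MARKET_MULTIPLIER * base  # boosted for the wet market
p_neighborhood = P_N_GIVEN_LAB * P_WCDCP + p_n_given_not_lab * (1 - P_WCDCP)

print(f"Neighborhood population:    {pop:,.0f}")
print(f"Base P(N | ~WCDCP):         {base:.3g}")
print(f"Adjusted P(N | ~WCDCP):     {p_n_given_not_lab:.3g}")
print(f"P(Neighborhood):            {p_neighborhood:.6f}")
```

Treating density as uniform over the whole circle is a simplification; the text describes 20,445 per square kilometer as a high-end figure for the area.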
Now we can apply Bayes' theorem. We want to calculate P(WCDCP | Neighborhood), the probability that an accident at the WCDCP was the source of the pandemic given the evidence that the first cases were detected nearby. This equals P(Neighborhood | WCDCP) * P(WCDCP) / P(Neighborhood), so 0.5 * 0.0005 / 0.000324 ≈ 0.77 (rounding the denominator to 0.00032 would give 0.78). That is, with these conservative assumptions and this simple model, there is roughly a 77% chance that the virus originated in the Wuhan lab.
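The whole chain, from prior to posterior, fits in one function. This Python sketch uses only the inputs stated in the text and carries full precision instead of rounding intermediate values, which gives roughly 0.77; the loop over market multipliers is an illustrative sensitivity check on the assumed 100x figure, not part of the original argument, and `posterior_wcdcp` is a hypothetical name.

```python
import math

# Inputs as stated in the text; all are rough assumptions, not measurements.
P_ACCIDENT = 1 / 100_000        # annual chance some lab causes a bat-coronavirus pandemic
P_ZOONOSIS = 1 / 1_000          # annual chance of a non-lab zoonotic pandemic
N_LABS = 20                     # guessed number of labs studying bat coronaviruses
P_N_GIVEN_LAB = 0.5             # P(Neighborhood | WCDCP)
DENSITY_PER_KM2 = 20_445        # population density near the WCDCP
RADIUS_KM = 0.3
WORLD_POPULATION = 7_800_000_000


def posterior_wcdcp(market_multiplier: float = 100) -> float:
    """P(WCDCP | Neighborhood), computed without rounding intermediate values."""
    p_accident_given_pandemic = P_ACCIDENT / (P_ACCIDENT + P_ZOONOSIS)
    p_wcdcp = p_accident_given_pandemic / N_LABS
    neighborhood_pop = math.pi * RADIUS_KM ** 2 * DENSITY_PER_KM2
    p_n_given_not_lab = market_multiplier * neighborhood_pop / WORLD_POPULATION
    # Law of total probability for the evidence, then Bayes' theorem.
    p_n = P_N_GIVEN_LAB * p_wcdcp + p_n_given_not_lab * (1 - p_wcdcp)
    return P_N_GIVEN_LAB * p_wcdcp / p_n


# The text's calculation with rounded intermediate values, for comparison.
rounded = 0.5 * 0.0005 / 0.00032
print(f"Rounded inputs: {rounded:.3f}")
print(f"Full precision: {posterior_wcdcp():.3f}")

# Illustrative sensitivity check on the assumed wet-market multiplier.
for multiplier in (10, 100, 1000):
    print(f"multiplier {multiplier:>4}: P(WCDCP | Neighborhood) = {posterior_wcdcp(multiplier):.2f}")
```

The sensitivity loop makes the back-of-the-envelope character explicit: the posterior moves substantially with the multiplier, so the conclusion depends on that assumption as much as on the arithmetic.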
This is a back-of-the-envelope calculation and hardly definitive. Ideally, one would like to take other evidence into account. For example, Chinese scientists report that they had no record of this coronavirus and that the most similar coronavirus they had studied was only 96% similar. That doesn't seem very relevant (couldn't an infection have occurred without the virus ever being detected and its genome sequenced?), but it might move the needle a little. Other evidence, such as intelligence reports, might change our conclusions as well. But at the very least, the calculation suggests that this is no mere conspiracy theory and that the Fact Checker ought to document its numerical assumptions. The Fact Checker understandably prefers hard facts to ruminations about coincidences. But if the Fact Checker is going to make a probabilistic assessment, it needs to think about coincidences, and it should focus on the most glaring one (that the WCDCP was in the same neighborhood as the market) rather than on the less remarkable one (that the Wuhan Institute of Virology, which apparently had much safer practices, was in the same city).
Of course, conspiracy theorists often thrive on irrelevant coincidences. What they forget is that there are so many possible coincidences that the occurrence of a few of them is not itself a coincidence. If you told me that the director of the lab and one of the victims had the same last name or enjoyed eating at the same restaurant, I wouldn't put much weight on that. But it seems to me, at least, that the location where the disease originated is not just one of many possible coincidences, but rather one of the first facts one would want to know in assessing the pandemic's origin.
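The point that a few coincidences are expected when many are possible can be made concrete. This Python sketch computes the chance that at least one of k independent possible coincidences occurs when each has probability p; the value p = 0.01, the values of k, the independence assumption, and the function name `prob_at_least_one` are all hypothetical, chosen only for illustration.

```python
def prob_at_least_one(p: float, k: int) -> float:
    """Chance that at least one of k independent possible coincidences,
    each with probability p, actually occurs."""
    return 1 - (1 - p) ** k


# Hypothetical numbers chosen only for illustration.
P_EACH = 0.01
for k in (1, 10, 100, 500):
    print(f"{k:>3} possible coincidences: P(at least one) = {prob_at_least_one(P_EACH, k):.3f}")
```

Individually rare coincidences become likely in aggregate once enough of them are possible, which is why a single fact singled out in advance as relevant, such as the outbreak's location, deserves more weight than a coincidence noticed only after the fact.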
Does all this matter? People are not very good Bayesians, but we should strive to think rigorously about probability in matters of public policy. I don't think that the likelihood of origination in a lab means that China needs to pay reparations. But if this pandemic did originate in a lab, that should lead the international community to pay close attention to how such labs operate and how they could be monitored. And while it seems to me highly unlikely that the virus release was intentional, origination in a lab might nonetheless reasonably focus attention on bioterrorism. Modern genetic technology certainly makes it more likely that scientists, intentionally or by accident, could create stronger strains of viruses. Now that we have seen what a naturally occurring virus can do, the possibility of a man-made virus worse than the novel coronavirus should terrify us, perhaps even posing an existential risk on the same order of magnitude as climate change.