Seemingly Small Differences in the Accuracy of COVID-19 Antibody Tests Can Make a Big Practical Difference

When infection prevalence is low, a test with relatively low specificity can generate highly misleading results.


We are counting on COVID-19 antibody tests to estimate the prevalence and lethality of the novel coronavirus and to identify people who were infected but now may be immune. Are the tests up to those tasks? That depends on which test you use, how you use it, and the amount of risk you are prepared to accept.

Dozens of different tests are currently available, and their accuracy varies widely. Evaluate Vantage's Elizabeth Cairns looked at 11 tests and found that their reported sensitivity (the percentage of positive samples correctly identified as positive in validation tests) ranged from 82 percent to 100 percent, while their reported specificity (the percentage of negative samples correctly identified as negative) ranged from 91 percent to 100 percent.

A recent study by the COVID-19 Testing Project evaluated 12 antibody tests and found a wider specificity range, from 84 percent to 100 percent. Most of the tests had specificities higher than 95 percent, while three had rates higher than 99 percent.

In the Evaluate Vantage survey, the best bets for sensitivity, based on numbers reported by the manufacturers, were Abbott's SARS-CoV-2 IgG Test and Epitope's EDI Novel Coronavirus COVID-19 IgG ELISA Kit. For specificity, the latter test and another Epitope product, the EDI Novel Coronavirus Covid-19 IgM ELISA Kit, did best, along with Creative Diagnostics' SARS-CoV-2 Antibody ELISA and Ortho-Clinical's Vitros Immunodiagnostic Product Anti-SARS-CoV-2 Total Reagent Pack. Only Epitope's EDI Novel Coronavirus COVID-19 IgG ELISA Kit had perfect scores on both measures, although Abbott's kit came close, with a sensitivity of 100 percent and a specificity of 99.5 percent.

Even when a test has high sensitivity and specificity, its results can be misleading. In the context of studies that seek to measure the prevalence of the virus in a particular place, for example, even a low false-positive rate could generate substantial overestimates when the actual prevalence is very low.

Suppose researchers screen a representative sample of 1,000 people in a city with 2 million residents. Leaving aside the issue of sampling error, let's assume the actual prevalence in both the sample and the general population is 5 percent.

If a test has a sensitivity of 100 percent and a specificity of 99.5 percent (the rates reported by Abbott), the number of false positives (0.5 percent times 950 people) will be about one-tenth the number of true positives (50). The estimated number of local infections (110,000) would then be only 10 percent higher than the actual number of infections (100,000). But if the actual prevalence is 1 percent, a third of the positive results will be wrong, resulting in a bigger gap between the estimate and the actual number: 30,000 vs. 20,000—a 50 percent difference.
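The arithmetic in this example can be checked with a short Python sketch. The function name `screening_estimate` and its parameters are illustrative, not taken from any study. It applies the stated sensitivity and specificity to the hypothetical 1,000-person sample and scales the raw positive share to the city's 2 million residents, with no adjustment for test error.

```python
def screening_estimate(prevalence, sensitivity, specificity, sample_size, population):
    """Return (true_positives, false_positives, estimated_infections)
    for a survey that reports the raw positive share without adjustment."""
    infected = prevalence * sample_size
    uninfected = sample_size - infected
    true_pos = sensitivity * infected
    false_pos = (1.0 - specificity) * uninfected
    positive_share = (true_pos + false_pos) / sample_size
    return true_pos, false_pos, positive_share * population


# Hypothetical city from the example: 1,000 sampled, 2 million residents,
# sensitivity 100 percent and specificity 99.5 percent.
for prev in (0.05, 0.01):
    tp, fp, est = screening_estimate(prev, 1.0, 0.995, 1_000, 2_000_000)
    actual = prev * 2_000_000
    print(f"prevalence {prev:.0%}: true positives={tp:.2f}, "
          f"false positives={fp:.2f}, estimated={est:,.0f}, actual={actual:,.0f}")
```

Under these assumptions the sketch gives roughly 109,500 estimated infections at 5 percent prevalence and roughly 29,900 at 1 percent, in line with the rounded figures above.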

Now suppose the antibody test has a specificity of 90 percent (similar to the rate reported by BioMedomics, which supplied the tests used in a recent Miami-Dade County antibody study). If the true prevalence is 5 percent, false positives will outnumber true positives, and the estimated number of infections will be about three times as high as the actual number. Unless the researchers adjust their results to take the error rate into account, that's a big problem. It's an even bigger problem if the true prevalence is 1 percent, in which case false positives will outnumber true positives by about 10 to 1.

As long as the test has high specificity and infection prevalence is relatively high (and assuming the samples are representative), antibody studies should generate pretty accurate estimates. But that won't be true when specificity is relatively low or prevalence is very low unless the researchers have a good idea of the test's error rate and adjust their data accordingly.
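One common way to make that kind of adjustment is the Rogan-Gladen estimator, which backs out the true prevalence from the raw positive rate when the test's sensitivity and specificity are known. The sketch below is a minimal illustration, not necessarily the method any particular study used. The function name is hypothetical, and the sample inputs assume 100 percent sensitivity for simplicity.

```python
def adjusted_prevalence(observed_positive_rate, sensitivity, specificity):
    """Rogan-Gladen correction: estimate true prevalence from the raw
    share of positive results, given the test's error rates."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test is no better than chance; cannot adjust")
    estimate = (observed_positive_rate + specificity - 1.0) / denom
    # Clamp to [0, 1]: noisy inputs can push the raw estimate out of range.
    return min(1.0, max(0.0, estimate))


# A 90-percent-specificity test that returns 14.5 percent positives
# implies a true prevalence of about 5 percent.
print(f"{adjusted_prevalence(0.145, 1.0, 0.90):.3f}")
# The same test returning 10.9 percent positives implies about 1 percent.
print(f"{adjusted_prevalence(0.109, 1.0, 0.90):.3f}")
```

The correction is only as good as the error rates fed into it: if the assumed specificity is off by a point or two, the adjusted figure at low prevalence can shift substantially.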

What about using antibody tests to figure out who is immune to COVID-19? It's reasonable to believe, based on the experience with other viruses, that antibodies confer at least some immunity. That is, after all, the premise underlying all the fevered efforts to develop a COVID-19 vaccine. But the extent and longevity of such immunity is not yet clear.

If you had symptoms consistent with COVID-19 at some point, you might want an antibody test to confirm your suspicion, even if you tested negative for the virus itself (since those tests may have a substantial false-negative rate). You might also want an antibody test if you were exposed to someone with COVID-19, or simply because you could have been infected without realizing it, since asymptomatic infection seems to be common.

This week Quest Diagnostics began offering COVID-19 antibody tests through an online portal for $119 each. After ordering the test, you make an appointment for a blood draw at one of the company's 2,200 patient service centers. The results are available online within three to five business days.

Quest notes that "it usually takes around 10 to 18 days to produce enough antibodies to be detected in the blood." Hence the test is not recommended for people who currently are experiencing symptoms, who tested positive for the virus in the last seven days, or who were directly exposed to COVID-19 in the last 14 days.

Quest also notes that the test "can sometimes detect antibodies from other coronaviruses, which can cause a false positive result if you have been previously diagnosed with or exposed to other types of coronaviruses." How often does that happen? Although Quest's test was not included in the Evaluate Vantage survey, the company reports a specificity of "approximately 99% to 100%."

Quest likewise warns that "negative results do not rule out SARS-CoV-2 infection." It reports a sensitivity of "approximately 90% to 100%."

Those numbers indicate that Quest's specificity is very high—comparable to the figures reported by Abbott, CTK Biotech, Nirmidas Biotech, Premier Biotech, and SD Biosensor, although perhaps not quite as good as the rates reported by Creative Diagnostics, Epitope, and Ortho-Clinical Diagnostics. Quest's reported sensitivity covers a pretty wide range but still makes its test look better by that measure than Epitope's IgM test and the products offered by BioMedomics, Ortho-Clinical, and SD Biosensor.

In her Evaluate Vantage article, Cairns emphasizes that reported accuracy rates have not been confirmed by any regulatory agency. While Abbott and Becton Dickinson (which is collaborating with BioMedomics) "are reputable companies" that are "highly unlikely to make claims they cannot justify," she says, "many of the other antibody tests on sale around the world are from little-known groups and laboratories that might not be so scrupulous." She also points out that "the validation tests these companies have performed varied widely in size," ranging from about 100 samples to more than 1,000.

As with the antibody studies, the actual prevalence of the virus affects the usefulness of these tests for individuals. If the share of the population that has not been infected is very large and the specificity of the test is relatively low, false positives can outnumber true positives, meaning that someone who tests positive probably is not immune. Cairns makes that point in terms of a test's positive predictive value: the likelihood that any given positive result is accurate.

"The prevalence of Covid-19 is estimated at around 5% in the US, and at this low a level the risk of false positives becomes a major problem," Cairns writes. Assuming that prevalence, a test with 90 percent specificity would generate about twice as many false positives as true positives, meaning only about a third of the positive results will be correct. "A test with 95% specificity will lead to a 50% chance that a positive result is wrong," Cairns notes. "Only at 99% specificity does the false positive rate become anywhere near acceptable, and even here the chances are that 16% of positive results would be wrong." With a specificity of 99.5 percent (Abbott's reported rate, which is similar to Quest's), the chance that a positive result will be wrong falls to less than 9 percent.
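The positive predictive value figures in this passage follow from Bayes' rule. The sketch below is an illustration only: the function name is hypothetical, and it assumes 100 percent sensitivity, which the passage does not state explicitly but which approximately reproduces the quoted percentages.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive result reflects a true infection."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed inputs: 5 percent prevalence, 100 percent sensitivity.
for spec in (0.90, 0.95, 0.99, 0.995):
    ppv = positive_predictive_value(0.05, 1.0, spec)
    print(f"specificity {spec:.1%}: {1 - ppv:.1%} of positive results are wrong")
```

With these inputs the share of wrong positives comes out near 66, 49, 16, and 9 percent, respectively.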

These considerations are obviously relevant for policy makers as they decide who should be allowed to work (or travel internationally), where, and under what restrictions. Seemingly small differences in specificity can make a big difference when it comes to identifying people who are presumably immune to COVID-19.

65 responses to “Seemingly Small Differences in the Accuracy of COVID-19 Antibody Tests Can Make a Big Practical Difference”

  1. I am beginning to suspect with COVID-19 that we don’t know a whole lot of anything for certain.

  2. Did I miss the part in the Constitution that gives policy makers the authority to decide who gets to work or travel?

    1. As usual, you can look to the commerce clause. Not even kidding in this case.

      1. Commerce Clause only applies to the Feds, and I don’t think they’ve shut down anything other than federal activities. Police power is what the states and local governments are using.

      2. Ironically, most leftists are going to the 10th Amendment. Suddenly, they are all about state’s rights, [the power not invested in the federal government belongs to the states {let’s not mention also to the people}] and that somehow therefore translates into an authoritarian governor and a suspension of the Constitution at the whim of the Governor. Now were Obama, Clinton, or Biden in office, they’d be happy as peach ice cream to abandon the 10th again and focus on how we need a central community organizer with the powers of Stalin to save the day. They just don’t want that with Trump, so they are left to invoke the 10th.

        That said, it’s also an election year and perhaps this crisis peaked too early to take it to the polls. People tend to be willing to elect and allow power to authoritarian leftists when being actively served a steaming pile of fear. However, now a lot of people are taking a look at the numbers and wanting to know where these numbers are coming from. And they are calling bullshit after realizing that half of the cases come from two states, the hospitals that were destined to collapse are actually collapsing from lack of use, and they are threatened with arrest if they buy paint, ride in a boat, take the family to the beach, or have a beer with a friend.

        1. I love how people think that you can pick an amendment to trump another.

          The Bill of Rights; each right is supreme to itself.

          You can not abridge the 1st amendment by using the 10th amendment.

          All these orders are completely illegal in the face of the 1st and 4th amendment.

  3. Serious question: if a patient has antibodies derived from a non-SARS-CoV-2 infection that react with the antigen in the test, will those same antibodies cross-react in vivo? If so, does that create a basal level of immunity?

    1. Umm, excuse me sir, but you seem to have come to the wrong meeting. However, you’re welcome to stay if you cut the gibberish and pop open a beer! Or a big fat blunt.

    2. As implied by the name SARS-CoV-2 is a particular variant of coronavirus. So while not identical to others, it certainly will share many similarities.

      It is absurd to think that complement reacts in toto to any particular foreign matter. More that antibodies respond to certain unique characteristics, be they exterior, or interior aspects of the virus. So it is quite likely that some, if not many, people have immune systems that will react, if only partially, to this particular virus.

      But my own guess, is that no it does not confer significant immunity in most cases, because, given the ubiquity of coronaviruses in general, if it did then we would not be seeing the problem we have right now.

      Although, to argue otherwise, the very sorts of people who would tend to have immune systems routinely challenged by common cold type viruses (the young, the socially active, street people, prison population, etc.) are the very sorts of people who, while still becoming infected, are also not the sorts who are then dying from their infections.

  4. Are the tests up to those tasks? That depends on which test you use, how you use it, and the amount of risk you are prepared to accept.

    What’s this “you”, white man?

  5. This is all true but it has to be put into perspective. The antibody samples that have been done show an infection rate 10 times or more the confirmed infection rate (the number of people who have actually tested positive). So let’s say these tests are awful and only have an 80% specificity rate. That means the actual is 8 times higher than known instead of ten. Eight times higher is still very significant and high enough to completely change our perception of the deadliness of this disease. This is all nice picking at the edges of the data but the data seems so overwhelming that I don’t see how it changes the conclusion.

    1. I’m not sure you understand what “80% specificity rate” means. Note that sensitivity and specificity are different things. Sensitivity is the TRUE POSITIVE rate, i.e., the share of people who are actually positive who get a positive test result. Specificity is the TRUE NEGATIVE rate, i.e., the share of people who are actually negative who get a negative test result (which matters more when you’re dealing with tests for a relatively rare condition).

      There are four possible cases for testing: true positives, false positives, true negatives, and false negatives. To determine what happens with an inaccurate test, you need to look at all four numbers.

      The worst-case numbers mentioned in the article are a test with 82% sensitivity and one with 91% specificity. Just to make the numbers easier, let’s assume a test with 90% for both. (These are much better than your hypothetical “awful” test.)

      The current rate of confirmed positive cases in the U.S. is less than 0.3% of the population. But let’s just, for the sake of argument, say 1% of the population has been infected, over 3 times the confirmed rate.

      Now, let’s run our hypothetical 90% sensitivity/90% specificity test on a group of 10,000 random people. The actual number of infected in that population is 100, according to our assumptions (1%). That means that 9,900 people we test will be actually negative.

      Let’s compare the four possibilities. The number of true positives is 90, because 100 positive people are present, and the sensitivity is 90%. But, because the specificity is 90%, there’s a 10% chance of a false positive for every uninfected person who takes the test. There are 9,900 negative people taking the test, and 10% of them will test positive. Thus, there are 990 false positives.

      The specificity is also 90%, so true negatives are 90% of the 9,900 negative people or 8,910. False negatives are those who are positive but test negative. Those are the 10 people out of the 100 positive people who didn’t get a positive result above.

      True positive: 90
      False positive: 990
      True negative: 8910
      False negative: 10

      Actual incidence of disease in population: 1%. Apparent incidence according to this 90% “accurate” test: 10.8%
      Factor the test overestimates infections by? 10.8 times. Chance that a positive test is actually a correct one? true positives (90) divided by total positives (1080) = 8.33%

      For this test, the vast majority of positives are false positives. So no, an 80% specificity rate doesn’t mean “the actual is 8 times higher instead of 10”; it likely means you’re generating so many false results that you can’t separate the true data from the noise.

      Suppose you raise both sensitivity/specificity to 95% and keep the 1% actual infection rate. In that case, you’re still going to get 495 false positives compared to 95 true positives, and you’ll still overestimate the prevalence of the disease by nearly a factor of 6.

      This is the problem with testing for rare conditions. You need a REALLY high specificity to get any meaningful results, in this case at least 99%. If some places are really using tests with specificities less than 90%, they’re producing absolute garbage data that shouldn’t be making headlines.
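The four counts worked out in this comment can be reproduced with a short Python sketch. The function name `confusion_counts` is hypothetical, and the inputs are the comment's own assumptions (10,000 people tested, 1% actually infected).

```python
def confusion_counts(n_people, prevalence, sensitivity, specificity):
    """Split a tested group into the four possible outcomes."""
    positives = round(n_people * prevalence)   # actually infected
    negatives = n_people - positives           # actually uninfected
    tp = round(positives * sensitivity)        # infected, test positive
    fn = positives - tp                        # infected, test negative
    tn = round(negatives * specificity)        # uninfected, test negative
    fp = negatives - tn                        # uninfected, test positive
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


for acc in (0.90, 0.95):
    c = confusion_counts(10_000, 0.01, acc, acc)
    apparent = (c["TP"] + c["FP"]) / 10_000
    ppv = c["TP"] / (c["TP"] + c["FP"])
    print(f"{acc:.0%}/{acc:.0%} test: {c}, apparent incidence {apparent:.1%}, "
          f"chance a positive is correct {ppv:.2%}")
```

Note that the false-positive count depends only on specificity and the number of uninfected people, which is why specificity dominates when the condition is rare.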

      1. Well presented.

        Now also consider what happens when you have multiple different tests, with differing sensitivity and specificity. Will people really understand the issues created when aggregating such data sets?

        The howls for testing, ignoring that we largely have not had good tests available, have mostly been nothing more than howls for a magic wand.

    2. Your math is wrong on that. If the specificity is only 80%, then 20% of true negatives are detected as positives (i.e., if you sampled 1,000 people and there were 50 true infections, 80% specificity would get you another 190 positives in addition to the true infections; you’d detect more than 4x as many infections as there actually were).

      Fortunately the specificity of the tests used for the studies I’m aware of are significantly higher than 80%.

    3. Or, to maybe make this a bit more simple to understand, let’s look at the actual test rates that seemed to come out of the first California tests a week or two ago. They found 3.5% positives in their tests (raw data, which they later adjusted upward, based on demographic guesses, etc.).

      Okay, so they seemed to assume that the test they were using had a specificity of 99.5%. In that case, to get 3.5% positives, the actual rate of positives in the group they tested would have to be around 3%.

      But other calibrations of that test suggested a specificity that was lower. If the specificity drops to 99%, the actual infected rate drops to ~2.5% to produce a 3.5% measured rate (that includes false positives). Some data suggested that test actually had specificity as low as 95%, in which case literally 0 people could have been infected, and the test would have shown a 5% infection rate.

      There’s a big difference between 0% and 3.5%. Obviously, the number is not likely to be 0, but it shows how important test statistics are. It’s difference between, “This test demonstrates the actual rate of infection is 50 times higher than the confirmed numbers!” and “This test is producing so much garbage data that we can’t even tell if the infected cases are much higher than confirmed.”

      1. Oops — The Santa Clara study claimed a measured incidence of ~1.5%, so that means all the numbers are even worse for those who want to tout this as evidence that the disease is much more widespread. (The 3.5% came from another study I believe.)

        Again, I hope that the results in these studies turn out to be true, and the IFR is low. But the data so far is still pretty crappy, as crappy as (or crappier than) all those “predicted deaths by the end of the summer” models that keep flying up and down every week. Those models suffer from similar statistical flaws to antibody tests.

        1. What you do in these cases is to parameterize your results wrt prevalence to produce a series of curves that show the spread of possible results. That’s the best you can do with tests that have high type 1 and type 2 error rates.

          1. Yes, of course you do. (I didn’t want to get into all of the complications of this above, as I was trying to explain the basics using actual calculations to someone who clearly didn’t get it.)

            The problem is that in the tests so far, you still end up with huge uncertainty. The 95% confidence interval for the Santa Clara study might indicate anywhere from ~2-3 times the confirmed positive rate to ~200 times the confirmed positive rate. (This is taking into account not only the uncertainty in testing like false positives, but also the demographic adjustments made to the data and the assumptions made there, which could skew things even further.)

            Trumpeting such results as proof that the incidence is “50-75 times” the confirmed positive rate (as the authors of that study initially did) is beyond bad stats. Given how such studies may affect public policy (and how people, especially politicians, don’t understand statistics), it’s completely irresponsible.

  6. “.. as they decide who should be allowed to ..”

  7. This is why meta-analysis is the key to getting accurate results. Don’t rely on one result, or even a few. You have to look at dozens of studies. The more you add in, the less likely all results will be erroneous by the same factor if they have been performed truly independently of each other.

    1. Depends. If they all are using similar tests that might produce a large amount of false positives, then doing more tests doesn’t actually improve your accuracy.
      It’s like trying to design a sensor that would identify a roll of 6 on a die. But the sensor is poor and picks up rolls of 4, 5, or 6. Assuming the die is fair, the true rate of sixes should be about 1/6 (or ~17%). But this sensor will register about 50% as “positive” for the condition, as it has a high false positive rate.
      If different groups use this same faulty sensor in other places, it’s not going to magically make the number converge closer to 17%. It’s going to keep coming up around 50%.
      Not saying this is what’s happening — but just doing more studies doesn’t necessarily improve testing accuracy if the tests themselves are inaccurate.

    2. Meta-analyses are usually crap.

  8. Question from a curiously ignorant bystander: how do they calibrate these tests? What 100% accurate test, with zero false positives and false negatives, do they compare all these imperfect diagnostics against?

    1. No clinician relies on a single data point for diagnosis.

      “ Medicine is a science of uncertainty and an art of probability.”
      William Osler

    2. Do not mean to be obscure. They are comparing one thing to others. None of them are perfect.

      1. But how do they come up with different accuracy rates? There has to be some fundamental core way of identifying the real true/false positives/negatives in order to compare to specific tests.

        1. There are different kinds of tests. The real lab tests they use for a true confirmed case are based on things like PCR, which literally checks against the known genetic sequence (RNA) of the virus. Negative control samples are probably mostly taken from old blood samples from several months ago, before the virus was known to spread.

          The problem with PCR and the “confirmed” CDC-level tests (with the invasive nasal swab) is that they take more time and effort to confirm, and they’re more expensive. These “antibody tests” being talked about now aren’t looking for active infections. They aren’t looking to find and verify presence of the actual virus. Instead, they look for blood markers of immune response. The problem with these is that the antibodies the body produces are often similar for similar types of viruses, so it’s hard to get a test that accurately selects only for those that would attach to this specific strain. Nevertheless, antibody tests can be run much more quickly and cheaply, so that’s why they’re trying to roll these out.

          But they can calibrate the “true positives” by using blood samples from those people who actually had the virus, as confirmed by a test that actually was designed to test for the virus itself (rather than the antibodies).

        2. There is not really one way to do it. You can establish criteria based on a positive PCR and clinical criteria. Then compare.

          You can just run a sample of known presumed negatives for the antibody test against a known positive population who meet the criteria. However there are sampling errors to look out for.

          You can also compare results against a so called gold standard, in this case ELISA but that may have an error rate as well.

          In practice, as Osler’s dictum states, you go with probability. In prediction and population studies it becomes more difficult.

          In this case there are likely a large percent of people who have been exposed or had sub clinical infection. Now what is your control?

          That is the dilemma public health folks are dealing with.

          There are many costs to weigh. At some point you are losing with the shutdown. We cannot go on with this much longer.

    3. For specificity, they test samples from before covid-19 first infected humans, which should all be negatives. Any positives decrease the specificity.

  9. What we know, without any doubt, from the record is that healthy people <60 years old don't die from COVID-19.

  10. I just have to say that I’ve been reading this site for some time and finally registered an account to say thank you to the author of this article for publishing one of the first rational discussions of testing I’ve seen in the mainstream media (here or elsewhere).

    In the past few weeks, Reason has run several articles touting the results of these antibody tests without understanding statistics AT ALL. This article finally shows someone is bothering to use REASON to think about testing, instead of the ignorant kneejerk responses of “Wow! Look at how this disease is so prevalent and isn’t so bad at all!” I truly, truly hope that’s the case, but so far a lot of these tests are problematic from a basic stats standpoint, so it’s difficult to separate the good data from the noise.

  11. To add to the discussion. Here is how the testing might work in some individual cases:
    First, an infected person:
    The nasal swab is accurate but only reads positive during the time the person is actually infected. After that, as long as one has fought off the virus, it’s not in the mucus. So a later nasal swab isn’t a “false negative” because the nasal swab is only testing for active infections.
    But while that person is “sick” the antibody tests (blood tests) might be negative because the body hasn’t yet or is just starting to produce antibodies. A time period later, likely after the person has recovered, the antibody tests can yield a true positive, but it can also be wrong, of course, for whatever reason.

    Next case: a person never gets a nasal swab because his case is asymptomatic. It would make no sense to do a nasal swab because if this person had the illness, he fought it off and there are likely no viruses hanging out in his mucus; but, he has antibodies to it, so the blood tests read these.

    Also, the antibody tests have a “number” as a result… They are not just “yes or no” tests, and in this range there is usually an “inconclusive” result. So maybe 0 to 1 is considered negative, 1 to 3 inconclusive, and 3 or more positive. My surmisal is that whatever other “stuff” is creating the low positive/inconclusive numbers, when that is a little higher in an individual and pushes a test over 3, those are the false positives.

    Also, laboratory error is possible (oh, which vial was that blood from?)!

    Lastly, and this is a good thing – unlike herpes or HIV, when you fight off the (or a) coronavirus, it is basically gone from the body, you “beat” the illness. In a sort of vernacular, if you contracted AIDS 5 years ago, you still “have” AIDS; but if you had the flu 5 years ago you don’t “have” that strain of the flu (though you might still have antibodies or the ability to start making them quickly).

  13. The author should do more of his usual good thinking! This article is more a recap of what others have written. Why doesn’t he talk about repeated testing of those samples that return positive results? A simple application of Bayes’ theorem shows that the false positive rate in repeated testing goes to zero. Even if the prevalence is low, this is only your prior in the first test. The posterior from your first test becomes the prior for your second test, etc. If test errors are independent, you will converge to the right result, and rather quickly at that.
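The updating scheme this comment describes can be sketched in a few lines of Python. This is an illustration under the comment's own premise that errors on repeated tests are independent. The function name and the sample inputs (1% prior, 90% sensitivity, 90% specificity) are assumptions chosen for the example.

```python
def posterior_after_positives(prior, sensitivity, specificity, n_positive_tests):
    """Probability of true infection after n consecutive positive results,
    feeding each posterior back in as the next prior (Bayes' rule).
    Assumes test errors are independent from one repeat to the next."""
    p = prior
    for _ in range(n_positive_tests):
        true_pos = sensitivity * p
        false_pos = (1.0 - specificity) * (1.0 - p)
        p = true_pos / (true_pos + false_pos)
    return p


for n in range(1, 5):
    p = posterior_after_positives(0.01, 0.90, 0.90, n)
    print(f"after {n} positive result(s): {p:.1%} chance of true infection")
```

If some false positives have a systematic cause, the independence assumption fails and the posterior will be overstated.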

    1. Good point. For a blood test it should be easy enough to split the samples and test repeatedly. Retesting just a few times with something like a 5% false-positive rate would reduce the error quite a bit.

      1. So I did a quick spreadsheet to show some concrete numbers for retesting positives keeping known true positives/false positive apart.

        The test was given 90% true positive rate and 10% false positive rate. The number of samples was 10,000 with actual prevalence values of 2%,4%,6%,8%, & 10%. After the first test the estimates of prevalence were (11.6%,13.2%,14.8%,16.4%,18%)…so that was testing the entire population of 10,000.

        After one iteration of retesting only the positive results the estimates were (2.6%,4.2%,5.8%,7.4%,9%).

        I did three iterations. The results get better for the lower prevalence cases but worse for the higher prevalence cases:

        2% true => 2.1% estimated
        10% true => 7.3% estimated
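A compact way to check the first two stages of this spreadsheet is to compute the share of all samples still positive after each round of retesting. The sketch below assumes, as the spreadsheet appears to, that errors are independent across retests and that the "estimate" is that surviving share. The function name is hypothetical. How the later iterations were handled is not stated in the comment, so only the first pass and the first retest are shown.

```python
def positive_share_after_rounds(prevalence, tpr, fpr, rounds):
    """Share of the whole sample still positive after `rounds` tests,
    where only samples that tested positive are tested again.
    Assumes independent errors on every repeat."""
    return prevalence * tpr ** rounds + (1.0 - prevalence) * fpr ** rounds


prevalences = (0.02, 0.04, 0.06, 0.08, 0.10)
for rounds in (1, 2):
    shares = [positive_share_after_rounds(p, 0.90, 0.10, rounds) for p in prevalences]
    print(f"after {rounds} test(s):", ", ".join(f"{s:.1%}" for s in shares))
```

Because true positives are also lost at each round (10% per retest here), the surviving share eventually undershoots the true prevalence unless it is corrected for sensitivity.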

        1. You can do it analytically to get more precision when you iterate. I’m away from my desktop, but will try to post some calculations later.

          Also, what you care about are mostly false positives and that estimate improves very quickly.

          1. You’re making an important (and probably unwarranted) assumption, namely that ALL false positives are due to random chance. Given how antibody tests actually work, that’s not always likely to be the case. One problem in creating a test in a case like this with high specificity is that the body produces a lot of similar antibodies, and closely-related viruses may cause the body to generate very similar antibodies. So, while some of the false positives may be due to random chance, others may be due to the fact that the test is actually picking up on people who were infected at some point with a completely different (but perhaps related) virus.

            And it may be quite difficult to see how much of the latter is happening without a lot of careful analysis and study of why the test is producing false positives in the first place. Given that we barely have vaguely reliable numbers for sensitivity/specificity so far for these tests, I’m pretty sure there’s not enough evidence to claim that all false positives are due to random chance.

            Simple example: suppose you have some sort of scanner that tests for eye color. Suppose you’re looking for green eyes, and the population you’re studying has ~1% incidence of green eyes. But — unbeknownst to you — your scanner not only shows a positive result for green eyes. It also shows a positive result for blue eyes. Let’s say there are 4% blue eyes in this population, and the other 95% have brown eyes.

            Initial testing of your samples indicates a ~10% false positive rate. Now, if you don’t know that blue-eyed people are what’s causing most of the false positives, you can’t improve the tests. And repeated testing in this case won’t get you anywhere near the 1% true incidence of green eyes. Instead, the number will settle out around 5%, because you’ve gotten rid of some random false positives, but you can’t eliminate the false positives caused by a test that is not adequately selective in some unknown way.
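
The eye-color example can be put into numbers. This is a rough sketch under assumptions the comment does not state: the scanner always flags green and blue eyes, and brown eyes are flagged purely at random, at a rate chosen so that the overall false positive rate among non-green eyes is 10%. The function names are hypothetical.

```python
GREEN, BLUE, BROWN = 0.01, 0.04, 0.95  # population shares from the example


def random_rate_for_brown(overall_fp_rate=0.10):
    """Random false positive rate on brown eyes that yields the given
    overall false positive rate among all non-green eyes (assumed setup)."""
    non_green = BLUE + BROWN
    return (overall_fp_rate * non_green - BLUE) / BROWN


def flagged_share(rounds, overall_fp_rate=0.10):
    """Share of the population flagged in every one of `rounds` scans.

    Green and blue eyes are flagged every time (systematic); brown eyes
    must be flagged by chance in each independent round.
    """
    r = random_rate_for_brown(overall_fp_rate)
    return GREEN + BLUE + BROWN * r ** rounds


if __name__ == "__main__":
    for k in range(1, 6):
        print(f"after {k} scan(s): {flagged_share(k):.2%} flagged")
```

Under these assumptions the flagged share starts near 10.9% and falls toward 5% as scans are repeated, never toward the true 1%, because the blue-eyed positives are not random and survive every retest.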

            1. The burden of proof is for you to show there is some systemic bias in the testing system that would cause the errors to have some dependency like the one you describe. Either way there will be random, independent errors from test to test and iterating on the positives will improve the estimate.

              1. Exactly!!

              2. From the article all of these comments are discussing:
                “Quest also notes that the test ‘can sometimes detect antibodies from other coronaviruses, which can cause a false positive result if you have been previously diagnosed with or exposed to other types of coronaviruses.'”
                As I said in my previous post, this is a well-known problem in antibody testing. Yes, you’re correct that you can’t know a *particular* test is suffering from this problem or how big that effect may be. However, given that it’s a well-known problem, and COVID-19 is already known (in multiple studies) to have similarities to other viruses in ways that could generate these problems, it’s very likely that tests are suffering from this problem in false positives. Which is why Quest probably issued this warning.
                And sorry, but the burden of proof on anyone offering a statistical model is to justify the validity of said model. You can’t just assume a characteristic is entirely randomly distributed. It’s a starting point when you have no other data, but that doesn’t mean the assumption is more likely to be true.
                “Either way there will be random, independent errors from test to test and iterating on the positives will improve the estimate.”
                If you look back on my previous comment, I *specifically* designed the example I offered to show that’s likely true.

                1. >>It’s a starting point when you have no other data, but that doesn’t mean the assumption is more likely to be true.<<

                  Exactly. I suppose the test producer would have knowledge of the mechanics of the test, what conditions might cause errors, etc. So by you, I suppose I meant the creator of the test has the burden to describe those errors in some way.

                  But let's not split hairs, count the number of angels, etc. There will be dependencies but there will be randomness. Might as well retest the positives once to reduce the random errors especially in populations with low prevalence.

            2. That is why I said that tests must be independent. I’m careful with my words. (See original post above.) You either are not a statistician or you did not read my post very carefully.

              But that said, the easiest way to see how independent repeated tests are is to retest known negative samples and look at the correlation of false positives.
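
One way to sketch that check: run the test twice on samples known to be negative, then compare how often both runs come back falsely positive with what independence would predict. The code below is an illustration only. The function name and the toy data are made up, and a real validation would need far more samples.

```python
def repeat_false_positive_check(first_run, second_run):
    """Compare the observed double-false-positive rate with the independence prediction.

    Both arguments are equal-length lists of booleans for known-negative
    samples (True means the test wrongly came back positive).
    Returns (observed_both_rate, expected_if_independent).
    """
    if len(first_run) != len(second_run) or not first_run:
        raise ValueError("need two non-empty runs of equal length")
    n = len(first_run)
    rate_first = sum(first_run) / n
    rate_second = sum(second_run) / n
    both = sum(a and b for a, b in zip(first_run, second_run)) / n
    expected_if_independent = rate_first * rate_second
    return both, expected_if_independent


if __name__ == "__main__":
    # Toy data (hypothetical): 10 known negatives, the same two of which
    # are falsely positive in both runs.
    run1 = [True, True, False, False, False, False, False, False, False, False]
    run2 = [True, True, False, False, False, False, False, False, False, False]
    observed, expected = repeat_false_positive_check(run1, run2)
    print(f"observed both-positive rate: {observed:.2f}")
    print(f"expected if independent:     {expected:.2f}")
```

An observed rate well above the independence prediction would suggest that the same samples keep triggering the test, which is the kind of error retesting cannot remove.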

              1. Yes, you said “independent” in your first post. But then someone replied and said you could just improve things by retesting (no qualifier) and you seemed enthusiastic about such an approach.
                I was just pointing out that your assumption was *unwarranted*, not that you hadn’t ever acknowledged it in any way. (Yes, I can read carefully. Can you?)

                Note that I’m not just replying to you here. I finally discovered an article on Reason that seems to acknowledge the problems with testing in a somewhat technical way, so I’m trying to post things that will INFORM OTHERS. I’m not criticizing you, in particular — I’m just noting that there aren’t necessarily easy solutions here. If the approach you suggested would work so easily, studies would do it more frequently. The problem is that the errors in antibody tests aren’t generally completely random and independent.

                1. 1. you are not replying to me, but you clicked reply. Hm? Underhanded much?

                  2. You are speculating about the statistical nature of the errors, as someone else pointed out. You have not made any supported claims. If you really wanted to “inform others”, as you so patronizingly state IN CAPITAL LETTERS, you would also suggest a way for quantifying the random nature of the errors. I did that, btw.

                  3. And finally, you made a logical error, when you stated that if what I’m suggesting would work so easily, others would have done it already. If we follow your thinking to its stupid conclusion, any new suggestion would not be worth trying. And you likely see yourself to be an intellectual? Well. You are not, you are a poser. And yes, I am criticizing you.

                  1. “Those who cannot read shouldn’t try to argue.” –Perhaps a Famous Person (but actually a quote I just made up)

                    1. “I’m not *just* replying to you here.” Actual quote from my last post. Didn’t say I wasn’t at all replying to you. Try learning to read.

                    2. “You have not made any supported claims.” I quoted an actual sentence from the article right here. Again, try learning to read. The problems with antibody tests are also well-known. If you really want to argue about citations, try providing a source that

                    3. Typical internet poster BS. “I thought of this *obvious* idea while hanging out in my mom’s basement! Obviously all those experts never would have considered it!” Sorry to burst your bubble, but most of the time I see such a post, the experts *have* already thought of it, and there are reasons your “obvious” idea either won’t work or wouldn’t be as effective as you think.

                    Oh, and try actually reading my posts again. I’ve repeatedly argued in favor of more data. I haven’t said your idea would be completely ineffective, only that it would likely be limited in being able to narrow down the numbers precisely. I even gave a detailed scenario demonstrating said fact.

                    But oh, you just want to argue and be a bit of a jerk, rather than REASON.

            3. On a different note, I don’t understand the aversion to encouraging empirical results in the case of COVID. It is almost like the medical establishment is saddened by the fact that the fatality rate is less than the initial WHO estimate of 3.4%. Why?

              1. Not sure if this was meant to be in some way a reply to me or not, but just to be clear, I am strongly in favor of as much data being reported as possible. But all of that data needs proper analysis and context. The failure to provide proper stats or proper analysis in some of these recent antibody studies is borderline professional misconduct, given how these studies were then trumpeted to the media where they could unduly influence public policy.
                I sincerely hope the estimates in many of these studies are correct and that the incidence is much higher than previously suspected (and the mortality rate, etc. much lower). That would be wonderful news! But I also think we should not let our biases impede our ability to look at data objectively.

                For example, John Ioannidis was a coauthor on the Stanford studies and had led a 15-year crusade against junk science. (He’s been a bit of a hero to me for years.) He criticized early COVID-19 models in the media as poorly thought-out, without proper data to back up the conclusions, but in the process, he clearly made it seem he was biased to believe the virus was overhyped. Only a couple of weeks later, he stamps his name on a paper that (particularly in its original form) was basically the junk science he has crusaded against, not properly explaining statistical methodology. And then he goes out in the media again to claim it has more statistical power than it does.

                This is all profoundly disappointing to me. I don’t know what the “medical establishment” wants, but I’d love to see the fatality rate be much lower. I also think we need to interpret data carefully as it comes in to understand how much it says and doesn’t say yet.

                1. 1. Have you actually read the first version of the Santa Clara study? I suspect you did not, because the statistical methodology was explained in an online appendix, together with the caveat about false positives. We can quibble about the methodology in the study, but it was all there in the open.

                  2. Since when are research mistakes “professional misconduct”?

                  3. The ICL model predicted 2.2 million dead in the US in a most speculative fashion. Did you protest that as well? Was that professional misconduct in your view?

                  1. Burn.

                    1. Yeah, dude. “Burn.” Read my reply. Shall we use the actual studies quoted and actual professional stats analysis/commentary? Or shall we cherry-pick inaccurate and incomplete internet talking points?

                  2. 1. I suspect *you* haven’t read the first version of the study and are instead quoting talking points you’ve read elsewhere.

                    Want an actual statistician’s professional opinion on that study?


                    (And he’s posted more updates later too; I’d recommend reading all of them.)

                    2. The Stanford researchers are smarter than this. They wanted to rush this out, and get it trumpeted in the media as soon as possible. As I mentioned, Ioannidis was a coauthor (admittedly listed near the end, but he clearly put his name on it and promoted it in the media). Again, he has spent a career ferreting out just these types of misleading statements, errors, and junk statistics. Why is it that you want so much to give him a pass? He clearly had an agenda based on his preconceptions of what the results should say, and he promoted junk numbers. He’s spent decades calling others on their BS — why not call him on his?

                    Oh, and Gelman’s reaction to this study (in the link above)? He says the authors of the paper “owe us all an apology” because these were “avoidable screw-ups.” There’s a leading statistician calling the authors out.

                    3. The ICL report did no such thing. Once again, the inability to read problem seems to be coming up. (Or maybe you’re just being affected by partisan rhetoric, rather than REASON.) In a worst-case scenario, assuming no mitigation efforts (the UK official government strategy at the time), the ICL model predicted up to 550,000 deaths in the UK. In the best-case scenario (based on lower estimates of R, given uncertainty at the time) incorporating maximum mitigation efforts and government interventions, the ICL model predicted merely 5,600 deaths.

                    Given that the UK already has over 25,000 deaths, even with pretty severe mitigation efforts, the results are within the predicted range of the ICL model. I’m not arguing that the model was perfect, by any means, but it made all of its assumptions clear, published all of the limited estimates for fatality rates, reproduction rates, etc. that were used in its model and the various assumptions that were made, etc. The original ICL report didn’t contain the detailed prediction numbers for various mitigation scenarios for the US, as it did for the UK, as its primary purpose was to model the UK and argue that the UK’s current policy was problematic. However, applying similar stats ratios to the US estimate of 2.2 million compared to the UK mitigation scenarios, the model clearly implied a range from ~20,000 up to 2.2 million deaths for the US. Again, we’re well past 20,000 deaths in the US already.

                    So no, I don’t think the ICL study rises to the level of “professional misconduct.” The researchers made absolutely clear that there was a huge range of potential outcomes based on different circumstances and the possible range of estimates for the inputs. That’s what you do when you’re doing a careful statistical analysis with limited data.

                    Do I blame the media for perhaps misrepresenting the ICL study? Absolutely. Do I blame the media for overhyping the Stanford study without proper context? Absolutely. And I agreed with Ioannidis somewhat in his earlier media pieces where he criticized projections like ICL for perhaps going overboard in hyping the large numbers.

                    But that doesn’t mean it’s then correct for Ioannidis to turn around and do crap science in return. It doesn’t mean that it’s okay for the Stanford folks to screw up their statistical analysis. It doesn’t mean it’s okay for the Stanford folks to withhold critical assumptions, data, and the details of their modelling to avoid professional evaluation of their results. The ICL reports are all archived online, along with all the citations for how they came up with their numbers, the assumptions of their model, and the huge range of possible outcomes, which we’re well within.

                    For someone who seems interested in data and data analysis, why don’t you want to hold antibody studies to the same level of professional standards?

          2. The higher the false positive rate the faster they improve from iteration to iteration.

  14. Sometimes raw data points to something as well. In the northern third of California – 17 counties from Mendocino east to Sierra County on the border with Nevada, north to Oregon – there are 1.2 million people in an area about the size of Virginia. As of May 1 there have been a total of 227 cases of coronavirus, with 9 deaths.
    Anyone going about in this area can attest to the observation that at most one half of the residents are observing social distancing and/or wearing facial coverings.
    The chance of getting coronavirus with symptoms bad enough to call for a test is 1 in 5,280. The chance of dying from coronavirus appears to be about 0.0008%.
    Maybe the disease is not nearly so deadly as the news media have been portraying it to be. Except in New York City, the center of the universe.


    Oh, wait. Turns out,

    it’s complicated.

    Fucking Einsteins.

  16. Just as big changes in a really small number still leave a really small number, small changes in a really big number still leave a really big number.

    Knowing the true number of infections is, in the very short term, less important than knowing that the true number is in fact substantially higher.

    Whether the testing shows 10 million or 30 million, either number suggests that, so long as local resources are not strained, most people can resume most normal activities, just be mindful of hygiene and don’t spend a lot of time indoors with grandma. Bring her outside.

  17. I kept waiting for the part of the article where some intellectual curiosity would be displayed re: the question of why there is such a variance in the quality of COVID tests. The answer is a lack of government regulation.

    1. Indeed, with some heavy government regulation, all of the tests would be equally worthless.

    2. It doesn’t have to be the government. It could be some sort of voluntary association. But, yeah, given that the government is going to be paying, directly or indirectly, for much of the testing, government standardization does not seem unacceptable in this case.

      The Bureau of Weights and Measures being just the sort of thing that divides the libertarian from the anarchist.
