99% Disappointing

When I first heard there was an antibody (serum) test I thought wow, this is fantastic! If you are certified to have already had it, then you know that it’s safe for you to be around others and others can be confident that they are safe around you. It could be like a license to go to work.

Then I thought about it.  Actually, the test is probably useless for you, personally. (It has other uses, like making policy, but that’s not what we’re talking about here.) The problem has nothing to do with not knowing whether Covid-19 guarantees future immunity. You don’t need to go there in order to show that it’s useless for the average person.

This isn’t an Internet crackpot thing—it’s real math you can verify yourself.  It’s a disappointment but the reasons are interesting and the principle applies to all tests that yield a positive/negative result. The smaller the proportion of people in the population that have the condition in question, the more this principle applies.

I’m just going to explain one small aspect of this. One of the main places this applies is in diagnosing illnesses and that water gets very deep. Still, it’s interesting to poke around in it and it might help you understand what your doctor is doing someday.

What Does 99% Accurate Mean?

For at least two intertwined reasons, it doesn’t mean anything to say that a test is x% accurate. Any claim offering a single percentage value is just muddying the water. The first reason is that tests like this have two different kinds of accuracy: how often the test correctly flags people who have the condition and how often it correctly clears people who don’t. They mean completely different things and the two numbers are very often different for the same test.

The paradox is that even if a test is 99% accurate with respect to positive results, the probability that a random person’s positive result is false can still be very high. In fact, in many realistic scenarios with Covid-19, a positive result is more likely to be false than true.

That may seem to be self-contradictory, but it’s not. Unlike normal humans, statisticians and lawyers say precisely what they mean. Most of the rest of us just aren’t used to it.   You have to understand a couple of things before it is clear what 99% accurate means.

Sensitivity v Specificity

The first kind of accuracy is called “sensitivity.” On average, when you test 100 people who are known to have a certain condition using a test with a sensitivity of 99%, 99 of them will test positive for it. That’s what it means. It sounds simple but the catch is that a sensitive test can also flag any number of additional people who really don’t have it.  A sensitive test rarely gives a false negative result but it is allowed to give false positives galore.

The second kind of accuracy is “specificity.” If a test has a specificity of 99%, then on average, when you test 100 people who definitely do not have the disease, only about 1 of them will test positive. The catch is, the test can give any percentage of false negatives.

So what you ideally want is a test that scores high both ways. Sensitivity ensures that a test is positive for most people who have a condition and specificity rules out most of the false positives that sensitivity permits for people who don’t really have it.
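If it helps to see the two definitions as arithmetic, here is a minimal sketch. The function names and the validation-study counts are made up for illustration; the only thing taken from the text is the idea that each number is measured on a group whose true status is already known.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of people who have the condition that the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of people who do not have the condition that the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation study: 100 known cases and 100 known non-cases.
print(sensitivity(true_pos=99, false_neg=1))
print(specificity(true_neg=99, false_pos=1))
```

Notice that neither function ever looks at the other group. That is why a test can score well on one measure and badly on the other.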

Sensitivity and specificity aren’t all or nothing, of course. A test has a sensitivity of X% and a specificity of Y% and it is not unusual for a test to be strong on one and weak on the other.  Tests that are weak in one or the other still have uses. For instance, if a doctor is trying to diagnose an illness, a test with high sensitivity and low specificity can rule out a condition even though it can’t reliably say that a person has it. A test with low sensitivity but high specificity can tell you the patient very likely has the disease because it’s good about false positives but it can’t tell you that they don’t have it because it’s allowed to show false negatives. I.e., you can rule things in but not out.

The Covid-19 antibody tests are actually pretty good both ways. The one called the Cellex test has a sensitivity of 94% and a specificity of 96%.

That’s a Relief!

So if you give 1000 people who had Covid-19 the Cellex test, about 940 of them will test positive. And if you give 1000 people who’ve never had Covid-19 the Cellex test, about 960 of them will test negative. I thought you said the tests were useless?

I’m sticking with that opinion. Read on.

The Catch

It’s not a paradox, just simple math. Let’s take an exaggerated case to make the reason for it clear.

Imagine a hideous disease that only one person in a million has. You’re a nervous hypochondriac with no symptoms of the disease but you have insomnia from worrying about it. Fortunately there’s a test with 99% sensitivity and 99% specificity so you decide to get tested so you can rest easy.

Unfortunately, the test comes back positive. Is it time to join the Hemlock Society Web Forum?

Not at all. Even if you get a positive result it’s hardly worth worrying about.  It’s like this.

It’s a one in a million disease and there are 325,000,000 people in the USA, so 325 people actually have it. Now, imagine that you tested all 325 million Americans. The test is 99% specific so only 1% of the people who don’t have it get false positives, but that’s still about 3,249,997 false positives plus 322 true positives (3 of the 325 people who really had it got false negative results). Call it 3.25 million positive results.

The number who actually have it, 325, divided by the number of people who come back positive, 3,250,000, is 0.0001. One in ten thousand positive results is actually true. That’s worse than scratch-off card odds.
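You can check the arithmetic yourself in a few lines. This is just a sketch of the head count for the imaginary one-in-a-million disease, using the numbers from the story; the variable names are mine.

```python
population = 325_000_000
prevalence = 1 / 1_000_000
sensitivity = 0.99
specificity = 0.99

sick = population * prevalence                 # 325 people
healthy = population - sick                    # everyone else

true_positives = sick * sensitivity            # about 322
false_positives = healthy * (1 - specificity)  # about 3.25 million

chance_positive_is_real = true_positives / (true_positives + false_positives)
print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
print(f"chance a positive is real: about 1 in {1 / chance_positive_is_real:,.0f}")
```

The last line comes out at roughly 1 in 10,000, which is the scratch-off odds mentioned above.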

In other words, even with a positive test result, the chance that you have the disease is about the same as the chance that you’ll die in a car wreck this year.  So be sure to only take the subway this year and you should be fine. (That’s a statistics joke.)

The same principle works for false negatives but you get a very different result because out of 325 million tests only 325 people even have a chance of a false negative and only 1% of them will get one. So 1% of people who have the disease will get a false negative, but for a random person the chance that a negative result is wrong is about one in a hundred million, which is lottery odds.
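The flip side can be counted the same way. A rough sketch, again with the made-up disease and my own variable names: count the people who test negative and ask how many of them are actually sick.

```python
population = 325_000_000
sick = 325
sensitivity = 0.99
specificity = 0.99

false_negatives = sick * (1 - sensitivity)          # about 3 people
true_negatives = (population - sick) * specificity  # about 321.7 million people

chance_negative_is_wrong = false_negatives / (false_negatives + true_negatives)
print(f"a negative is wrong about 1 time in {1 / chance_negative_is_wrong:,.0f}")
```

With these inputs the answer lands near 1 in 99 million, so a negative result for a rare condition is about as trustworthy as test results get.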

Qualifying the Percentage

Statistics people will usually specify both the sensitivity and the specificity.  To give a single number for “accuracy” you would have to say which one you mean (false positive or false negative) in the context of some specific density of cases in the population. 

You will see this in the chart below.

Getting Specific to Covid-19

Here’s a chart from a letter published in American Family Physician specifically about the probability of false positives that you’d see from the actual Cellex antibody test for Covid-19 given a wide range of hypothetical densities of people who have actually had Covid-19. The author worked these out, but you can reproduce similar numbers yourself using an Excel spreadsheet for any given sensitivity, specificity, and range of population densities.

Nobody knows what the real number of people in the US who’ve had Covid-19 is. The number of confirmed cases as of today (July 17 2020) is around 3.5 million, which is a little more than 1% but the real number is surely higher. Let’s say the real number is more like 5%, i.e., out of every five that got it, only one was confirmed.  It’s very hard to get a straight answer about this ratio from any source, BTW. I’ve seen estimates from 2X to 15X. This recent Washington Post article cites the head of the CDC estimating 10x for the USA in June, but testing for active Covid-19 has ramped up rapidly lately so that estimate could be unrealistically high by now.

Whatever the number, even if it is accurate, it is unlikely to be correct for your particular town or area for reasons I’ll explain below.

Looking at the chart below, a 5X multiplier gives a density of about 5% in the population. In that case, if a randomly selected person takes the test and gets a positive result for antibodies, the result will be wrong 44% of the time, because the 4% false positive rate applies to the 95% of people we are assuming have not had Covid-19.

Even with the hypothetical better test that yields only 1% false positives, if only 5% of the population actually had Covid-19, a positive test would still be wrong 17.4% of the time.

If you think the CDC Director’s estimate that I mentioned above is more trustworthy, use 10X, which still means a positive result is false 27.7% of the time for the Cellex test and 9.1% of the time for the hypothetical better one. It’s not impressively accurate either way and the second test doesn’t really exist.
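If you’d rather not fire up Excel, the same table can be sketched in Python. The Cellex figures are the 94% and 96% quoted earlier. The sensitivity of the “better” test isn’t stated anywhere in this post, so the 90% used here is purely my assumption, picked because it happens to reproduce the 17.4% and 9.1% figures; the function name is mine as well.

```python
def false_share_of_positives(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a positive result is false, via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return false_pos / (true_pos + false_pos)


# (sensitivity, specificity). Cellex figures are the ones quoted in this post.
# The hypothetical test's 90% sensitivity is an assumption, not a sourced value.
tests = {"Cellex": (0.94, 0.96), "hypothetical": (0.90, 0.99)}

print("density", *tests)
for prevalence in (0.01, 0.02, 0.05, 0.10, 0.20, 0.50):
    row = [f"{false_share_of_positives(prevalence, *t):6.1%}" for t in tests.values()]
    print(f"{prevalence:7.0%}", *row)
```

At 10% density this gives 27.7% and 9.1%. At 5% it gives about 44.7% for Cellex, in line with the 44% quoted above, and 17.4% for the hypothetical test.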

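[Chart: probability that a positive Covid-19 antibody test result is false, by assumed density of past infection in the population, from the American Family Physician letter]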

It’s Worse Than It Sounds

Whatever the real density of cases in the population of the USA is, it would tend to overestimate the case density for the majority of towns and cities because the hot spots are accounting for disproportionately many of the USA’s cases.  That’s why the letter emphasizes local conditions.

For instance, New York City, one of the hardest hit places in the country, makes up about 2.5% of the USA population but had about 225,000 confirmed cases or 6.25% of cases to date. It used to be a higher percentage but new cases are way down in NYC and up elsewhere. There are eight million New Yorkers, so about 2.8% of New Yorkers were confirmed to have had it at some point. If the real number is five times the number of confirmed cases then about 14% of New Yorkers have had it.

Crudely interpolating from the chart we see that under the 5X assumption, somewhere between 20% and 60% of positive antibody tests will be false.  If the real multiple is 10x, then 28% of New Yorkers have had it (which I find implausible) and you get 9% false positives.
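Instead of interpolating, the New York numbers can be run directly. This is a rough sketch that assumes the round figures used above (eight million residents, 225,000 confirmed cases) and the 94%/96% Cellex figures; the function name is made up.

```python
NYC_POPULATION = 8_000_000
CONFIRMED_CASES = 225_000
SENSITIVITY, SPECIFICITY = 0.94, 0.96   # Cellex figures quoted earlier


def nyc_false_share(multiplier: float) -> float:
    """Share of positive antibody tests that are false for a given undercount multiplier."""
    had_it = CONFIRMED_CASES * multiplier
    never_had_it = NYC_POPULATION - had_it
    true_pos = had_it * SENSITIVITY
    false_pos = never_had_it * (1 - SPECIFICITY)
    return false_pos / (true_pos + false_pos)


for multiplier in (1, 5, 10):
    density = CONFIRMED_CASES * multiplier / NYC_POPULATION
    print(f"{multiplier:>2}X: density {density:5.1%}, false positives {nyc_false_share(multiplier):5.1%}")
```

Under these assumptions about 60% of positives are false if only confirmed cases count, about 21% at 5X, and about 10% at 10X.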

The reason that 10X is probably too high a multiple for NYC is that NY is one of the states with the lowest percentage of positive results for tests for the illness (1.4%). NYC does a lot of testing. This tells you that NYC is almost certainly confirming a higher than average percentage of the actual cases and therefore the confirmed case count and actual case count will be closer together. In contrast, Arizona has about 21% positives, so by the same reasoning, they are probably failing to confirm a higher than average percentage.

So What Does This Mean Practically?

The letter from which the chart is taken is typical of the blandness of the commentary one reads on this subject. It makes great sense to advise doctors to consider local prevalence of Covid-19 in order to estimate the probability of a false positive before sending people into harm’s way.  It seems almost perverse, however, not to point out that there is currently nowhere in the entire country where the density of cases is high enough to make a positive result a safe bet.

It seems to be an academic/professional convention to say things in the blandest possible way and leave the reader to draw the right conclusion. I get that, but in this case, the reason for publishing the letter is that the prior expectation is that the reader will have failed to draw the first conclusion on their own.

It is equally odd to not mention that even if the density were high enough, we’d have no way of knowing because we don’t do enough random testing to generate the hypothetical locally specific estimates, the use of which is central to the author’s advice.

Conclusions

False negatives are a nuisance. They mean people are unnecessarily kept home and isolated.  It’s an inconvenience. A false positive, on the other hand, can get you killed.

If you’re trying to decide whether it’s safe to go back to work based on an antibody test, the short answer for everywhere in the USA is currently no. There isn’t a high enough density of cases anywhere to give a false positive rate a reasonable person would rely on. The only exception I can think of might be a few special cases such as prisons or nursing homes where you have a captive group with a known lower bound on the number of cases because there was a huge amount of testing per capita. And that’s pure speculation, not something to rely on.

If you want to actually estimate the false-positive rate for your own town, find out what the density of confirmed cases is in your area and check the chart.  That will give you the worst case because it’s the lowest the case density can possibly be. Get the best case by multiplying that rate by your best estimate of the ratio of actual cases to confirmed cases. Then you can either calculate it yourself or simply interpolate from the values in the chart.
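For the calculate-it-yourself route, here is one way the recipe might look. The town, its case count, and both function names are invented for illustration, and the defaults are the 94%/96% Cellex figures from earlier.

```python
def positive_is_false(prevalence, sensitivity=0.94, specificity=0.96):
    """Chance that a positive antibody result is false at a given case density."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return false_pos / (true_pos + false_pos)


def town_range(confirmed_cases, population, multiplier):
    """Return (worst_case, best_case) chance that a positive result is false."""
    confirmed_density = confirmed_cases / population
    # Worst case: nobody had it except the confirmed cases.
    worst = positive_is_false(confirmed_density)
    # Best case: scale up by the assumed ratio of actual to confirmed cases.
    best = positive_is_false(min(1.0, confirmed_density * multiplier))
    return worst, best


# Made-up town: 50,000 residents, 400 confirmed cases, conservative 5X multiplier.
worst, best = town_range(confirmed_cases=400, population=50_000, multiplier=5)
print(f"worst case: {worst:.0%} of positives false, best case: {best:.0%}")
```

For this imaginary town even the best case is about a coin flip, which is typical wherever confirmed cases are under 1% of the population.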

How do you pick a multiplier? It’s at best an informed guess, even if you’re a doctor or in public health. The CDC Director’s number, 10X, is for the whole country so let’s assume that’s the real number. Multipliers for specific areas within the USA will be clustered around that number, some higher and some lower. If your area has a high percentage of positive test results, the number for the whole country will probably be on the low side for your area. If your area has a low percentage of positive test results the national estimate will probably be on the high side because you’re testing more thoroughly. Only in a few cases will it match the national average. In a lot of the country the case density is going to be really small, like under 1%.

As a belt-and-suspenders type of guy, I’d go conservative and use 5x, until and unless the CDC comes out with geographically fine-grained estimates at a county or zip-code level. It’s only prudent given that you are gambling with your life in a casino where at least half the population lives in the areas with a lower multiplier than the national number.  If it’s rare where you live, forget the whole project because almost all positives will be false.

The Bottom Line

Even under the most favorable assumptions, even the hottest hot spots weren’t hot enough long enough to yield a false positive rate that a reasonable person would want to bet their life on. Unless the country decides to go full-Sweden and let it rip, we’re not going to reach those levels anytime soon. It’s disappointing but we need either a lot more cases or a much more accurate test combined with county or zip-code level estimates before antibody tests could be reasonably reliable for personal use.  Get one if it makes you feel better but don’t change your behavior based on a positive test.

Disclaimer

In case it’s not clear, this is not a claim that antibody tests are useless, just that they are useless for computing whether a particular person is at risk. They are very useful to statisticians. Statisticians compute the sensitivity and specificity rates against groups known to have or not have had it. Given those exact values they can then discount the crude rate of positive results that is obtained from random serum testing to arrive at an estimate of the real rate. That’s the key estimate we need for all kinds of reasons including assessing the probability of false positives.
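The discounting step can be sketched with a standard correction, often called the Rogan-Gladen estimator. I’m not claiming this is what any particular agency uses; it is just the textbook way to back out the real rate from the crude one, and the survey numbers below are made up.

```python
def estimate_true_prevalence(positive_rate, sensitivity, specificity):
    """Rogan-Gladen correction: back out real prevalence from the raw positive rate."""
    estimate = (positive_rate + specificity - 1) / (sensitivity + specificity - 1)
    # Sampling noise can push the raw estimate outside 0..1, so clamp it.
    return min(1.0, max(0.0, estimate))


# Made-up survey: 8.5% of a random sample tests positive on a 94%/96% test.
print(f"{estimate_true_prevalence(0.085, 0.94, 0.96):.1%}")
```

With those inputs the crude 8.5% positive rate shrinks to a real rate of about 5%, because the rest of the positives are the false ones the test was always going to produce.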
