What are the arguments over ebola screening really about? Here’s my go at explaining, with a bit of help from the Reverend Bayes.
There’s a lot of fear and uncertainty about the potential spread of ebola at the moment, and some countries have started screening new arrivals at airports and other ports of entry for signs of ebola.
There are several ways of running a screening programme of this type, and arguments persist about whether screening is appropriate or works. Some of the arguments are actually quite subtle and difficult to convey in the short time available on a news channel. Here’s my attempt to discuss things in a little more detail.
Bayesian statistics can help us understand screening and, importantly, can help us break down the arguments so that we can understand when it is worthwhile and how it can best be done.
When a person is screened for ebola, one of four situations can arise:
| |Has ebola (H)|Does not have ebola (not H)|
|---|---|---|
|Positive screen result (E)|True positive|False positive|
|Negative screen result (not E)|False negative|True negative|
What Bayes’ theorem can do is help us to understand the relationship between screen results and ebola status. There’s a detailed explanation of Bayes’ theorem in the glossary, but for now we just need the equation:
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} \tag{1}$$
- $P(H \mid E)$ is the probability that you have ebola given a positive screen
- $P(E \mid H)$ is the probability that you get a positive screen given that you have ebola. This is known as the sensitivity of the test
- $P(E)$ is the probability of getting a positive screen regardless of whether you have ebola or not
$P(E)$ is a bit of a complex concept, but it can be broken down to make it more manageable:
$$P(E) = P(E \mid H)\,P(H) + P(E \mid \text{not } H)\,P(\text{not } H) \tag{2}$$
This is simply the probability of getting a positive screen if you have ebola plus the probability of getting a positive screen if you don’t have ebola, each scaled by the chance that you have, or do not have, ebola.
In equation 2, $P(E \mid \text{not } H)$ is the chance of getting a positive screen if you do not have ebola. This is related to the specificity of a test: it is one minus the specificity.
Given some basic information about the ebola test, we can start to populate our table. Specifically, we need to know the sensitivity of the test and its specificity.
Now, I don’t know the sensitivity and specificity of current ebola tests, but the CDC appears to suggest that real-time PCR methods are used. This paper, which is quite old so may not reflect current practice, suggests a sensitivity of between 91% and 100% and a specificity of around 97% for real-time PCR. Therefore, without further information, my guess would be that a full laboratory test is about 95% sensitive and 97% specific to ebola.
We can use this to populate some equations:
- $P(E \mid H)$ is the sensitivity, which we believe is about 0.95
- $P(\text{not } E \mid \text{not } H)$ is the specificity, which is about 0.97
- $P(E \mid \text{not } H)$ is one minus the specificity, which is 0.03
- $P(\text{not } E \mid H)$ is one minus the sensitivity, which is 0.05
- $P(H)$ is the prior probability of having ebola. In this case, that means the probability of having ebola before we know the test results. Obviously, this is going to vary between people, so I’ll leave it blank for now
- $P(\text{not } H)$ is $1 - P(H)$, because a person must either have, or not have, ebola
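As a rough sketch of how these figures plug into Bayes’ theorem, the Python below computes the chance of a positive screen and the chance of having ebola given a positive screen. The sensitivity and specificity are the guesses from the text, the function names are made up for illustration, and the baseline risk of 1 in 1,000 is a purely hypothetical value, not an estimate for any real group of travellers.

```python
SENSITIVITY = 0.95  # P(E|H), the guess from the text
SPECIFICITY = 0.97  # P(not E|not H), the guess from the text


def prob_positive(prior, sensitivity=SENSITIVITY, specificity=SPECIFICITY):
    """P(E): chance of a positive screen, whatever the true status."""
    return sensitivity * prior + (1 - specificity) * (1 - prior)


def prob_ebola_given_positive(prior, sensitivity=SENSITIVITY, specificity=SPECIFICITY):
    """P(H|E): chance of having ebola given a positive screen (Bayes' theorem)."""
    return sensitivity * prior / prob_positive(prior, sensitivity, specificity)


# Hypothetical baseline risk, chosen only to illustrate the arithmetic.
example_prior = 0.001
print(f"P(E)   = {prob_positive(example_prior):.4f}")
print(f"P(H|E) = {prob_ebola_given_positive(example_prior):.4f}")
```

With that hypothetical baseline risk, only about 3% of people who screen positive would actually have ebola, even though the test itself is assumed to be good.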
The important thing now is that we have enough data to calculate the cells in our table. Furthermore, we can do this for a whole range of values of $P(H)$. That means we can look at the value of ebola tests for people with a whole range of baseline risks. We can also look at the data two ways:
- From the patient perspective – For example, $P(H \mid E)$ is the probability that you have ebola given a positive test. We can look at all the data this way to explain to people going through the test process why they should, or should not, rely heavily on the test
- From a public health perspective – For example, if we look at the share of everyone tested who falls into each cell of the table, we can see how useful our test is at catching cases from high-risk areas
We will start with the patient perspective.
The graphs below show how confident we should be in a positive (graph 1) and negative (graph 2) diagnostic test. The x-axis is $P(H)$, which is the person’s baseline risk. In other words, this is the risk they had of ebola prior to us performing the test.
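The numbers behind curves of this kind can be sketched in a few lines of Python. This is only an illustration under the assumed 95% sensitivity and 97% specificity: the function name and the list of baseline risks are hypothetical, and the negative-test column is taken here to be the chance of not having ebola given a negative result, which may not be exactly what the original second graph plotted.

```python
def predictive_values(prior, sensitivity=0.95, specificity=0.97):
    """Return (P(H|E), P(not H|not E)) for a given baseline risk P(H)."""
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)  # P(E)
    p_neg = 1 - p_pos                                              # P(not E)
    confidence_in_positive = sensitivity * prior / p_pos
    confidence_in_negative = specificity * (1 - prior) / p_neg
    return confidence_in_positive, confidence_in_negative


# Hypothetical baseline risks spanning very low to very high.
for prior in (0.0001, 0.001, 0.01, 0.1, 0.5, 0.9):
    pos, neg = predictive_values(prior)
    print(f"P(H) = {prior:<6}  P(H|E) = {pos:.3f}  P(not H|not E) = {neg:.3f}")
```

Reading down the output shows the pattern the argument relies on: confidence in a positive result climbs as baseline risk rises, while confidence in a negative result falls.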
That’s the value of testing from the perspective of someone who has, or hasn’t, tested positive. The public health perspective is to ask how many people would be in each of the four quadrants of our table if we tested everyone. Again, this depends very heavily on $P(H)$:
If you look at these graphs, you will see that the value of testing is highly dependent on $P(H)$. The tests themselves are actually pretty good but, if you start using them on everyone, you will get a lot of false positives. In addition, when the risk of disease is very high, even low error rates result in cases being missed, with potentially serious consequences.
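To put rough numbers on the four quadrants, here is a small Python sketch. It assumes the same guessed sensitivity and specificity; the function name, the population of 100,000 and the two baseline risks are hypothetical values picked only to contrast a very low-risk group with a very high-risk one.

```python
def quadrant_counts(prior, population=100_000, sensitivity=0.95, specificity=0.97):
    """Expected number of people in each cell of the table if everyone is tested."""
    with_ebola = prior * population
    without_ebola = (1 - prior) * population
    return {
        "true positive": sensitivity * with_ebola,
        "false negative": (1 - sensitivity) * with_ebola,
        "false positive": (1 - specificity) * without_ebola,
        "true negative": specificity * without_ebola,
    }


# Hypothetical low-risk and high-risk groups.
for prior in (0.0001, 0.5):
    print(f"Baseline risk P(H) = {prior}")
    for cell, count in quadrant_counts(prior).items():
        print(f"  {cell:<15}{count:>10.1f}")
```

In the low-risk case false positives outnumber true positives by several hundred to one, and in the high-risk case thousands of cases still slip through as false negatives.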
The graphs above immediately tell us two things:
1. Do not rely on testing alone – In high-risk environments, quarantine is necessary to prevent disease spread. However, most environments are not high-risk.
2. Testing should not be used in very low-risk groups – The use of resource in testing will be huge for a very small benefit, and it may provide an illusion of protection.
Number 1. causes issues with healthcare staff. Essentially, maintaining frontline staffing is a huge challenge because it is difficult to be sure that someone in a high-risk environment doesn’t have disease. This means providing external frontline staff creates a risk of disease spread. For this reason (and others), a lot of international support is based around protecting and supporting healthcare professionals rather than providing frontline care.
Number 2. means that pre-screening is necessary. This is the reason that all screening programmes include more than just testing. They include questionnaires about exposure (e.g. ‘have you been in contact with anyone with ebola?’) and, sometimes, external signs of illness (e.g. raised body temperature). What pre-screening does is to identify people at increased baseline risk prior to more formal testing.
The main arguments about screening arise because of the sensitivity of testing to baseline risk. The argument is essentially whether the pre-screens used can identify people with a high baseline risk. There are several drivers of this.
One of the most important, and one that varies between different disease outbreaks, is the real risk among people flying into a country. When there is an outbreak of a disease like ebola, affected countries (or, at least, airlines using them) will check people boarding flights for signs of illness. The question is: what is the real risk that someone who isn’t showing signs on boarding will develop sufficient signs to be picked up during the pre-screen once they land? This argument is complicated by things like whether people will try to hide illness to escape a high-risk area, any corruption amongst the people doing screening at boarding, etc.
The issue of pre-screening is hugely important, because we have already seen that false positives and false negatives exist. False negatives during pre-screening mean a person never gets a proper test. Now, asking people a few questions and/or taking their temperature is probably not very sensitive (ebola has an incubation period with no raised temperature) or specific (all sorts of people get sick for all sorts of reasons). Let’s say the sensitivity and specificity of this pre-screen are both 50%. What does this do to our chart? This is what it does:
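One way to sketch the effect of such a pre-screen is the Python below. It assumes that only people who fail the pre-screen go on to the laboratory test, and that the two stages make their errors independently once a person’s true status is fixed; both are simplifying assumptions, not claims about how any real programme works. The function name, the population of 100,000 and the 1% baseline risk are hypothetical.

```python
def two_stage_screen(prior, population=100_000,
                     pre_sens=0.5, pre_spec=0.5,
                     lab_sens=0.95, lab_spec=0.97):
    """Expected outcomes when a pre-screen decides who gets the lab test."""
    cases = prior * population
    non_cases = population - cases
    # Stage 1: the pre-screen flags some cases and some non-cases.
    cases_tested = pre_sens * cases
    non_cases_tested = (1 - pre_spec) * non_cases
    # Stage 2: the lab test is applied only to people flagged in stage 1.
    detected = lab_sens * cases_tested
    return {
        "lab tests run": cases_tested + non_cases_tested,
        "cases detected": detected,
        "cases missed": cases - detected,
        "false alarms": (1 - lab_spec) * non_cases_tested,
    }


# Hypothetical 1% baseline risk among arrivals.
for outcome, count in two_stage_screen(0.01).items():
    print(f"{outcome:<15}{count:>10.1f}")
```

Under these figures more than half of the cases are missed, almost all of them at the pre-screen stage. With both pre-screen values at 50%, the pre-screen passes cases and non-cases to the lab at the same rate, so in this sketch it halves the lab workload and the number of cases found in equal measure; passing in other hypothetical values shows how a more informative pre-screen would change that balance.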
This could be quite alarming. After all, it suggests that quite a few people could be missed. The point is that this stage is really sensitive to how the pre-screen is done. Do we accept a lot of people being inconvenienced (i.e. getting a false positive), with a consequent slow-down in international movement and huge cost, in order to catch as many cases as possible (while still missing some)? Or do we use the pre-screen as an opportunity to inform higher-risk people of the signs of illness? Actually, this second approach can be important, if it means telling people leaving a high-risk area that reporting sickness is okay and will result in them getting good medical care.
The point is that the decisions involved in screening are complex. There is, undoubtedly, political pressure to do something, but there is logic behind some of the ineffective-sounding screening activities (questionnaires, temperature-testing etc.).
It’s all about the baseline risk, and that is very difficult to estimate. If you look at the shape of the graphs above, you will see that slight differences in that estimate may produce very different decisions about the value of screening.
That is why there are arguments and it is why those arguments are legitimate. Unfortunately, there is no simple answer. What Bayesian approaches can help us do is to agree a structure for those arguments so that, in difficult situations, we focus on the factors that should drive our decisions. While screening is a specialist area, by breaking the problem down into its component parts, it is possible for a non-specialist to engage in the discussion. That is surely better than relying on our natural trust (!) in official statements.
These things can get pretty complex quickly. That means I’ve probably got something wrong – what is it? Let me know below.