The Origin of Group Testing

The invention of group testing dates back to WWII, as the U.S. army wanted to weed out all syphilitic men called up for enlistment. The presence of syphilis could be identified with a blood test. However, individually testing the blood sample of each new recruit costed lots of time and resources. Thus, the U.S. army had to face the following problem:

Is it possible to test N people with less than N tests?

The statistician Robert Dorfman had a brilliant idea to minimize the number of blood tests required. He proposed that:

The blood samples of k people can be pooled and analyzed together. If the test is negative, this one test suffices for the group of k people. Instead, if the test is positive, each of the k persons must be tested again separately, and in total k+1 tests are required for the group of k people.

The principle of group testing is summarized in the figure below.

Visual comparison between Individual Testing and Group Testing. In Individual Testing (left), each person is tested independently and receives an independent outcome. This strategy requires as many tests as there are people. In Group Testing (right), the samples of a group of people are pooled together and analyzed with one single test. If the group result is negative, all the individuals of the group can be considered negative. Instead, if the group result is positive, an additional test is needed for each person of that group. Group Testing can potentially save lots of tests and time. Image by author.

What’s the performance of group testing? Dorfman showed that it depends on the probability p that a recruit is infected and, importantly, on the choice of the group size k.

Here, we will reproduce Dorfman’s computations in order to evaluate the performance of group testing. The following steps are needed:

  1. Compute the group-outcome probabilities
  2. Calculate the total number of tests required
  3. Optimize the group size k, for a given prevalence rate p

Note: the practical effectiveness of group testing depends also on the sensitivity of the test, which is specific for the considered illness. In the following, we will assume that test outcomes are always reliable.

1. Group Probabilities

Let p denote the probability that a single person is infected. This corresponds to the prevalence rate of the considered disease in the population (it was syphilis for Dofrman’s analysis, or Covid today…). Thus, p is the independent variable of the problem and is assumed to be known. It is convenient to define q=1-p as the probability that a person is healthy.

The pooled test for a group of k people will be negative if all of them are healthy. We find the probability P(-) of this event as:

Probability of a negative outcome for a pooled test of k people. Image by author.

Conversely, the test outcome of the group is positive if at least one person is infected. The probability P(+) of this scenario is complementary to P(-):

Probability of a positive outcome for a pooled test of k people. Image by author.

We can think of the test outcome of each group as a coin toss, which decides whether the group is healthy or needs to undergo the individual test round. More precisely, we can say that the outcome is a Bernoulli random variable with “success probability” equal to P(+).

2. Expected Number of Tests

Given a total of N recruits, we can now compute the number of tests required within the group strategy. This number is random, since it is subject to the probabilistic outcomes of all groups (see footnote¹). Nevertheless, we can study its expected value.

First of all, each of the N/k groups receives one initial test. Next, whenever a group gets a positive outcome, k additional tests have to be performed. Overall, the expected number of tests, E[T], reads

Computation of the expected number of tests within the group testing strategy. Image by author.

We see that the result is proportional to N, the total number of recruits. Thus, the term f(k,p) can be interpreted as the expected number of tests per person: the initial test counts as 1/k because is shared with the group, and then an individual test occurs with probability P(+).

Whenever f(k,p) is less than 1, group testing is more efficient than performing Nindividual tests. The following plot illustrates that, for each prevalence rate p, there is a group size k* for which the number of tests is minimized.

Plot of the expected number of tests per person as a function of the group size k for different prevalence rates p. The black arrows indicate the optimal group size k* that minimizes the average number of tests for each choice of p. Image by author.

By following the trend of the arrows in the plot, we can appreciate that

the rarer the prevalence of the disease, the larger the optimal group size gets.

This makes sense because, with lower p, it will be more likely that larger groups will get a negative result: so, we can save on tests by pooling more people together.

Next, our goal is to derive a formula to determine the optimal group size k*.

3. Optimal Group Size

For a given prevalence rate p, the optimal group size k* is found by minimizing f(k,p), the expected number of tests per person. Therefore, we set the partial derivative of f(k,p) with respect to k equal to zero:

Derivation of the equation for the optimal group size. Image by author.

Unfortunately, this equation does not admit an explicit solution. However, since the value of p is typically small, we can proceed with a Taylor expansion around p=0 as follows

Approximation of the optimal group size k* for small p. Image by author.

The obtained approximation is of great practicality. For small p,

the optimal group size k* is well approximated by the inverse of the square root of p.

For instance, we find k*=5 for prevalence rate 5%k*=10 for p=1%, and k*=32 for p=0.1%.

4. Employing Optimal Group Testing

We can explicitly check how many tests E[T*] are required on average with the optimal group size by substituting the approximated formula for k* into E[T]. By Taylor expanding again for small p, we get:

Approximation of the optimal expected number of tests. Image by author.

Whenever p