What the DP-3T Initiative Means for Privacy

The world is in the grip of a pandemic that has shut down the economies of all countries, imposed restrictions on freedom of movement, and more importantly is leading to the deaths of thousands of people. The problem is that the virus can be contagious in the weeks before any infected person shows symptoms of the disease. A standard way to deal with epidemics and pandemics of this form is so-called contact-tracing.

The traditional method of contact tracing is to interview each infected person, find out who they have spent time with in the previous one to two weeks, and then inform those people so that they can either be tested, if tests are available, or self-quarantine. The problem with the traditional paper-based methods is that they do not scale well to the levels of infection we are seeing and, in addition, they cannot capture the contacts one has in modern life, for example sitting next to someone on a bus or train on your daily commute.

This has led some countries to discuss, and in some cases already deploy, so-called contact tracing Apps on people’s smartphones. These Apps are controversial from both a medical and a privacy standpoint. On the medical side, there is still debate on whether they will be effective unless 60% of the population have the App on, and in addition, it is unclear medically what precisely constitutes a “contact”.

On the privacy side, we need to consider two angles: firstly, privacy for the infected person, and secondly, privacy for the non-infected person. In the paper-based method the infected person becomes known to the authority, and then the places they have visited are disclosed and the people they have had contact with are informed. Thus in the paper-based method, the non-infected people are only contacted if they have been in the proximity of an infected person. In particular, the non-infected people’s movements and meetings are not recorded.

Almost all the contact tracing methodologies make use of Bluetooth on modern phones. The basic idea is that each phone sends out small Bluetooth packets consisting of a randomized “identifier”. Each phone then records which identifiers it has “seen” over a two-week period. The notion of “seen” can be modified to deal with medical knowledge (e.g. the phone might only record identifiers seen over a two-minute period, or with a strong signal denoting proximity).
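As a rough illustration of the recording step just described, the following Python sketch keeps a log of observed identifiers and applies a duration and signal-strength filter to decide what counts as “seen”. The 16-byte identifier length, the -70 dBm cut-off, and all class and function names are assumptions made for the example; they are not taken from any deployed App.

```python
import secrets
from dataclasses import dataclass, field

RETENTION_SECONDS = 14 * 24 * 3600  # the two-week window from the text
MIN_DURATION_SECONDS = 120          # the "two minute" example from the text
MIN_RSSI_DBM = -70                  # hypothetical "strong signal" cut-off


def new_identifier() -> bytes:
    """Return a fresh random identifier to broadcast (16 bytes is an assumption)."""
    return secrets.token_bytes(16)


@dataclass
class ContactLog:
    # identifier -> (first_seen, last_seen), for sufficiently strong sightings only
    sightings: dict = field(default_factory=dict)

    def observe(self, identifier: bytes, timestamp: float, rssi_dbm: int) -> None:
        """Record one received packet, ignoring weak (distant) signals."""
        if rssi_dbm < MIN_RSSI_DBM:
            return
        first, _ = self.sightings.get(identifier, (timestamp, timestamp))
        self.sightings[identifier] = (first, timestamp)

    def prune(self, now: float) -> None:
        """Forget identifiers last heard more than two weeks ago."""
        self.sightings = {
            ident: (first, last)
            for ident, (first, last) in self.sightings.items()
            if now - last <= RETENTION_SECONDS
        }

    def seen(self) -> set:
        """Identifiers heard for at least the minimum duration."""
        return {
            ident
            for ident, (first, last) in self.sightings.items()
            if last - first >= MIN_DURATION_SECONDS
        }
```

Changing the two thresholds is how the notion of “seen” would be tuned to whatever the medical advice turns out to be.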

The key issue is what happens when someone tests positive. Here there are two solutions, which have loosely been categorized as “centralized” and “decentralized”. These words qualify where and how the crucial operations of the system happen, namely where “identifiers” are generated, and where the decision as to whether to inform a user that they are at risk is taken.

Centralized Model: Here the identifiers are created by a central server, which sends them to the phones. Thus, the server can choose which identifiers to give to particular phones. In this scheme, upon infection, the identifiers that have been seen by the infected person’s phone are uploaded into a database, and the central authority has a way of reversing each identifier so as to obtain the actual person. They can then contact the people who were in contact with the infected person.
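A minimal Python sketch of the centralized flow, under simplifying assumptions: the server is modeled as a single in-memory object, users are plain strings, and the names `CentralServer`, `issue_identifier`, and `report_positive` are hypothetical. It is meant only to show why the server can reverse identifiers, not how any particular national system is built.

```python
import secrets


class CentralServer:
    def __init__(self):
        # identifier -> registered user; holding this mapping is what
        # makes identifiers reversible by the central authority
        self._owner_of = {}

    def issue_identifier(self, user: str) -> bytes:
        """Create an identifier for a phone and remember whose it is."""
        ident = secrets.token_bytes(16)
        self._owner_of[ident] = user
        return ident

    def report_positive(self, observed_identifiers) -> set:
        """Turn the identifiers seen by an infected user's phone into people to contact."""
        return {
            self._owner_of[ident]
            for ident in observed_identifiers
            if ident in self._owner_of
        }
```

For example, if Bob’s phone was issued an identifier and an infected user’s phone uploads it, `report_positive` returns `{"bob"}`: the authority learns who met whom.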

Decentralized Model: Here the identifiers are created on each user’s phone. Then, only the identifiers sent by the infected person’s phone are uploaded into the central database. The non-infected users occasionally read from this central database. They can see locally on their phone whether they have been in contact with an infected user, and can then take appropriate action depending on the prevalent medical advice. But no one can associate identifiers with actual people.
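The decentralized flow can be sketched in Python as follows. This is a deliberately simplified illustration: the class and method names are hypothetical, identifiers are plain random bytes generated one at a time, and the real DP-3T protocol is more involved than what is shown here. The point is only that the matching happens on the phone and the server never holds a mapping from identifiers to people.

```python
import secrets


class Phone:
    def __init__(self):
        self.broadcast = []    # identifiers this phone generated and sent out
        self.observed = set()  # identifiers heard from nearby phones

    def next_identifier(self) -> bytes:
        """Generate a fresh random identifier locally; the server is not involved."""
        ident = secrets.token_bytes(16)
        self.broadcast.append(ident)
        return ident

    def hear(self, ident: bytes) -> None:
        self.observed.add(ident)

    def check_exposure(self, published) -> bool:
        """Compare locally stored sightings against the public list."""
        return not self.observed.isdisjoint(published)


class Server:
    def __init__(self):
        self.published = set()  # random identifiers only, no user records

    def report_positive(self, identifiers) -> None:
        """An infected user uploads the identifiers their own phone sent out."""
        self.published.update(identifiers)

    def download(self) -> frozenset:
        return frozenset(self.published)


alice, bob, carol = Phone(), Phone(), Phone()
server = Server()
bob.hear(alice.next_identifier())    # Bob was near Alice
carol.hear(bob.next_identifier())    # Carol was near Bob, not Alice
server.report_positive(alice.broadcast)  # Alice tests positive
print(bob.check_exposure(server.download()))    # True
print(carol.check_exposure(server.download()))  # False
```

Note that the server only ever stores what the infected user chose to upload, and the decision to warn Bob is taken on Bob’s own phone.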

In the centralized model, the central authority learns who the infected person has met, and can also extract the movements of the non-infected people. Thus the privacy of the non-infected people is seriously compromised. In the decentralized system, the central authority is simply storing completely random identifiers that cannot be reversed into actual people’s identities. Only the privacy of the infected user is compromised, and this happens in the same way as with the traditional paper-based method. From a privacy perspective, then, the decentralized option seems best.

The decentralized model is also simpler to implement from a Bluetooth perspective. Singapore has already released a contact tracing App (in the centralized model), which only achieved a penetration of about 20%; recall the 60% figure mentioned above. So why was the penetration so low? Well, one problem is that these Bluetooth signals are a known privacy problem, and so phone manufacturers limit the usage Apps can make of them. The result is that the Singapore App on iPhones needs the iPhone to be permanently unlocked. And who wants to go around with an unlocked phone in their pocket?

To make contact tracing solutions based on Bluetooth practical, phone manufacturers will have to cooperate and loosen some of the restrictions that they implemented to protect users’ privacy against Apps. How can this be done while maximally preserving privacy?

Google and Apple (being the major operating system providers of the world’s smartphones) have teamed up to support the decentralized systems. In the next month, a change will be rolled out to enable phones to broadcast these randomized identifiers without needing them to be left unlocked. Crucially, Google and Apple’s change will output random identifiers, i.e. ones which cannot be reversed into a phone identity; thus the change will only work with the decentralized solution. This reduces privacy loss: if they used identifiers which could be reversed (as the centralized solution requires), the facility could be misused for mass surveillance by any country that wanted to force its citizens to install the App.

This debate of centralized vs decentralized has turned highly political in the past couple of weeks. I helped organize a letter that was signed by over 300 scientists from over 26 countries[1]. The letter calls for centralized systems to be abandoned as a serious privacy risk.

The letter calls for four principles to be adopted:

  1. Contact tracing Apps must only be used to support public health measures for the containment of COVID-19. The system must not be capable of collecting, processing, or transmitting any more data than what is necessary to achieve this purpose.
  2. Any considered solution must be fully transparent. The protocols and their implementations, including any sub-components provided by companies, must be available for public analysis. The processed data and if, how, where, and for how long they are stored must be documented unambiguously. Such data collected should be minimal for the given purpose.
  3. When multiple possible options to implement a certain component or functionality of the app exist, then the most privacy-preserving option must be chosen. Deviations from this principle are only permissible if this is necessary to achieve the purpose of the app more effectively, and must be clearly justified with sunset provisions.
  4. The use of contact tracing Apps and the systems that support them must be voluntary, used with the explicit consent of the user and the systems must be designed to be able to be switched off, and all data deleted, when the current crisis is over.

Since the publication of the letter, Austria, Estonia, and Switzerland have already announced they will only adopt a decentralized approach (in particular the DP-3T protocol developed by a team of which I am a member). Germany dramatically switched from a centralized to a decentralized model on 26th April. It also looks as though both Italy and Singapore will now switch to a decentralized approach. In addition, the DP-3T consortium is in discussion with a number of other countries around the world.