An introduction to health data encryption

Modern health services are digital, provided by web applications that interact with a cloud infrastructure. The cloud enables companies to adapt quickly and grow fast across countries, serving more practitioners and patients. However, hosting such a large volume of sensitive medical data comes with security and confidentiality risks.

A security team's mission is to ensure the company implements the latest innovations to mitigate those risks. Our responsibility is to get employees to ask “Where are we most vulnerable to attack?” and “What do we need to do to safeguard against these threats?” After outlining the challenges of protecting health data, we discuss our approach to determining our encryption strategy and then illustrate it with end-to-end encryption.

How do we protect sensitive data?

The primary security objective is to protect users by securing their personal data. Personal data is needed to provide services to patients, such as online appointment booking, video consultation, or document sharing with doctors. We distinguish three data sensitivity levels: internal (or technical) data, Personally Identifiable Information (PII, e.g. name, email), and Personal Health Information (PHI), which corresponds to the definition in the European Union General Data Protection Regulation (GDPR, 2018): “data concerning health means personal data related to the physical or mental health of a natural person”. The information that a patient is going to see a doctor is PHI. The same goes for medical documents, since they contain specific PHI and PII about a patient.

New interconnected software processes patients' health records for practitioners. The diversity and growing volume of sensitive data increase the impact of attacks, as a single software vulnerability can lead to a massive data leak. This encourages attackers to seek profit through blackmail, in exchange for money or power. Data extraction usually results from several vulnerabilities being successfully exploited. Defenders must therefore fix as many vulnerabilities as possible, whereas attackers only need to find a few to exploit. It follows that effective security is more about radically deterring attackers than naively aiming for zero incidents: defense is a race against attackers to make their attacks unprofitable.

Other threats arise when patients lose control of their personal data. Privacy is about users deciding who has access to their personal data, for what purposes, and for how long, with the possibility to change their decisions at any time. With medical data, the mere possibility that an employee could access health information is already a privacy breach. Therefore the company itself is among the potential threat actors, along with the cloud provider and external attackers.

Regulation exists to make companies implement the latest technologies, called the state of the art, forcing attackers to change tactics to keep up. In practice, we must prioritize our security investments because there are many opportunities for improvement. To prioritize efficiently, we align the company on the most impactful threats, which need to be mitigated first. In a nutshell, our security team determines the most impactful and innovative protection techniques (called controls) to implement. Protecting sensitive data is a continuous, iterative process of improving security and verifying its effectiveness.

How does encryption protect sensitive data?

The example of encryption illustrates how security controls are implemented in applications. The objective of a protection mechanism is to prevent unauthorized third parties (an external attacker or an employee) from reading or modifying sensitive data. Encryption is very effective for security and privacy because it transforms sensitive data into ciphertext, which looks like random data and reveals nothing about the original content. The reverse transformation (decryption) requires a specific secret key, so the key defines who can recover the sensitive data. The science of protecting secret messages is called cryptography.
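
To make the role of the secret key concrete, here is a minimal Python sketch of symmetric encryption. It is a toy built only from the standard library (an HMAC-SHA256 keystream plus an HMAC authentication tag, i.e. encrypt-then-MAC) so that it runs without dependencies; a real application would use a vetted authenticated cipher such as AES-GCM from a maintained cryptography library. The function names `encrypt` and `decrypt` and the sample record are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

# TOY authenticated encryption from the standard library only:
# HMAC-SHA256 in counter mode as a keystream, then an HMAC tag (encrypt-then-MAC).
# Illustration only; use a vetted AEAD such as AES-GCM in real code.

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = b"enc" + nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh per message, so equal inputs give different outputs
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, b"mac" + nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def decrypt(key: bytes, sealed: bytes) -> bytes:
    nonce, ciphertext, tag = sealed[:16], sealed[16:-32], sealed[-32:]
    expected = hmac.new(key, b"mac" + nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or modified data")
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

key = secrets.token_bytes(32)  # whoever holds this key can decrypt
record = b"Patient Jane Doe: appointment with a cardiologist"
sealed = encrypt(key, record)
print(sealed[16:-32].hex())            # random-looking, different on every run
print(decrypt(key, sealed) == record)  # True
```

Without the key, the stored bytes are indistinguishable from noise; with a different key, the authentication tag makes decryption fail loudly instead of returning garbage, which also covers the "modifying" half of the objective.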

A common misconception is to consider data as either fully encrypted or unprotected, whereas encryption can never ensure 100% security: it is not an on/off switch! The level of security of an encryption scheme depends on how and when you encrypt and decrypt data, and on how you protect secret keys. For example, encrypting 100% of data before saving it on hard drives (at-rest encryption) protects against physical access risks. Likewise, encrypting 100% of data transmitted over the network only protects against traffic interception. Additional server-side encryption techniques can further reduce data exposure on servers, but decryption is still necessary to process data, at least in the servers' memory. Assessing the security of encryption models and their residual risks is called threat modeling (here is a good example).
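
The idea that encryption is a stack of layers rather than a switch can be sketched as a tiny threat model in Python. The layer and threat names are hypothetical simplifications of the cases discussed above (physical access, traffic interception, data decrypted in server memory), plus the compromised-device case that even end-to-end encryption does not cover. Server-side encryption is left out to keep the sketch short.

```python
# Hypothetical, simplified threat model: which threats each encryption layer
# keeps away from readable data, following the cases discussed in the text.
PROTECTS = {
    "at_rest": {"physical_disk_access"},
    "in_transit": {"traffic_interception"},
    "end_to_end": {"physical_disk_access", "traffic_interception", "server_compromise"},
}

# A compromised user device sees decrypted data, so no layer here covers it.
THREATS = [
    "physical_disk_access",
    "traffic_interception",
    "server_compromise",
    "client_device_compromise",
]

def residual_risks(enabled_layers):
    """Return the threats that none of the enabled layers mitigates."""
    covered = set()
    for layer in enabled_layers:
        covered |= PROTECTS[layer]
    return [threat for threat in THREATS if threat not in covered]

print(residual_risks(["at_rest", "in_transit"]))
# ['server_compromise', 'client_device_compromise']
print(residual_risks(["at_rest", "in_transit", "end_to_end"]))
# ['client_device_compromise']
```

Even with every layer enabled, the residual list is never empty, which is the point of threat modeling: making the remaining risks explicit.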

Moreover, transforming data into random-looking data makes it harder to process. For example, server-side encryption prevents the database server from performing search or filter tasks on encrypted data. An encryption technique can even block another useful security protection: protecting servers from network attacks coming from the Internet requires analyzing requests in clear. Sometimes data processing can be preserved, but with higher complexity, and the cost of implementation then influences prioritization. Thus, in practice, implementing an encryption technique is a trade-off between security, functionality, and cost. The more encryption you add to the data lifecycle, the more complex it is to provide the same level of service.
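
One common way to preserve some processing on encrypted data is a blind index; it is shown here as a generic illustration, not as the technique we use. Next to each encrypted value, the database stores a keyed hash (HMAC) of the normalized plaintext, which allows exact-match lookups without decrypting anything. The key, the table layout, and the function names below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical index key, kept outside the database (e.g. in a key management service).
INDEX_KEY = b"hypothetical-index-key-kept-outside-the-database"

def blind_index(value: str) -> str:
    # Normalize first so that " Jane.Doe@Example.com" and "jane.doe@example.com" match.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()

# Each row would also hold the encrypted email; that column is omitted for brevity.
table = [
    {"id": 1, "email_index": blind_index("jane.doe@example.com")},
    {"id": 2, "email_index": blind_index("john.roe@example.com")},
]

def find_by_email(email: str) -> list:
    """Exact-match search that compares index values, never plaintext."""
    target = blind_index(email)
    return [row["id"] for row in table if row["email_index"] == target]

print(find_by_email("Jane.Doe@example.com"))  # [1]
```

The extra complexity and the trade-off are both visible: only exact matches work (no prefix, range, or fuzzy search), and rows with equal values share the same index, which leaks some information to whoever can read the table.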

A decision to implement End-to-End Encryption (E2EE) results from lucid discussions about the risks to mitigate and our security and privacy objectives, balanced against the cost of implementation. It is the only encryption system designed so that only the users (practitioners and patients) can decrypt the data, and no one else, including the company or third parties. Secret keys are owned by users, so they remain in control of who else can decrypt their data. But since servers can no longer decrypt, they can no longer process end-to-end encrypted data, even though such processing is essential for providing services. For cloud-based applications the cost of implementation is high, but when health data is involved, mitigating the risk is worth it.

Even if you combine several encryption techniques, including powerful ones like E2EE, you have only solved half the problem. It remains to determine which encryption should be applied to which data (PII, PHI) and when (client-side, server-side), using a data classification methodology. At-rest encryption applies to internal data, whereas end-to-end encryption is expected to secure PHI. However, end-to-end encryption can't apply to all types of data: it is not always relevant, it is too costly, and it would simply shut down cloud services (see the example of Apple's iCloud service).
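
A data classification can drive the encryption strategy mechanically. The Python sketch below maps the three sensitivity levels to encryption layers. Internal data at rest and PHI end-to-end follow the paragraph above; the field names, the transit layer everywhere, and the server-side layer for PII are assumptions for illustration. Unclassified fields default to the strictest level, a conservative choice rather than a rule stated in the text.

```python
from enum import Enum

class Sensitivity(Enum):
    INTERNAL = "internal"  # technical data
    PII = "pii"            # e.g. name, email
    PHI = "phi"            # data concerning health

# Hypothetical strategy: encryption layers applied to each sensitivity level.
STRATEGY = {
    Sensitivity.INTERNAL: ["in_transit", "at_rest"],
    Sensitivity.PII: ["in_transit", "at_rest", "server_side"],  # assumption
    Sensitivity.PHI: ["in_transit", "at_rest", "end_to_end"],
}

# Hypothetical field inventory produced by the data classification exercise.
FIELD_CLASSIFICATION = {
    "server_hostname": Sensitivity.INTERNAL,
    "patient_email": Sensitivity.PII,
    "appointment_with_doctor": Sensitivity.PHI,  # seeing a doctor is PHI
    "medical_document": Sensitivity.PHI,
}

def layers_for(field_name: str) -> list[str]:
    """Return the layers required for a field; unknown fields get the strictest level."""
    level = FIELD_CLASSIFICATION.get(field_name, Sensitivity.PHI)
    return STRATEGY[level]

for name in ("server_hostname", "patient_email", "medical_document"):
    print(name, "->", layers_for(name))
```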

Securing the data lifecycle by choosing the right techniques to apply at the right moment is called an encryption strategy. Data that is not encrypted 100% of the time is not necessarily insecure, and there are always opportunities for improvement, in encryption as in other protection techniques.

The challenge of implementing E2EE

The implementation of end-to-end encryption demonstrates the trade-off between security and user experience, as securing users' secret keys brings new constraints. Cloud-based services rely heavily on server-side processing: simple tasks such as selecting, filtering, and searching break on encrypted data and must be redesigned. Thus the main challenge with E2EE is not so much the cryptography itself as how to implement the technology without killing core features or security. In contrast, the simplicity of instant messaging apps explains why they were the first services to provide E2EE (e.g. WhatsApp and Signal).

End-to-end encryption is a client-side encryption scheme, meaning data is encrypted and decrypted on the user's device (smartphone or browser). Each user owns a unique cryptographic identity composed of a public key, which can be shared with others, and a private key that remains on the device. Users' keys are not used to encrypt data directly: each piece of data is encrypted with a dedicated key, which makes it possible to share each piece of data with a specific list of users. Distributing millions of secret keys across users' devices prevents an attacker from compromising a large volume of data at once (they could download the data but not decrypt it). In a nutshell, E2EE drastically reduces the impact of a data leak for cloud services. However, a small leak is still possible if data is decrypted on a compromised device.
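
The key structure described above (one identity key pair per user, one dedicated key per piece of data, wrapped for each authorized recipient) is often called envelope encryption. Here is a minimal, dependency-free Python sketch. It uses toy finite-field Diffie-Hellman modulo the prime 2^127 - 1 and a toy HMAC-based cipher purely so that it runs with the standard library; both are far too weak for production, where vetted primitives such as X25519 and AES-GCM would be used. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# TOY parameters: finite-field Diffie-Hellman modulo the Mersenne prime 2**127 - 1.
# Far too small for real use; production systems use e.g. X25519 from a vetted library.
P = 2**127 - 1
G = 3

def _stream(key, nonce, n):
    out, i = b"", 0
    while len(out) < n:
        out += hmac.new(key, b"enc" + nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        i += 1
    return out[:n]

def seal(key, data):
    """Toy encrypt-then-MAC standing in for a real AEAD such as AES-GCM."""
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(data, _stream(key, nonce, len(data))))
    return nonce + ct + hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()

def open_sealed(key, blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cannot decrypt")
    return bytes(a ^ b for a, b in zip(ct, _stream(key, nonce, len(ct))))

def new_identity():
    """A key pair; for a user identity, the private part never leaves the device."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def _wrapping_key(shared):
    return hashlib.sha256(b"wrap" + shared.to_bytes(16, "big")).digest()

def wrap_for(recipient_public, data_key):
    """Encrypt a data key so only the matching private key can open it."""
    eph_private, eph_public = new_identity()  # fresh ephemeral pair, used once
    shared = pow(recipient_public, eph_private, P)
    return eph_public, seal(_wrapping_key(shared), data_key)

def unwrap(private, wrapped):
    eph_public, blob = wrapped
    return open_sealed(_wrapping_key(pow(eph_public, private, P)), blob)

def share_document(document, recipient_publics):
    """One fresh data key per document, wrapped once per authorized recipient."""
    data_key = secrets.token_bytes(32)
    return seal(data_key, document), [wrap_for(pub, data_key) for pub in recipient_publics]

doctor_private, doctor_public = new_identity()
patient_private, patient_public = new_identity()

ciphertext, wrapped_keys = share_document(b"Blood test results", [doctor_public, patient_public])
# The server stores ciphertext and wrapped_keys; it never holds a private key.
data_key = unwrap(doctor_private, wrapped_keys[0])
print(open_sealed(data_key, ciphertext))  # b'Blood test results'
```

A leaked database would contain only ciphertexts and wrapped keys; recovering any single document requires the private key of one of its recipients, which lives on their device.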

An inherent trade-off in E2EE is the risk of data loss. If users lose their device (containing their private key), all their encrypted data may be lost. To let users change devices without losing their private key, we encrypt it with their password and save it in the cloud (the recovery key). However, there is still the risk of forgetting the password, which experience shows happens often. Thus we implemented a second mechanism to recover the private key, based on two-factor authentication, to ensure practitioners will always be able to decrypt their documents, even 10 years after encryption.
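
A password-protected recovery key can be sketched in Python as follows: the device derives a key from the password with PBKDF2 (`hashlib.pbkdf2_hmac` in the standard library), encrypts the private key with it, and uploads only the encrypted blob. The toy cipher, the 600,000-iteration cost, the 32 random bytes standing in for the private key, and the function names are all assumptions for illustration; a real deployment would use a vetted AEAD and a carefully tuned key-derivation function.

```python
import hashlib
import hmac
import secrets

def _stream(key, nonce, n):
    out, i = b"", 0
    while len(out) < n:
        out += hmac.new(key, b"enc" + nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        i += 1
    return out[:n]

def seal(key, data):
    """Toy encrypt-then-MAC standing in for a real AEAD such as AES-GCM."""
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(data, _stream(key, nonce, len(data))))
    return nonce + ct + hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()

def open_sealed(key, blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted blob")
    return bytes(a ^ b for a, b in zip(ct, _stream(key, nonce, len(ct))))

def password_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # Deliberately slow, salted derivation: each guess against a stolen blob is costly.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def make_recovery_blob(private_key: bytes, password: str) -> dict:
    """Runs on the device; only the resulting encrypted blob is uploaded to the cloud."""
    salt = secrets.token_bytes(16)
    return {"salt": salt, "sealed": seal(password_key(password, salt), private_key)}

def recover_private_key(blob: dict, password: str) -> bytes:
    """Runs on a new device once the user has typed their password."""
    return open_sealed(password_key(password, blob["salt"]), blob["sealed"])

private_key = secrets.token_bytes(32)  # stands in for the user's identity private key
blob = make_recovery_blob(private_key, "correct horse battery staple")
print(recover_private_key(blob, "correct horse battery staple") == private_key)  # True
```

If the password is forgotten, nobody can open the blob, the company included, which is why a second recovery mechanism is needed alongside it.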

Another trade-off is the use of user groups, which are essential for new users to gain access to the history of encrypted data. However, once you have shared a private group key with users, even if you rotate the key to remove them from the group, they can still decrypt the history of encrypted data. The more flexible access to sensitive data is, the easier it may also be for attackers. Therefore we balance the residual risk with complementary measures such as server-side access control (an attacker could decrypt, but can't download the data) and rate limiting (an attacker can't download much data in a short period of time).
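
The group trade-off and its complementary server-side measures can be modeled in a few lines of Python. This is a hypothetical model (the names `join`, `remove`, `can_download`, and the fixed-window `RateLimiter` are ours), not a production design: new members receive every key version, rotation only protects data encrypted after a removal, so the server additionally refuses to serve data to non-members and caps how fast anyone can download.

```python
import secrets
import time
from dataclasses import dataclass, field

@dataclass
class Group:
    members: set
    keys: list = field(default_factory=lambda: [secrets.token_bytes(32)])  # index = version

# Each user's device keeps every group key version it has ever received.
keyrings: dict = {}

def join(group, user):
    """New members receive all key versions, hence access to the encrypted history."""
    group.members.add(user)
    keyrings.setdefault(user, {}).update(enumerate(group.keys))

def remove(group, user):
    """Rotation: a fresh key for future data, sent to the remaining members only."""
    group.members.discard(user)
    group.keys.append(secrets.token_bytes(32))
    for member in group.members:
        keyrings[member][len(group.keys) - 1] = group.keys[-1]

def can_decrypt(user, key_version):
    # A removed user's device still holds the older key versions.
    return key_version in keyrings.get(user, {})

class RateLimiter:
    """Fixed window: at most `limit` downloads per user per `window` seconds."""
    def __init__(self, limit, window):
        self.limit, self.window, self.counts = limit, window, {}

    def allow(self, user, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counts.get(user, (now, 0))
        if now - start >= self.window:
            start, count = now, 0
        if count >= self.limit:
            return False
        self.counts[user] = (start, count + 1)
        return True

def can_download(group, user, limiter):
    """Server-side checks: current membership first, then the download budget."""
    return user in group.members and limiter.allow(user)

team = Group(members=set())
for user in ("dr_a", "dr_b", "assistant"):
    join(team, user)
remove(team, "assistant")
limiter = RateLimiter(limit=100, window=60.0)

print(can_decrypt("assistant", 0))               # True: old key still on the device
print(can_decrypt("assistant", 1))               # False: rotated key never received
print(can_download(team, "assistant", limiter))  # False: the server refuses to serve data
```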

To ensure that end-to-end encryption is compatible with a good user experience, it is easier to use an end-to-end encryption technology offered as a cloud service. Using APIs from their browser, users register their cryptographic identity and share secret keys with others without fear of losing data. In this model the service provider is still in charge of hosting and serving the encrypted data; the E2EE service only performs key management between users. Note that the client application code is controlled by the service provider, so trust in this E2EE model relies on the ability to foster a strong culture of privacy among developers, who are also users of the applications.

In practice, implementing end-to-end encryption is a gradual process. It requires long-term planning, even for mature companies like Facebook: Don't expect full end-to-end encryption on Messenger until 2022 'at the earliest'. Zoom is rolling out phase 2 of a 4-phase plan in 2021, and the full plan will take years to complete. WhatsApp only started offering end-to-end encrypted backups this year, 5 years after encrypting messages! Doctolib has been working on its implementation since 2019, with a roadmap extending to at least 2022.

We hope this post shed some light on the dark corridors of encryption! End-to-end encryption is one example showing that practicing security within a fast-growing company is about finding the most relevant innovative techniques according to many criteria: security and privacy potential, the risks to mitigate, the cost of implementation, and sometimes the functionality trade-off.