For half a century, cryptography has enabled the secure exchange of electronic information: credit card transactions, file transfers, and private communications are all protected by cryptographic protocols. This has been possible through rigorous, mathematically proven guarantees. Because the assurances these protocols provide to individuals and businesses are backed by unassailable mathematical proof, the protocols have stood the test of time.

The field of data privacy, however, has historically lacked such foundations. The only known approaches for exposing data for analysis while attempting to maintain confidentiality have been heuristic: in other words, guesswork. These approaches involve redacting fields considered especially sensitive, such as personally identifiable information (PII), or applying ad hoc rules such as “reveal a person’s age but not their birthdate” or “only show a result if there are more than 20 people in the sample.”

These techniques are known not to work: countless studies and real-world breaches have demonstrated that they can be reverse engineered. Each time an approach is broken, privacy practitioners propose a new, slightly more sophisticated technique, such as adding noise to release “synthetic data,” performing aggregations that satisfy “k-anonymity,” or applying a broad class of statistical techniques called “data masking.” Each of these, in turn, has been broken, compromising highly sensitive information and harming both individuals and institutions. In the absence of an alternative, these approaches are still used in the enterprise today, perpetuating inefficiency, loss of data value, and vulnerability.
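To see how such rules can be reverse engineered, consider a differencing attack on the “more than 20 people in the sample” rule: two aggregate queries that each pass the threshold can be subtracted to isolate one person’s record. The data, names, and threshold below are hypothetical, chosen only to illustrate the mechanics.

```python
# Hypothetical salary table of 26 employees; all names are illustrative.
salaries = {f"employee_{i}": 50000 + 1000 * i for i in range(25)}
salaries["alice"] = 82000

def total_salary(names):
    # An aggregate query that a "more than 20 people" rule would allow.
    assert len(names) > 20, "query refused: sample too small"
    return sum(salaries[n] for n in names)

everyone = list(salaries)                              # 26 people: allowed
all_but_alice = [n for n in everyone if n != "alice"]  # 25 people: allowed

# Both queries pass the threshold individually, yet their difference
# is exactly one person's private value.
alice_salary = total_salary(everyone) - total_salary(all_but_alice)
```

The rule inspects each query in isolation, so no single release looks dangerous; the breach comes from combining releases, which heuristic rules do not account for.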

Differential privacy emerged 15 years ago as a solution to these issues. Differential privacy is a rigorous, mathematical definition of data privacy. When a statistic, algorithm, or analytical procedure satisfies differential privacy, no individual record significantly impacts the output of that analysis: the mathematics ensures that the output does not contain information that can be used to draw conclusions about any record in the underlying data set. With differential privacy, there is a mathematical bound on the amount of information released by the system, also known as an information-theoretic guarantee. The beauty of the approach is that it does not matter how sophisticated an adversary is, how much computational power they have, or how much outside information they can access: when differential privacy is satisfied, the released output simply does not contain enough information to draw conclusions about any individual’s private data.
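As a concrete illustration, one standard way to satisfy differential privacy is the Laplace mechanism: add noise drawn from a Laplace distribution whose scale is the query’s sensitivity divided by the privacy parameter ε. The sketch below applies it to a counting query, whose sensitivity is 1; the data and function names are illustrative, not from the text.

```python
import random

def laplace_noise(scale: float) -> float:
    # A Laplace(0, scale) sample is the difference of two
    # independent exponential samples with the same scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing any one
    # record changes the true count by at most 1, so Laplace noise
    # with scale 1/epsilon yields epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical data set: ages of eight people.
ages = [34, 29, 41, 55, 23, 38, 62, 47]
noisy = dp_count(ages, lambda a: a > 40, epsilon=1.0)
```

However much side information an adversary brings, the noisy answer reveals only a mathematically bounded amount about whether any one person appears in the data; smaller values of ε mean more noise and a tighter bound.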