In today’s digital economy, healthcare providers have learned how to not only be doctors, nurses and practitioners, but also become leading experts in data analytics, privacy and usage. The growing value of machine learning and artificial intelligence to the enterprise has only hastened this trend toward codependence between healthcare and sensitive data.
However, when you overlay the current regulatory environment and the public's stance on privacy, a conundrum arises. How do we ensure both data privacy and data utility, when both are required in the modern healthcare data ecosystem?
HIPAA and similar laws and regulations provide guidelines for protecting data and maintaining data privacy. Unfortunately, some of the techniques employed to meet those requirements do not always adequately protect the individuals in those data sets. Many examples exist today, in healthcare and other major sectors, where theoretically anonymized data sets have been combined with external data to re-identify the individuals in them. Often the methodology by which data has been de-identified lacks mathematical rigor, resulting in significant data privacy concerns. As noted by Dr. Yves-Alexandre de Montjoye, a computer scientist at Imperial College London, and author of a recent paper on re-identification, “[we] need to move beyond de-identification… Anonymity is not a property of a data set, but is a property of how you use it.”
While current practices exist to satisfy regulatory requirements to de-identify data, they fall short of addressing the conundrum of how to protect individuals’ privacy while enabling data usage and utility at scale. Worse, many of these techniques rely on heuristic approaches, such as simply removing or aggregating columns, that in the end devalue the data set and compromise the integrity of downstream applications.
When we step back and think about the conundrum, we can identify a few key elements that get us to a statistically meaningful data set for application now and use downstream, all while meeting the dual goal of ensuring data privacy and solving for data utility. Those must-haves are data value, data privacy and data control.
There are always new technologies on the horizon, but one in particular, a mathematical standard called differential privacy, is well suited to address the first two must-haves of data value and data privacy. When combined with modern software architectures, it helps to ensure data control as well.
Differential privacy is a mathematical standard of privacy that provides a quantifiable, tangible means to help safeguard data privacy from many forms of compromise. A leading voice in differential privacy, Aaron Roth, associate professor of computer science at the University of Pennsylvania, explained, “What differential privacy promises is that nobody, no matter what they might already know, should be able to distinguish between the real world, in which your data was used, and the ideal world in which it wasn’t, substantially better than random guessing.”
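To make the idea concrete, here is a minimal sketch of one common way to achieve differential privacy for a simple counting question: the Laplace mechanism, which adds calibrated random noise to the true answer. This is an illustration under stated assumptions, not a description of any particular vendor's product. The function name `dp_count`, the sample ages and the choice of the privacy parameter `epsilon` are all hypothetical.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and Laplace noise with scale 1/epsilon is used.
    A smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    true_count = sum(1 for record in records if predicate(record))

    # The difference of two independent exponential draws with rate
    # epsilon follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Hypothetical patient ages, used only for illustration.
    ages = [34, 51, 67, 72, 45, 80]
    noisy = dp_count(ages, lambda age: age >= 65, epsilon=1.0)
    print(f"Noisy count of patients aged 65 or older: {noisy:.2f}")
```

Each individual answer is slightly off, which is what prevents an observer from telling whether any one person's record was included, yet the noise averages out across many queries or large populations, so the statistical value of the data is largely preserved. A production system would also need to track the cumulative privacy budget spent across queries, which this sketch omits.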
Differential privacy has roots in deep academic study spanning back 15 years, with a large body of experts researching, peer reviewing and publishing the details of statistical and modeling functions. Companies such as Apple and Google (in Chrome) have used the technique to protect their users’ data, and the technique is gaining more traction each day.
However, the use of this privacy technique alone is not enough. To be useful, it must be incorporated into a practical delivery mechanism, such as software, that can be used by a non-expert, supported by a traditional IT team, and broadly applied to both data set types and statistical or analytical workflows. It’s not a trivial matter, but it is one that can be, and is being, solved today.
Finally, when the goals of data privacy and utility are solved, valuable use cases become possible.
It’s time to think differently about what it means to maintain the dual goal of data privacy and data utility. Siloed data is not the answer, nor is failing to honor data privacy rights and obligations. Don’t accept the status quo. Look for new solutions that will not only simplify compliance but also amplify the ability to learn and help build the important insights that drive the future of healthcare.