
Why I Joined LeapYear – Scaling the Adoption of Differential Privacy
I am delighted to be joining LeapYear as a scientific advisor this month. At Boston University, I am a professor of computer science and a founding member of the new Faculty of Computing and Data Sciences. I got my PhD at MIT in 2004, and was a faculty member at Penn State for ten years before joining BU in 2017.
Differential privacy (DP), the idea at the heart of the products LeapYear is creating, was first laid out in a 2006 paper I wrote with Cynthia Dwork, Frank McSherry, and Kobbi Nissim. DP came out of a several-year push by me and others to find good ways to quantify what is leaked about an individual when information about a sensitive data set is published or shared. We settled on a basic question: how easy is it for an outside observer to use whatever was shared to tell whether a particular person, say Alice, was in the data? Differential privacy puts a cap on just how well that observer can do.
Differentially private algorithms come with a powerful guarantee: no matter what an outside observer knows ahead of time, they will learn essentially the same thing about Alice whether or not her data were actually used in the computation. This LeapYear blog post gives more background, along with some examples that help explain where DP comes from.
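For readers who want the precise statement, the standard definition from our paper is short. A randomized algorithm M is ε-differentially private if, for every pair of data sets D and D' that differ in one person's record, and for every set S of possible outputs,

\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,].
\]

In words: adding or removing Alice's record changes the probability of any outcome by at most a factor of e^ε, so whatever the observer concludes about her, they would have concluded almost as readily without her data. Smaller ε means a stronger guarantee.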
I am advising LeapYear because I'm excited to see differential privacy brought to widespread use. When we wrote our paper in 2005, DP was "just" a theoretical idea with a few simple algorithms. The last fifteen years have seen an explosion of research on the idea: go to any big machine learning conference these days and you'll find a dozen or more papers on differential privacy, doing everything from developing its fundamental statistical theory to fitting state-of-the-art deep neural networks. In 2016, our original paper won the prestigious Gödel Prize, awarded to theoretical papers in computer science that have had a lasting impact.
But commercial deployments have historically been scarce, and that scarcity stems directly from the variety of challenges that must be met when building a useful DP system. First, it's generally quite hard to implement DP algorithms correctly: the engineering team needs to understand both the mathematics and the system-level details in order to deliver on the approach's promises. They also need to understand the algorithmic aspects of several different areas: statistics, data manipulation, machine learning, and visualization, among others. In many cases, new algorithms must be developed to meet the needs of an application. Finally, the system must make these techniques easy to use, ensuring that people who are not DP experts have the tools, training, and interfaces required for broad adoption.
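To make that concrete, here is a minimal sketch of the Laplace mechanism, the simplest of the algorithms in our original paper, written in Python with NumPy. The function name, data, and parameters are illustrative only, not LeapYear's implementation. Even these few lines hide subtleties: for instance, textbook floating-point noise sampling like this is known to be attackable in deployed systems, one reason correct production implementations take real engineering.

import numpy as np

def private_count(records, predicate, epsilon):
    """Release an epsilon-differentially private count of the records
    satisfying `predicate`. A counting query has sensitivity 1: adding
    or removing one person's record changes the true count by at most 1,
    so Laplace noise with scale 1/epsilon gives epsilon-DP."""
    true_count = sum(1 for r in records if predicate(r))
    # Caveat: naive floating-point Laplace sampling is known to leak
    # bits of the true answer in practice; careful production systems
    # use hardened samplers.
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical usage: how many people in the data set are over 65?
records = [{"name": "Alice", "age": 72}, {"name": "Bob", "age": 34}]
print(private_count(records, lambda r: r["age"] > 65, epsilon=0.5))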
When LeapYear reached out, I saw an opportunity to help get these ideas from the research community into a broad range of industries and applications through their established and growing network of clients.
The availability of such a platform couldn’t be more timely. It’s no secret that big shifts in tech have led to the collection of vast data sets about every aspect of our lives, and the current business environment makes it easy to sell, trade and consolidate that information. The pandemic has only accelerated the trend, pushing us online in ways we would have found hard to believe seven or eight months ago.
What’s lacking are tools for businesses (and government agencies, and hospitals, and universities, and many other organizations) to manage all that data responsibly. They need ways to get the benefits of the data—accurate prediction models, informative statistics—without the huge cost to individuals’ privacy that comes with the current environment. Well-engineered differentially private systems are an important part of the toolkit, and one that LeapYear is well-positioned to provide. I’m happy to be a part of their effort.