Unlocking Value From Sensitive Alternative Data Sets – Perspective from Tony Berkman

Tony Berkman


April 30, 2021

I’m excited to announce that I’ve recently joined LeapYear as a strategic advisor.

I have spent my entire professional career (over 30 years!) trying to turn large amounts of data into intelligence, with a focus on informing superior investment decisions. Over the past 18 years I have become particularly intrigued by what are often referred to as “alternative” or “non-traditional” datasets. I’ve seen the term “large” evolve from millions of records to tens of billions, and sometimes trillions, of records. The availability of powerful cloud infrastructure from companies like Amazon and Google, along with the evolution and accessibility of dynamic data analysis tools (e.g. Tableau, Looker, TensorFlow, Pandas, NumPy and R), has gone a long way toward democratizing the ability to turn data into information that empowers better-informed and more impactful decisions in any industry. If your company is not leveraging both internal and third-party data as an integral part of running its business, it will very likely be eclipsed by competitors that are.

As internal and external data analysis becomes increasingly ubiquitous, more and more companies have come to realize that their data is an asset that could provide a meaningful revenue stream. In fact, new businesses are regularly being created that offer free or discounted services with the aim of monetizing the data they collect. With an ever-increasing thirst for new and differentiated data among data consumers, and more and more companies eager to monetize their own data, one would expect an active, transparent and fluid market for buying and selling data. However, this has not been the case, for a number of important reasons related to privacy, provenance, compliance and perception.

Alternative data often contains information about the identity of a person or entity, either explicitly or in a form that can be inferred, including by bad actors. Important regulations such as HIPAA, GDPR and CCPA have been enacted to protect the privacy of individuals, while companies’ identities are often protected by their contracts with partners and clients. Even where identities are not protected by a regulation or contract, there are often good reasons that inhibit the movement of data: ethics, fear of negative perceptions, and the wish to act in the best interests of customers (both individuals and entities). Over the years, I have gotten excited about far too many datasets that could have answered all kinds of interesting questions but were ultimately out of reach for reasons of compliance or perception.

For years, I and others like me have contemplated ways to analyze novel datasets without compromising or exploiting the sensitivities within the data, including PII and, where prohibited, entity resolution. While any solution will necessarily involve some tradeoff between the granularity of potential analyses and the rigor of the protection applied to the data, LeapYear has by far the most dynamic solution to this conundrum that I’ve come across. The more I learned about the technology, the more convinced I became that it would be a game-changer in two respects. It would allow nearly all companies with compelling data to make insights from that data available. It would also allow investment managers like those I’ve worked for, as well as companies across all industries, to access the types of data and intelligence necessary to make optimal business decisions. I’m excited to work with LeapYear as an advisor and to be a small part of the inspiring path forward.
