Flow data abounds in financial markets, with the constant churn of investment capital creating a near-limitless volume of data every day. Flow data, in the most general sense, indicates where money is moving to and from in the market. Buried within this data are signals that can be mined both to retroactively explain and to proactively predict market outcomes. For example, flow data can be used to analyze whether money is moving into or out of a sector (like oil & gas), a geography, or even a single stock. Unsurprisingly, these datasets have considerable value to financial institutions.
From a theoretical perspective, all market participants generate and collect flow data, with a single entity's transactions describing a portion of the market. However, the diversity of market participants means that just as the flow of capital is not evenly distributed, neither is the information associated with it. Instead, this data is concentrated around a smaller subset of highly connected nodes, typically large sell-side institutions. Similarly, market infrastructure firms, be they exchanges, settlement venues, or central counterparty clearing houses, all capture information about this capital and asset flow and thus hold valuable datasets.
The owners of flow data, in all its forms, have a material theoretical advantage in understanding and predicting market outcomes. Think of a bulge bracket bank: only this bank has its unique combination of consumer banking, consumer credit card, and institutional flow data. Most importantly, the relative inaccessibility of this data makes it likely that signals buried within the flow represent an edge over the near-efficient market (i.e., it can be used to generate alpha).
Why Is Flow Data Sensitive?
Consider a typical transaction, perhaps the sale of a stock. Data associated with this transaction would include the security traded, the size and price of the order, a timestamp, and the identities of the counterparties involved.
At first glance, none of this may seem particularly revelatory. But consider large volumes of this data, collected across all market participants. With a little work, you may be able to glean insights into the trading strategies of a specific asset manager, or discern that a company is starting to struggle (or is ready to pop!). Would you want your broker to study your intended trades, uncover your strategy, and then execute ahead of you, driving up the price? This is about more than just protecting the name of a client; it's about protecting the entire contents of a dataset such that it can't be leveraged against any position in the market. Each market participant generally considers their trades, and the associated attributes including timestamps, spreads, and counterparties, to be extremely sensitive. In the case of a hedge fund, the trading strategies are in fact its core intellectual property!
Because of this sensitivity, the holders of flow data must respect and ensure the confidentiality of the data and the clients involved. But this creates a fundamental conflict: how can one use this data to its fullest extent, while simultaneously respecting this need for privacy?
How to Safely Work with Flow Data
To derive maximum value from this data, all parties that hold it must satisfy two conflicting, but not mutually exclusive, requirements: extracting the full analytical value of the data, while ensuring the privacy and confidentiality of every participant represented in it.
These conflicting requirements, and the methodologies used to satisfy both of them, have historically created inefficiencies in the use of flow data. For example, this table summarizes the relative strengths of privacy methodologies across three key capabilities: maintaining privacy, providing accurate results, and a qualitative measure of value (think of this as how much data is "lost" in order to maintain confidentiality):
Legacy privacy techniques often sought to balance privacy and accuracy by either aggregating information or redacting data based on some manually defined ruleset. Unfortunately, this trade-off is inefficient: either accuracy or privacy suffers a suboptimal outcome. Worse, even when these techniques are applied, client confidentiality is not truly guaranteed, leaving clients at risk. Another technique used to protect participants is to introduce latency into the data, providing flow information as late as t+3 from the event, that is, three days after the event happened. While this is an effective means of reducing the impact of any potential privacy breach, the likelihood of a signal being 'stale' grows as the latency increases (i.e., the data loses significant value).
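To make the redaction trade-off concrete, here is a minimal Python sketch. The flow figures, sector names, and the minimum-participant threshold are all hypothetical, and the rule itself (withhold any aggregate backed by fewer than k participants) is just one common form of manually defined ruleset: privacy is maintained only by discarding signal outright.

```python
# Hypothetical policy: publish a sector aggregate only if at least
# K distinct participants stand behind it.
K = 3

# Hypothetical daily net flows (USD millions) per sector, with the
# set of participants behind each figure.
flows = {
    "oil_gas":   {"net_flow": 120.0, "participants": {"A", "B", "C", "D"}},
    "tech":      {"net_flow": -45.0, "participants": {"E", "F"}},
    "utilities": {"net_flow": 30.0,  "participants": {"G", "H", "I"}},
}

def redact(flows, k):
    """Publish only aggregates backed by at least k participants."""
    published = {}
    for sector, row in flows.items():
        if len(row["participants"]) >= k:
            published[sector] = row["net_flow"]
        # else: the entire sector figure is withheld -- signal lost
    return published

published = redact(flows, K)  # "tech" is suppressed entirely
```

Note that the suppression is all-or-nothing: the tech figure is not blurred or bounded, it simply vanishes, which is exactly the accuracy loss the table above describes.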
LeapYear has been pioneering a different approach to this problem. Rather than relying on the suppression of information, or delaying or denying access to ensure privacy, this patented approach ensures that any analysis of the data is rigorously and automatically kept private. If we can prove, to both the owners of this data and the participants represented in it, that each and every analysis preserves privacy while releasing the maximum information signal, we satisfy the conflicting requirements laid out above.
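LeapYear's implementation is proprietary, but the general idea of releasing each query result with a mathematical privacy guarantee, rather than suppressing the underlying data, is well illustrated by differential privacy. The following generic Python sketch (not LeapYear's actual mechanism; the flow values, clipping bounds, and epsilon are illustrative assumptions) releases a sum with epsilon-differential privacy by adding Laplace noise calibrated to the query's sensitivity:

```python
import math
import random

def private_sum(values, lower, upper, epsilon):
    """Release a sum with epsilon-differential privacy.

    Each value is clipped to [lower, upper], so one participant can
    change the sum by at most (upper - lower): the sensitivity.
    Laplace noise scaled to sensitivity/epsilon then masks any single
    participant's contribution.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = upper - lower
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) via the inverse-CDF method.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return sum(clipped) + noise

# Hypothetical per-participant net flows (USD millions).
flows = [12.5, -3.0, 40.0, 7.25]
estimate = private_sum(flows, lower=-50.0, upper=50.0, epsilon=1.0)
```

Because the noise depends only on the sensitivity and epsilon, not on any individual's data, every released figure carries the same provable guarantee regardless of what the rest of the dataset contains, which is what lets the full signal be used without redaction.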
Flow data is a unique and valuable data source for many participants in the financial services industry. The data holds massive potential to generate insights and, in turn, positive commercial outcomes. But the need to ensure the privacy and confidentiality of the participants in this data is paramount. Companies that achieve the dual goals of appropriate use and privacy will find a competitive advantage in the market. LeapYear's flow data solutions are a quantum leap in institutions' ability to achieve this goal. If you'd like to learn more about our solutions, please contact us.