Most public company data is unstructured and textual. Because relevant information is hard to find, a lot of corporate data is radically underused, to the detriment of investors. For example, our research shows it can take 12-18 months for corporate malfeasance to be incorporated into the stock price after clear warning signs appear in financial text. Hard-to-find information that we extract includes accounting and governance choices, product defects, regulatory issues, customer/market reliance and much more.
One example is Sino-Forest, Canada's Enron. Sino-Forest was a darling of Canadian investors until an infamous exposé by short-seller Muddy Waters in 2011. It turned out to be a forestry company that didn’t actually own any forests. Months before the exposé and crash, there were obvious red flags in the company’s disclosures, including buying from and selling to companies controlled by its directors and problems with the review of its bookkeeping! Our algorithms picked up these red flags and more, and assessed Sino-Forest as high risk when we ran our models on the company’s historical filings.
I’m a CPA and a developer (odd combo). The tech community has largely ignored public company financial disclosure. A few years ago, I published a basic piece using computational methods to analyze cannabis disclosure. The local regulatory agency contacted me to give them a workshop on text analytics. It was then that it hit home how little was being done in the field.
Information drives financial markets. The difficulty of assessing risks hidden in long public filings makes earnings manipulation, and even fraud, both possible and profitable. Earnings manipulation involves using the flexibility in accounting standards to make financial statements look better than reality. This is easier than most people realize because accounting involves MANY choices and estimates.
There is money to be made by accessing and trading on underused predictive signals. Making money by stopping fraud is a win-win situation.
There are two main technical challenges thwarting progress in the field: (1) NLP models work best on short (500-character) text, but financial filings are hundreds of pages long, and (2) important and unimportant language sounds very similar in financial text. For instance, this sentence sounds like it could indicate terrible things going on behind the scenes but is in fact just boilerplate disclosure: “We face risks and uncertainties related to litigation, regulatory actions and government investigations and inquiries.” You can see how ML models easily get confused.
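To make the length problem concrete, here is a minimal Python sketch of one generic workaround: packing whole sentences into windows of roughly 500 characters so a short-text model can score each window separately. This is an illustration only, not a description of our pipeline; the function name `split_into_windows`, the naive sentence splitter, and the `max_chars` parameter are all assumptions made for the example.

```python
import re


def split_into_windows(text, max_chars=500):
    """Greedily pack whole sentences into windows of about max_chars characters.

    Hypothetical helper for illustration. A single sentence longer than
    max_chars is kept whole, so a window can exceed the limit in that case.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    windows = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current window.
            windows.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        windows.append(current)
    return windows


# Stand-in for a long filing: 100 short made-up sentences.
filing = " ".join(f"Sentence number {i} is here." for i in range(100))
windows = split_into_windows(filing, max_chars=100)
```

Splitting this way solves the input-size limit but not the second challenge: each window is scored without the surrounding context, so boilerplate and genuinely important language can still look alike.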
There’s a big gap in both academia and industry. A lot of effort is being put into forcing results from non-existent linguistic signals. Models that claim to predict specific outcomes often don’t hold up to scrutiny in practice.
To overcome these technical challenges, we used supervised and semi-supervised learning with high-quality labels, focused on tangible facts represented in textual context, and adapted language models using domain expertise.
As far as we know, no other solution is able to identify problematic/risky disclosure algorithmically. Using search terms to do something similar results in overwhelming noise. The disclosure selected by our algorithms is highly predictive of downside risk - validated in deployment and also in backtesting.
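A toy Python sketch shows why search terms alone are so noisy. The first sentence is the boilerplate example quoted earlier; the second is a made-up sentence, invented purely for illustration, describing a specific event. The helper `keyword_hits` and the search terms are also assumptions for the example.

```python
# Boilerplate risk-factor language (quoted earlier in this post).
BOILERPLATE = (
    "We face risks and uncertainties related to litigation, regulatory "
    "actions and government investigations and inquiries."
)

# Hypothetical sentence invented for this example: a specific, material event.
SPECIFIC = (
    "During the quarter, we received a subpoena in connection with a "
    "government investigation into our revenue recognition practices."
)


def keyword_hits(sentences, terms):
    """Return every sentence containing at least one search term (case-insensitive)."""
    lowered_terms = [t.lower() for t in terms]
    return [s for s in sentences if any(t in s.lower() for t in lowered_terms)]


hits = keyword_hits([BOILERPLATE, SPECIFIC], ["investigation", "litigation"])
# Both sentences match, so the search cannot separate boilerplate from a real event.
```

Across a filing that runs to hundreds of pages, most matches will look like the first sentence, which is where the noise comes from.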
We launched our core product in April 2021 (see https://bedrock-ai.com) and it’s used by hedge funds and institutional investors. We’re also doing a pilot to support Canadian securities regulators (https://bit.ly/3wOwOj6). We’ve also just launched a minimalist free site, Ledge (https://ledge.bedrock-ai.com), to help retail investors stay up to date on material events at companies they follow. Companies are required to disclose material events between their quarterly reports, but these disclosures rarely make the news.
Our core/premium product is currently only available to institutions, in part because retail investors generally don’t prioritize risk management and therefore aren’t committed customers. We plan to expand the free site and better support individual investors as we grow.
We would love to hear from you. Have you tried to read annual reports and gotten lost in the weeds? What has your experience been in making NLP models work on financial text?