Bedrock AI (YC S21) – Using ML to identify red flags in SEC filings

We’re Kris, Suhas, and Heather (YC S21) and we’re building Bedrock AI. We use machine learning to extract hard-to-find information and assess risk in public company reports (SEC filings). Our platform is used by investors to improve portfolio returns and mitigate downside risks.

Most public company data is unstructured and textual. Because relevant information is hard to find, a lot of corporate data is radically underused, to the detriment of investors. For example, our research shows it can take 12-18 months for corporate malfeasance to be incorporated into the stock price after clear warning signs appear in financial text. The hard-to-find information that we extract includes accounting and governance choices, product defects, regulatory issues, customer/market reliance, and much more.

One example is Sino-Forest, Canada's Enron. Sino-Forest was a darling of Canadian investors until an infamous 2011 exposé by short-seller Muddy Waters. It turned out to be a forestry company that didn’t actually own any forests. Months before the exposé and crash, there were obvious red flags in the company’s disclosures, including buying from and selling to companies controlled by its directors and problems with the review of its bookkeeping! Our algorithms picked up these red flags and more, and assessed Sino-Forest as high risk when we ran our models on the company’s historical filings.

I’m a CPA and a developer (odd combo). The tech community has largely ignored public company financial disclosure. A few years ago, I published a basic piece using computational methods to analyze cannabis disclosure. The local regulatory agency contacted me to give them a workshop on text analytics. It was then that it hit home how little was being done in the field.

Information drives financial markets. The difficulty of assessing risks hidden in long public filings makes earnings manipulation, and even fraud, both possible and profitable. Earnings manipulation involves using the flexibility in accounting standards to make financial statements look better than reality. This is easier than most people realize because accounting involves MANY choices and estimates.

There is money to be made by accessing and trading on underused predictive signals. Making money by stopping fraud is a win-win situation.

There are two main technical challenges thwarting progress in the field: (1) NLP models work best on short text (around 500 characters), but financial filings are hundreds of pages long, and (2) important and unimportant language sounds very similar in financial text. For instance, this sentence sounds like it could be indicative of terrible things going on behind the scenes but is in fact just boilerplate disclosure: “We face risks and uncertainties related to litigation, regulatory actions and government investigations and inquiries.” You can see how ML models easily get confused.
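
To make the first challenge concrete, here is a minimal sketch of one common workaround: cutting a long filing into passages of at most 500 characters, breaking on sentence boundaries where possible. This is an illustration only, not a description of our pipeline; the function name `chunk_text` and the `max_chars` parameter are made up for the example, and the regex sentence splitter is deliberately naive.

```python
import re


def chunk_text(text, max_chars=500):
    """Split text into chunks of at most max_chars characters.

    Sentences are kept whole where possible; a single sentence longer
    than max_chars is cut at fixed character offsets.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    filing = "We sell timber. " * 100  # stand-in for a long filing
    pieces = chunk_text(filing, max_chars=500)
    print(len(pieces), max(len(p) for p in pieces))
```

Chunking like this makes the text fit the model, but it also strips away the surrounding context, which is part of why the second challenge (telling boilerplate from meaningful disclosure) is hard.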

There’s a big gap in both academia and industry. A lot of effort is being put into forcing results from non-existent linguistic signals. Models that claim to predict specific outcomes often don’t hold up to scrutiny in practice.

To overcome these technical challenges, we did three things: we used supervised and semi-supervised learning with high-quality labels, we focused on tangible facts represented in textual context, and we adapted language models using domain expertise.

As far as we know, no other solution is able to identify problematic/risky disclosure algorithmically. Using search terms to do something similar results in overwhelming noise. The disclosure selected by our algorithms is highly predictive of downside risk - validated in deployment and also in backtesting.
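
As a small illustration of the noise problem with search terms, the sketch below runs a naive keyword filter over three sentences. The boilerplate sentence is the standard risk disclosure language: “We face risks and uncertainties related to litigation, regulatory actions and government investigations and inquiries.” The other two sentences, the term list `RISK_TERMS`, and the function `keyword_hits` are hypothetical and exist only for this example.

```python
# Hypothetical search terms an analyst might use to look for trouble.
RISK_TERMS = ("litigation", "investigation", "regulatory")


def keyword_hits(sentences, terms=RISK_TERMS):
    """Return every sentence containing at least one search term."""
    return [s for s in sentences if any(t in s.lower() for t in terms)]


boilerplate = (
    "We face risks and uncertainties related to litigation, regulatory "
    "actions and government investigations and inquiries."
)
# Invented sentences for illustration only, not taken from a real filing.
specific = (
    "In March, the regulator opened a formal investigation into our "
    "revenue recognition."
)
neutral = "We opened three new offices during the year."

hits = keyword_hits([boilerplate, specific, neutral])
for sentence in hits:
    print(sentence)
```

The filter returns both the boilerplate and the specific sentence and cannot tell them apart. Across hundreds of pages of filings, the boilerplate matches dominate.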

We launched our core product in April 2021, and it’s used by hedge funds and institutional investors. We’re also running a pilot to support Canadian securities regulators. We’ve also just launched a minimalist free site, Ledge, to help retail investors stay up to date on material events at companies they follow. Companies are required to disclose material events between their quarterly reports, but these disclosures rarely make the news.

Our core/premium product is currently only available to institutions, in part because retail investors generally don’t prioritize risk management and therefore aren’t committed customers. We plan to expand the free site and better support individual investors as we grow.

We would love to hear from you. Have you tried to read annual reports and gotten lost in the weeds? What has your experience been in making NLP models work on financial text?
