Reducto Studio (YC W24) – Build accurate document pipelines, fast

Hi HN! We’re Adit and Raunak, co-founders of Reducto (YC W24, https://reducto.ai). Reducto turns unstructured documents (e.g., PDFs, scans, spreadsheets) into structured data. This data can then be used for retrieval, passed into LLMs, or used elsewhere downstream.

We started Reducto when we realized how many of today’s AI applications depend on good-quality data. Everyone knows that good inputs lead to better outputs, but 80% of the world’s data is still trapped inside messy PDFs and spreadsheets. Raunak and I launched a very early MVP for parsing and extracting data from unstructured documents, and were lucky to get a lot of interest from technical teams once they saw a level of accuracy they hadn’t seen before.

We started by just releasing an API for engineers to build with, but over time we realized that an accurate API was only part of the puzzle. Our customers wanted to easily set up multi-step pipelines, evaluate and iterate on performance for their use case, and work with non-engineering teammates who were also involved in the real-world document processing flow.

That’s why we’re launching Reducto Studio, a web platform that sits on top of our APIs for users to build and iterate on end-to-end document pipelines.

With Studio, you can:

- Drop an entire file set and get per-field and per-document accuracy scores against your eval data.

- Auto-generate and continuously optimize extraction schemas to hit production-grade quality fast.

- Save every run, iterate on parse/extract configs, and compare results side-by-side.
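To make "per-field and per-document accuracy" concrete, here is a minimal sketch of how such scores could be computed by hand. It is an illustration only, not how Studio works internally: the function name `score_extractions`, the exact-match comparison, and the sample invoices are all assumptions made up for this example.

```python
from collections import defaultdict


def score_extractions(predicted, expected):
    """Score extracted fields against eval data using exact match.

    Hypothetical helper for illustration; not part of any Reducto API.
    Both arguments map a document id to a {field name: value} dict.
    Returns (per_field, per_document) accuracies, each in [0, 1].
    """
    field_hits = defaultdict(int)
    field_totals = defaultdict(int)
    per_document = {}

    for doc_id, truth in expected.items():
        got = predicted.get(doc_id, {})  # a missing document scores 0 on every field
        hits = 0
        for field, value in truth.items():
            correct = got.get(field) == value
            field_totals[field] += 1
            field_hits[field] += int(correct)
            hits += int(correct)
        # A document with no expected fields is treated as fully correct.
        per_document[doc_id] = hits / len(truth) if truth else 1.0

    per_field = {f: field_hits[f] / field_totals[f] for f in field_totals}
    return per_field, per_document


# Made-up eval data and extraction output for two documents.
expected = {
    "invoice_1.pdf": {"total": "120.00", "vendor": "Acme"},
    "invoice_2.pdf": {"total": "75.50", "vendor": "Globex"},
}
predicted = {
    "invoice_1.pdf": {"total": "120.00", "vendor": "Acme"},
    "invoice_2.pdf": {"total": "75.50", "vendor": "Globex Inc"},
}

per_field, per_document = score_extractions(predicted, expected)
print(per_field)
print(per_document)
```

Exact match is the simplest possible comparison; in practice you would likely want normalization or fuzzy matching for fields like names and amounts.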

You can see some examples here (https://studio.reducto.ai) or you can watch this walkthrough: https://www.loom.com/share/b243551741c642c6a594c00353fcecb3.

If you’d like to upload your own document, you can log in and do so as well - we don’t make you book a demo or put a payment down to try it.

Thanks for reading and checking it out! This is only the first step for Studio, so we’d love feedback on anything: UX rough edges (we know they’re there!), features that would make evaluations better for you, hard documents you’ve had trouble with, or anything else about wrangling unstructured data.


