There’s a demo video at https://www.youtube.com/watch?v=ib3mRh2tnSo and a sandbox to try out (no sign-in required!) at https://demo.runtrellis.com/. One historical archive of unstructured data we thought would be interesting to run Trellis on is the old Enron email corpus, which famously took months to review. We’ve created a showcase demo here: https://demo.runtrellis.com/showcase/enron-email-analysis, with some documentation here: https://docs.runtrellis.com/docs/example-email-analytics.
Why we built this: At the Stanford AI lab where we met, we collaborated with many F500 data teams (including Amazon, Meta, and Standard Chartered), and repeatedly saw the same problem: 80% of enterprise data is unstructured, and traditional platforms can’t handle it. For example, a major commercial bank I work with couldn’t improve credit risk models because critical data was stuck in PDFs and emails.
We realized that our research from the AI lab could be turned into a solution with an abstraction layer that works as well for financial underwriting as it does for analysis of call center transcripts: an AI-powered ETL that takes in any unstructured data source and turns it into a table that conforms to a defined schema.
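To make the idea of a schema-correct output table concrete, here is a minimal Python sketch of the last step of such a pipeline: records that an extraction model has already returned (hard-coded here) are coerced to a declared column schema, and anything that cannot be coerced is set aside instead of entering the table. The schema, the column names, and the functions `coerce`, `validate`, and `build_table` are hypothetical illustrations, not part of the Trellis API or its actual implementation.

```python
from typing import Any

# Hypothetical target schema for an email-analysis table: column name -> type.
# Only str, int, and bool columns are handled in this sketch.
SCHEMA: dict[str, type] = {
    "sender": str,
    "recipient_count": int,
    "mentions_trading": bool,
}


class SchemaError(ValueError):
    """Raised when an extracted record cannot be made to fit the schema."""


def coerce(column: str, value: Any, typ: type) -> Any:
    """Coerce one extracted value to its column type or raise SchemaError."""
    if value is None:
        raise SchemaError(f"{column}: missing value")
    if typ is bool:
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text in ("true", "yes"):
            return True
        if text in ("false", "no"):
            return False
    elif typ is int:
        # Reject bools and floats so that 2.5 is never silently truncated to 2.
        if not isinstance(value, (bool, float)):
            try:
                return int(value)
            except (TypeError, ValueError):
                pass
    elif typ is str:
        if isinstance(value, str):
            return value
    raise SchemaError(f"{column}: cannot coerce {value!r} to {typ.__name__}")


def validate(record: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Return a row with exactly the schema's columns, correctly typed."""
    extra = set(record) - set(schema)
    if extra:
        raise SchemaError(f"unexpected fields: {sorted(extra)}")
    return {col: coerce(col, record.get(col), typ) for col, typ in schema.items()}


def build_table(records, schema):
    """Split model outputs into valid rows and (index, reason) rejections."""
    rows, rejected = [], []
    for i, record in enumerate(records):
        try:
            rows.append(validate(record, schema))
        except SchemaError as err:
            rejected.append((i, str(err)))
    return rows, rejected


# Hypothetical model outputs standing in for LLM extraction results.
extracted = [
    {"sender": "alice@example.com", "recipient_count": "3", "mentions_trading": "yes"},
    {"sender": "bob@example.com", "recipient_count": 2.5, "mentions_trading": False},
]
rows, rejected = build_table(extracted, SCHEMA)
print(rows)
print(rejected)
```

Rejecting a record rather than guessing (for example, truncating 2.5 recipients to 2) keeps the output table trustworthy; in a real pipeline the rejected rows would presumably go to review or re-extraction.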
Some interesting technical challenges we had to tackle along the way: (1) Supporting complex documents out of the box: we use LLM-based map-reduce to handle long documents and vision models for table and layout extraction. (2) Model routing: we select the best model for each transformation to optimize cost and speed. For instance, in data extraction tasks, we can use smaller fine-tuned models that specialize in returning structured JSON for financial tables. (3) Data validation and schema guarantees: we ensure accuracy with reference links and anomaly detection.
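The map-reduce and routing ideas in (1) and (2) can be sketched roughly as follows, assuming a document that splits cleanly on blank lines. Paragraphs are packed into chunks under a size limit, each chunk is sent to a model chosen by a simple routing rule (map), and the partial results are merged (reduce). The model names, the routing threshold, and `fake_llm_extract` (a regex standing in for an LLM call so the sketch runs offline) are all hypothetical; this is not how Trellis itself is implemented.

```python
import re
from typing import Callable

# Hypothetical pattern standing in for an LLM extraction prompt: dollar amounts.
AMOUNT = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")


def chunk_paragraphs(text: str, max_chars: int) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def route_model(task: str, n_chars: int) -> str:
    """Pick a model per chunk; names and thresholds are hypothetical."""
    if task == "table_extraction":
        return "small-finetuned-table-model"  # cheap specialist for structured JSON
    if n_chars > 4_000:
        return "large-context-model"  # only pay for long context when needed
    return "small-general-model"


def fake_llm_extract(model: str, chunk: str) -> dict:
    """Placeholder for an LLM call; a regex stands in so the sketch runs offline."""
    amounts = [float(m.replace(",", "")) for m in AMOUNT.findall(chunk)]
    return {"model": model, "amounts": amounts}


def map_reduce(
    text: str,
    max_chars: int,
    task: str = "amount_extraction",
    map_fn: Callable[[str, str], dict] = fake_llm_extract,
) -> dict:
    chunks = chunk_paragraphs(text, max_chars)
    # Map: process each chunk independently with its routed model.
    partials = [map_fn(route_model(task, len(c)), c) for c in chunks]
    # Reduce: merge the per-chunk results into one document-level answer.
    amounts = [a for p in partials for a in p["amounts"]]
    return {
        "chunks": len(chunks),
        "models": sorted({p["model"] for p in partials}),
        "amounts": amounts,
        "total": sum(amounts),
    }


doc = "Payment of $1,200 approved.\n\nInvoice for $350.50 attached.\n\nNo amounts here."
result = map_reduce(doc, max_chars=40)
print(result)
```

Because the chunks do not overlap, the reduce step can simply concatenate partial results; with overlapping windows it would also need to deduplicate extractions that appear in two chunks.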
After launching Trellis, we’ve seen diverse use cases, especially in legacy industries where PDFs are treated as APIs. For example, financial services companies need to convert complex documents like bonds and credit ratings into a structured format to speed up underwriting and enable pass-through loan processing. Customer support and back-office teams need to accelerate onboarding by mapping documents across different schemas and ERP systems, and to ensure support agents follow SOPs (security questions, compliance disclosures, etc.). And many companies today want data preprocessing in ETL pipelines and data ingestion for RAG.
We’d love your feedback! Try it out at https://demo.runtrellis.com/. To save and track your large data transformations, create an account on our dashboard at https://dashboard.runtrellis.com/. If you’re interested in integrating with our APIs, our quick start docs are here: https://docs.runtrellis.com/docs/getting-started. If you have specific use cases in mind, we’d be happy to do a custom integration and onboarding. Anything for HN. :)
We’re excited to hear about your past experiences wrangling unstructured data, the workflows you want to automate, and the data integrations you would like to see.