Roe AI (YC W24) – AI-powered data warehouse to query multimodal data

Hey HN, we’re Richard and Jason from Roe AI (https://getroe.ai). We’re building a query engine that lets data people run SQL queries on various kinds of unstructured data (videos, images, webpages, documents) using LLM-powered data processors.

Here is a 3-minute video showing how to create an LLM data processor to process videos, build a semantic search for image data, and use it with SQL: https://www.youtube.com/watch?v=9-WwJk1v5mI

The problem we tackle is that data analysts cannot quickly answer business questions about unstructured, multimodal data. For example, product teams want to analyze user session replay videos to find the pain points in their product. Ads teams need to know everything about an advertiser based on their web pages, such as the products they offer, payment methods, etc. Marketing teams need to know how product placement or music in a marketing campaign could get more views. And so on.

For structured data, questions like these can be answered quickly with SQL queries in Snowflake / BigQuery. But with unstructured multimodal data, it becomes a complex analysis process: open a Python notebook, write custom logic to fetch the multimodal data from blob storage (or write a crawler first if you need webpage data), find an AI model, do prompt engineering, do data ops to productionize the workload in a data workflow, etc. We reduce this process to a few lines of SQL.

How it works: first, we leverage multimodal LLMs as data processors because they’re good at extracting information from unstructured data, classifying it, and handling other arbitrary tasks. Next, we’ve built a user interface for data people to explore multimodal data and manage AI components. Then we have a quick semantic index builder for multimodal data. (We often see databases provide vector search functionality but not index building, so we built that.) We also provide utility functions for handling multimodal data, such as a video cutter and a PDF page selector. Finally, SQL is the command line for slicing and dicing multimodal data.
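To make the distinction between vector search and index building concrete, here is a minimal Python sketch. It is not Roe AI's implementation: the SemanticIndex class, the file names, and the hand-made 3-dimensional vectors are all hypothetical, and a real system would get its embeddings from a model rather than typing them in. Index building is the add() step that stores one vector per item; vector search is the search() step that ranks stored items by cosine similarity to a query vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class SemanticIndex:
    """Toy in-memory index (hypothetical): stores (item_id, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, item_id, vector):
        # "Index building": keep one embedding per item.
        self._items.append((item_id, list(vector)))

    def search(self, query_vector, top_k=3):
        # "Vector search": brute-force ranking by cosine similarity.
        scored = [(cosine(query_vector, vec), item_id)
                  for item_id, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for _, item_id in scored[:top_k]]


# Hand-made embeddings for illustration only (assumption, not model output).
index = SemanticIndex()
index.add("cat.jpg", [0.9, 0.1, 0.0])
index.add("dog.jpg", [0.8, 0.3, 0.1])
index.add("invoice.pdf", [0.0, 0.1, 0.9])

results = index.search([1.0, 0.0, 0.0], top_k=2)
print(results)
```

With these toy vectors the query lands closest to the two image entries, so the invoice is left out of the top two. A production index would replace the brute-force scan with an approximate nearest-neighbor structure.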

How we got here: I’ve experienced 3 data evolutions in the last 10 years. At UC Berkeley, I was a data researcher using a supercomputer cluster called Savio. It was a bare-metal way to analyze data: I had to move CSV files between machines. Then at LinkedIn, I had Hadoop + Pig / Scala Spark. That abstracted most of the work, but I spent hours tuning jobs and had a headache manipulating HDFS directories. Later I joined Snowflake and was like, holy, data analysis can be this simple: I can just use SQL to do everything within this data warehouse! I asked myself: why can’t we make something like Snowflake for unstructured data? That was the impulse behind Roe AI and it’s been driving me ever since.

To get started, you can sign in at https://app.roe-ai.com/ and there are docs at https://docs.roe-ai.com/. You can load unstructured data via our SQL and File API, Snowflake Staging Data Connector, S3 Blob Storage Data Connector, Zapier Roe AI Zap, or the SQL function load_url_file() to get a file from a URL.

Some logistics: the product is free to start, and we’ve preloaded $50 of AI credits, enough to process 3,000 one-page PDFs. If you use all $50, just email us, and we’ll give you more. The solution is not open source because it is too complex to self-host, but let us know if you see the potential for open source.

The product is early and could have bugs and UX problems. It’d be incredible if you could give it a spin anyway. We hope you find it interesting, and we’d love to know what you think! Jason and I will be around in the thread and are really interested in hearing from you!


