Greptile (YC W24) - RAG on codebases that actually works

Hi HN, we're the co-founders of Greptile, a tool that can accurately answer questions about complex codebases. Developers use us to spend less time wrestling with codebases and more time actually writing code. Here's a demo: https://youtu.be/qI24eKO1YX0. You can try it on 100 popular repos here: https://app.greptile.com/repo, and on your own repo (if you give permission - more on that below) here: https://app.greptile.com.

We are far from the first people to try "RAG on your codebase". We focus on full codebase comprehension: using LLMs to accurately answer difficult questions with full context of large, complex, and even multi-repo codebases.

Simple RAG alone is not sufficient for this task. Codebases aren’t like most PDFs, docs, or other similar data types. They are graphs: complex puzzles where each piece is interlinked. So Greptile does a few things beyond simple RAG:

(1) Instead of directly embedding code, we parse the AST of the codebase, recursively generate docstrings for each node in the tree, and then embed the docstrings.

(2) Alongside vector similarity search and keyword search, we do “agentic search” where an agent reviews the relevance of the search results, and scans the source code to follow references that might lead to something important. Then it returns the relevant sources.
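To make step (1) concrete, here is a minimal sketch of the general idea, not our actual pipeline. It uses Python's built-in ast module on a toy snippet, walks the tree bottom-up, and produces one text record per node that could then be embedded. The summarize function is a hypothetical stand-in for an LLM call, and the record layout is an assumption made for illustration.

```python
import ast

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def summarize(name, kind, child_summaries, existing_doc):
    """Hypothetical stand-in for an LLM that writes a docstring for a node."""
    if existing_doc:
        # Reuse the first line of a docstring the author already wrote.
        return existing_doc.strip().splitlines()[0]
    if child_summaries:
        return f"{kind} {name} containing: " + "; ".join(child_summaries)
    return f"{kind} {name}"


def describe(node, name="<module>"):
    """Return (summary, records) for node, describing its children first."""
    records = []
    child_summaries = []
    # Only direct child definitions are visited; definitions nested inside
    # if/try blocks are skipped to keep the sketch short.
    for child in ast.iter_child_nodes(node):
        if isinstance(child, DEF_NODES):
            summary, child_records = describe(child, child.name)
            child_summaries.append(summary)
            records.extend(child_records)
    kind = type(node).__name__
    summary = summarize(name, kind, child_summaries, ast.get_docstring(node))
    # These text records, not the raw code, are what would be embedded.
    records.append({"name": name, "kind": kind, "text": summary})
    return summary, records


SOURCE = '''
class Cache:
    def get(self, key):
        """Look up a key."""
        return None

    def put(self, key, value):
        pass
'''

_, records = describe(ast.parse(SOURCE))
for record in records:
    print(record)
```

Because children are described before their parent, the record for Cache is built from the summaries of get and put, which is what lets a high-level question match a class or file even when no single function mentions the query terms.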

For example, here are a couple questions that this system is able to answer in our test repo that simple RAG couldn’t (in our experience):

“Where are the auth providers configured?” (They are in an array inside an options.ts file that doesn't obviously look auth-related. However, because that array is imported into the auth/route.ts file, Greptile’s agent traces the import and finds it.)

“How would I add a Postgres connector?” (The best way to answer this is to see how the Redis connector is set up and mirror it. Simple RAG sometimes retrieves some of the code for the Redis connector, but Greptile’s agent follows the connections to retrieve all the code that the Redis connector touches, and uses that to write instructions.)
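The first example above comes down to following references outward from the initial search hits. Here is a rough sketch of that step under simplifying assumptions: the toy repo is an in-memory dict of Python files (the real example is TypeScript), the "search" is a filename keyword match, and a fixed breadth-first walk over imports stands in for the agent, which in practice decides which references are worth following. The names REPO, imported_files, and expand_hits are made up for this illustration.

```python
import ast
from collections import deque

# Toy repo: file path -> source. Purely illustrative.
REPO = {
    "auth/route.py": "from options import PROVIDERS\n\ndef handler():\n    return PROVIDERS\n",
    "options.py": "PROVIDERS = ['github', 'google']\n",
    "billing.py": "import decimal\n",
}


def imported_files(source, repo):
    """Return the repo files that this source imports."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            path = module.replace(".", "/") + ".py"
            # Ignore imports that resolve outside the repo (e.g. stdlib).
            if path in repo and path not in found:
                found.append(path)
    return found


def expand_hits(seed_hits, repo, max_depth=2):
    """Add files reachable via imports from the initial search hits."""
    seen = list(seed_hits)
    queue = deque((path, 0) for path in seed_hits)
    while queue:
        path, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for target in imported_files(repo[path], repo):
            if target not in seen:
                seen.append(target)
                queue.append((target, depth + 1))
    return seen


# A keyword search for "auth" only finds the route file...
seed = [path for path in REPO if "auth" in path]
# ...but following its import also surfaces options.py.
print(expand_hits(seed, REPO))
```

The max_depth cap is there because an unbounded walk over a real import graph would pull in most of the codebase; a relevance check at each hop is what keeps the result set small.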

Developers (including at Stripe and Microsoft) are using Greptile for things like:

Debugging—you can paste in an error message and it does a pretty good job of diagnosing the root cause and suggesting fixes.

Grokking OSS repos: for example, if you're forking a repo, modifying it for your use case, or just integrating it, Greptile lets you add multiple repos and dependencies in the same chat session so it has full context.

Parsing legacy code at work—especially if original engineers have left the company.

Since we're accessing your private code, we're very careful with security. We don't store any code on our servers after initial processing, and just pull snippets as needed from the GitHub API.

Quick note: when you sign in with GH, it might ask for permission to "act on your behalf". This is a quirk of GitHub's wording—our permissions are read-only and the only thing we do "on your behalf" is read code, so we can index the repo.

We came up with this idea while working at AWS—the codebase was super complicated, the docs were sparse and out of date, and our team was remote so it was slow to get answers to questions. We picked "greptile" because of "grep" and also we just wanted a somewhat silly name.

Try it out! It's a work in progress, so any feedback is appreciated. Here are the links again: for popular open source repos see https://app.greptile.com/repo, and to get it working on your own repo, start at https://app.greptile.com.

If you have experience working with a complex codebase at work or for a project, I’d love to hear about it. It really helps us inform our product direction. Looking forward to comments!

Edit: For those who want to try this on large or private repos, here is a promo code for a free month: HACKERNEWS100


