With the advent of LLMs and recent breakthroughs in speech synthesis, conversational voice AI has finally gotten good enough to enable really exciting use cases. However, developers often underestimate what's required to build good, natural-sounding conversational voice AI. Many simply stitch together ASR (speech-to-text), an LLM, and TTS (text-to-speech) and expect a great experience. It turns out it's not that simple.
There's more going on in conversation than we consciously realize: knowing when to speak and when to listen, handling interruptions, responding within the 0-200 ms gap humans expect between turns, and backchanneling (e.g., "yeah", "uh huh") to signal that you're listening. These come naturally to humans but are hard for AI to get right. Developers spend hundreds of hours on the conversation experience and still end up with 4-5 s latencies, inappropriate cutoffs, the agent talking over the user, and so on.
So, we built Retell AI. We follow the overall paradigm of speech-to-text, LLM, and text-to-speech components, but add conversation models in between to orchestrate the conversation while allowing maximum configurability for developers at each step. You can think of our models as a "domain expert" layer for the dynamics of conversation itself.
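To make the orchestration idea concrete, here is a minimal sketch of the kind of turn-taking decision such a layer makes on every audio tick. All names, thresholds, and the decision rules are illustrative assumptions, not Retell's actual implementation:

```python
# Hypothetical turn-taking layer sitting between ASR and the LLM.
# Thresholds and logic are illustrative only.

BACKCHANNELS = ["yeah", "uh huh", "right"]  # short listening signals

def next_action(silence_ms: float, utterance_complete: bool, user_speaking: bool) -> str:
    """Decide what the agent should do on each audio tick.

    silence_ms:         how long the user has been silent
    utterance_complete: whether the ASR transcript looks like a finished thought
    user_speaking:      whether voice activity is currently detected
    """
    if user_speaking:
        # The user has the floor; never talk over them.
        return "listen"
    if utterance_complete and silence_ms > 200:
        # Finished thought plus a human-scale pause: take the turn.
        return "respond"
    if not utterance_complete and silence_ms > 600:
        # Mid-thought hesitation: signal attention without taking the floor.
        return "backchannel"
    # Short pause mid-sentence: don't cut the user off.
    return "listen"
```

Naive ASR+LLM+TTS stacks effectively always return "respond" on silence, which is where the inappropriate cutoffs come from.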
Retell is designed for you to bring your own LLM into our pipeline. Currently, we achieve 800 ms end-to-end latency, handle interruptions, and perform speech isolation, with many customization options (e.g., speaking rate, voice temperature, added ambient sound). We created a guest account for HN, so you can try our playground with a 10-min free trial, no login required: https://beta.retellai.com/dashboard/hn (playground tutorial: https://docs.retellai.com/guide/dashboard). Pricing is usage-based at $0.10-0.17/min.
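The "bring your own LLM" shape looks roughly like the sketch below: the pipeline hands you a transcript when it's your agent's turn, and you stream text back so TTS can start before the full reply exists. The function names, event shape, and stand-in model are assumptions for illustration, not Retell's actual API (see the docs above for the real protocol):

```python
# Hypothetical bring-your-own-LLM integration point. Everything here is
# a sketch; the real integration is defined by the provider's API docs.

from typing import Iterator

def my_llm_stream(prompt: str) -> Iterator[str]:
    # Stand-in for your own model call (hosted API, local model, etc.).
    for word in ("Sure,", "I", "can", "help", "with", "that."):
        yield word + " "

def handle_turn(transcript: str) -> str:
    """Called when the orchestrator decides it's the agent's turn to speak.

    Streaming token by token matters for latency: each token can be
    forwarded to TTS immediately instead of waiting for the full reply.
    """
    reply = []
    for token in my_llm_stream(transcript):
        reply.append(token)  # in practice, forward each token to TTS here
    return "".join(reply).strip()
```

The key design point is that the LLM stays yours; the pipeline only owns the timing of when `handle_turn` fires.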
Our main product is a developer-facing API, but you can try it without writing code (e.g., create agents, connect a phone number) via our dashboard. If you want to take it to production, feel free to self-serve via our API documentation. One of our customers just launched, and you can view their demo: https://www.loom.com/share/64f09a53bf6d4b3799e5ebd08b23fec4?...
We are thrilled to see what our users are building with our API, and we're excited to show our product to the community. We look forward to your feedback!