“Hacker News Recap” (https://www.wondercraft.ai/podcasts/hacker-news-recap), a podcast produced using our platform, has been running for 4 months and currently gets close to 23k listens per month. We’ve made its analytics publicly available: https://op3.dev/show/f77aea62-97e5-5cce-92c6-9464e51c30c6.
Having previously attempted to start a podcast, we were well aware of the difficulties. Figuring out what equipment and software you need to buy is a daunting start. Editing is a lengthy and tedious process, technical difficulties often occur during recording, and planning logistics around recording is a hassle. As a result, content release is infrequent, which leads to lackluster growth.
At the same time, podcast consumption is growing rapidly. There are 500M podcast listeners around the world, double the number of 5 years ago. Beyond listener growth, podcasts are the medium most likely to influence behavior, which is why the number of businesses with podcasts has grown 5x over the past 5 years. Finally, the last piece that led to the creation of Wondercraft is that text-to-speech models improved significantly about 6 months ago, with ElevenLabs releasing models whose output is almost indistinguishable from human speech (see HN thread here: https://news.ycombinator.com/item?id=34361651).
Wondercraft integrates realistic text-to-speech with an infrastructure that simplifies podcast creation. For example, you can add music, publish your podcast and create an RSS feed, generate a video for your episode, get assistance with script generation, auto-generate show notes and a transcript, and translate your podcast, all in one place. All text-based tasks (e.g. script assistance, show note generation, etc.) are completed using a chain of custom prompts to LLMs. All text-to-speech is done through custom voices that are either synthetically generated or professionally cloned from voice actors, using the ElevenLabs platform. Tasks such as episode translation use both LLMs and ElevenLabs. Video generation runs on Remotion, and the RSS feed is handled by a routine that creates and updates an XML file.
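To make the RSS part concrete, here is a minimal sketch of what an "XML creation and updating routine" for a podcast feed could look like. It is not our production code: the function names (`new_feed`, `add_episode`, `render`), the example URLs, and the choice of Python's standard `xml.etree.ElementTree` module are all assumptions made for illustration, and it covers only a few core RSS 2.0 elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def new_feed(title: str, link: str, description: str) -> ET.Element:
    """Create an RSS 2.0 document with one empty <channel> (hypothetical helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    return rss


def add_episode(rss: ET.Element, title: str, audio_url: str,
                size_bytes: int, published: datetime, guid: str) -> ET.Element:
    """Append an episode <item>, or return the existing one if the guid is already present."""
    channel = rss.find("channel")
    for existing in channel.findall("item"):
        if existing.findtext("guid") == guid:
            return existing  # updating the feed twice must not duplicate episodes
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "guid", isPermaLink="false").text = guid
    # RSS expects RFC 822 style dates; format_datetime produces that form.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    # The <enclosure> element is what podcast apps use to find the audio file.
    ET.SubElement(item, "enclosure", url=audio_url,
                  length=str(size_bytes), type="audio/mpeg")
    return item


def render(rss: ET.Element) -> str:
    """Serialize the feed to an XML string."""
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    feed = new_feed("Demo Show", "https://example.com", "A demo feed")
    add_episode(feed, "Episode 1", "https://example.com/ep1.mp3", 1234,
                datetime(2023, 7, 3, 12, 0, tzinfo=timezone.utc), "ep-1")
    print(render(feed))
```

A real feed would also need the podcast-directory extensions (artwork, categories, author and so on) that Apple Podcasts and Spotify expect; the sketch leaves those out to keep the create-then-update shape visible.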
Since launching, we’ve had more than 13k users sign up to create their podcast. Use cases that we’re seeing include: businesses repurposing their blogs and generating video content for their socials; writers/bloggers/newsletters reaching their audience through another medium; news outlets and publications adding a news rundown podcast to their lineup; businesses creating internal educational/cultural material; and podcast studios using Wondercraft to serve client needs faster.
Wondercraft is not a tool for fully AI-generated content. Rather, we save people time by transferring content they’ve created (e.g. an article they’ve written) to another medium. This technology is best suited for news rundowns and narration-style podcasts (often used by businesses talking about a niche topic). And while interview and conversational formats will sound better person-to-person, the logistical and (often) sound quality issues remain, so we’re testing out an “Async Podcasts” feature, where an interviewee can respond to questions asynchronously in writing, share a photo and (optionally) a clip of their voice, and a podcast will be created from it.
We’d love to hear any thoughts, comments or experiences you may have had with using text-to-speech for podcast creation. Thank you for taking the time to read!