Wondercraft (YC S22) – Use text-to-speech to create podcasts easily

Hi HN! We’re Dimitris and Youssef, founders of Wondercraft (https://www.wondercraft.ai/), a platform that leverages AI voices to make podcast creation simple. This video shows how it works: https://www.loom.com/share/fa8ac8eba8b9440dbe0321ccb8ba9426?....

“Hacker News Recap” (https://www.wondercraft.ai/podcasts/hacker-news-recap) a podcast produced using our platform, has been running for 4 months and currently gets close to 23k listens per month. We’ve made its analytics publicly available: https://op3.dev/show/f77aea62-97e5-5cce-92c6-9464e51c30c6.

Having previously attempted to start a podcast, we were well aware of the difficulties. Figuring out what equipment and software you need to buy is a daunting start. Editing is a lengthy and tedious process, technical difficulties often occur during recording, and planning logistics around recording is a hassle. As a result, content release is infrequent, which leads to lackluster growth.

At the same time, podcast consumption is experiencing exponential growth. There are 500M podcast listeners around the world, double in size compared to 5 years ago. Apart from the growth in listeners, podcasts are the medium that is most likely to influence behavior, which is the reason why the number of businesses having podcasts has grown 5x over the past 5 years. Finally, the last piece that led to the creation of Wondercraft is that text-to-speech models saw a big improvement about 6 months ago, with ElevenLabs releasing models with an output that is almost indistinguishable to humans (see HN thread here: https://news.ycombinator.com/item?id=34361651).

Wondercraft integrates realistic text-to-speech with an infrastructure that simplifies podcast creation. For example, you can integrate music, publish your podcast / create an RSS feed, generate a video for your episode, get assistance in the script generation, auto generate show notes and transcript and translate your podcast all together. All text based tasks (e.g. script assistance, show note generation, etc) are completed using a chain of custom prompts to LLM models. All text-to-speech is done through custom voices that are either synthetically generated or professionally cloned from Voice Actors, using the ElevenLabs platform. Tasks such as episode translation involve the use of both LLMs and ElevenLabs. Video generation runs using Remotion and the RSS feed is an XML creation and updating routine.

Since launching, we’ve had more than 13k users sign up to create their podcast. Use cases that we’re seeing include: businesses repurposing their blogs and generating video content for their socials; writers/bloggers/newsletters reaching audience through another medium; news outlets and publications adding a news rundown podcast in their lineup; businesses creating internal educational/cultural material; and podcast studios using Wondercraft to serve client needs faster.

Wondercraft is not a tool for fully AI generated content. Rather, we save people time by transferring content they’ve created (e.g. an article they’ve written) to another medium. This technology is best suited for news rundowns and narrational format podcasts (often used by businesses talking about a niche topic). And while interview and conversational formats will sound better person-to-person, the logistical and (often) sound quality issues remain, so we’re testing out an “Async Podcasts” feature, where an interviewee can respond to questions async in writing, share a photo and (optionally) a clip of their voice, and a podcast will be created out of it.

We’d love to hear any thoughts, comments or experiences you may have had in relation to leveraging text to speech for podcast creation. Thank you for taking the time to read!



Get Top 5 Posts of the Week



best of all time best of today best of yesterday best of this week best of this month best of last month best of this year best of 2023 best of 2022 yc w24 yc s23 yc w23 yc s22 yc w22 yc s21 yc w21 yc s20 yc w20 yc s19 yc w19 yc s18 yc w18 yc all-time 3d algorithms animation android [ai] artificial-intelligence api augmented-reality big data bitcoin blockchain book bootstrap bot css c chart chess chrome extension cli command line compiler crypto covid-19 cryptography data deep learning elexir ether excel framework game git go html ios iphone java js javascript jobs kubernetes learn linux lisp mac machine-learning most successful neural net nft node optimisation parser performance privacy python raspberry pi react retro review my ruby rust saas scraper security sql tensor flow terminal travel virtual reality visualisation vue windows web3 young talents


andrey azimov by Andrey Azimov