Golpo (YC S25) – AI-generated explainer videos

Hey HN! We’re Shraman and Shreyas Kar, building Golpo (https://video.golpoai.com), an AI generator for whiteboard-style explainer videos, capable of creating videos from any document or prompt.

We’ve always made videos to explain concepts, because it felt like the clearest way to communicate. But making good videos was time-consuming and tedious: it required planning, scripting, recording, editing, and syncing voice with visuals. Even a 2-minute video could take hours.

AI video tools are impressive at generating cinematic scenes and flashy content, but struggle to explain a product demo, walk through a complex workflow, or teach a technical topic. People still spend hours making explainer videos manually because existing AI tools aren’t built for learning or clarity.

Our solution is Golpo. Its engine generates graphics that are time-aligned with spoken narration, which suits onboarding, training, product walkthroughs, and education. It’s fast, scalable, and built from the ground up to help people understand complex ideas through simple storytelling.

Here’s a demo: https://www.youtube.com/watch?v=C_LGM0dEyDA#t=7.

Golpo is built specifically for use cases involving explaining, learning, and onboarding. In our (obviously biased!) opinion, it feels authentic and engaging in a way no other AI video generator does.

Golpo can generate videos in over 190 languages. After it generates a video, you can fully customize its animations by describing, in natural language, the changes you want to see in each motion graphic.

It was challenging to get this to work! Initially, we used a code-generation approach with Manim, where we fine-tuned a language model to emit Python animation scripts directly from the input text. While promising for small examples, this quickly became brittle, and the generated code usually contained broken imports, unsupported transforms, and poor timing alignment between narration and visuals. Debugging and regenerating these scripts was often slower than creating them manually.
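To make the "broken imports" failure mode concrete, here is a minimal sketch of a static pre-check that could be run on a generated script before attempting to render it. This is an illustration only, not a description of Golpo's pipeline: the function name `find_broken_imports` is hypothetical, and it uses only Python's standard `ast` and `importlib` modules.

```python
import ast
import importlib.util


def find_broken_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that cannot be located."""
    tree = ast.parse(source)  # raises SyntaxError if the script does not parse
    missing: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            # find_spec returns None when no installed module has this name
            if importlib.util.find_spec(top) is None and top not in missing:
                missing.append(top)
    return missing
```

A check like this only catches missing modules. It would not catch unsupported transforms or timing drift between narration and visuals, which is part of why the approach stayed brittle.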

We also explored training a custom diffusion-based video model, but found it impractical for our needs. Diffusion could produce high-fidelity cinematic scenes, but generating coherent sequences beyond about 30 seconds was unreliable without complex stitching, edits required regenerating large portions of the video, and visuals frequently drifted from the instructional intent, especially for abstract or technical topics. Also, we did not have the compute to scale this.

Existing state-of-the-art systems like Sora and Veo 3 face similar limitations: they are optimized for cinematic storytelling, not step-by-step educational content, and they lack both the deterministic control needed for time-aligned narration and the scalability for 5–10 minute explainers.

In the end, we took a different path of training a reinforcement learning agent to “draw” whiteboard strokes, step-by-step, optimized for clear, human-like explanations. This worked well because the action space was simple and the environment was not overly complex, allowing the agent to learn efficient, precise, and consistent drawing behaviors.
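To give a feel for what a "simple action space" can mean here, below is a toy sketch of a stroke-drawing environment. Everything in it is an assumption made for illustration (the grid, the `WhiteboardEnv` and `rollout` names, the four unit moves, the reward values); it is not Golpo's actual environment or reward design.

```python
from dataclasses import dataclass, field

# Hypothetical action set: four unit moves, each taken with the pen up or down.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class WhiteboardEnv:
    target: frozenset  # (x, y) cells that should end up inked
    width: int = 8
    height: int = 8
    pen: tuple = (0, 0)
    inked: set = field(default_factory=set)

    def step(self, move: str, pen_down: bool):
        """Move the pen one cell (clamped to the board) and optionally ink it."""
        dx, dy = MOVES[move]
        x = min(max(self.pen[0] + dx, 0), self.width - 1)
        y = min(max(self.pen[1] + dy, 0), self.height - 1)
        self.pen = (x, y)
        reward = -0.01  # small per-step cost, favouring short strokes
        if pen_down and self.pen not in self.inked:
            self.inked.add(self.pen)
            # reward ink on the target shape, penalize stray ink
            reward += 1.0 if self.pen in self.target else -1.0
        done = self.target <= self.inked  # every target cell has been inked
        return self.pen, reward, done


def rollout(env: WhiteboardEnv, actions) -> float:
    """Apply (move, pen_down) actions until done; return the total reward."""
    total = 0.0
    for move, pen_down in actions:
        _, reward, done = env.step(move, pen_down)
        total += reward
        if done:
            break
    return total
```

For example, with `target=frozenset({(1, 0), (2, 0), (3, 0)})`, three `("right", True)` actions ink the whole line and end the episode. An agent trained in a setup like this only ever chooses among eight discrete actions per step.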

Here are some sample videos that Golpo generated:

https://www.youtube.com/watch?v=33xNoWHYZGA (Whiteboard Gym - the tech behind Golpo itself)

https://www.youtube.com/watch?v=w_ZwKhptUqI (How do RNNs work?)

https://www.youtube.com/watch?v=RxFKo-2sWCM (function pointers in C)

https://golpo-podcast-inputs.s3.us-east-2.amazonaws.com/file... (basic intro to Gödel's theorem)

You can try Golpo here: https://video.golpoai.com, and we will set you up with 2 credits. We’d love your feedback, especially on what feels off, what you’d want to control, and how you might use it. Comments welcome!


