Milk Video (YC W21) – Edit online event recordings quickly

Hello HN gang! Lenny and Ross here, working on Milk Video (https://milkvideo.com/), a browser-based tool to turn long videos into watchable clips. We speed up the workflow for marketers editing long, boring Zoom recordings and webinars into visually engaging clips with quality templates and styled captions.

Ross and I met 8 years ago in Shanghai, where we worked at an education startup and organized tech and design events. When Covid set off a tsunami of webinars, Ross noticed the growing cost of editing all the new content as B2B companies replaced their in-person marketing channels with online events.

Most people who register for online events don't end up attending. They may be interested in the content, but they won’t take time to watch an entire webinar recording. Webinar content has a short shelf life unless it is reworked into a friendlier format. Doing that with traditional video editing software is cumbersome, so it often doesn’t happen. It’s time-intensive to review videos for key moments, ask designers to create appropriate graphics and captions, and get final approval from managers.

We started out by contacting companies organizing webinars, and learned they were stuck in a vicious cycle of constantly having to focus on the next upcoming event. We started manually editing videos for them to better understand how the most engaging bits could be reworked. Doing this manually revealed a glaring problem: the technology interfacing with video has changed dramatically, but the editing software hasn’t. Video editing software is designed for filmmakers or social media, and businesses creating video content have very different needs.

Milk Video uses a transcript-based interface to review long recordings and minimize the mental effort of editing. We transcribe uploaded videos, present you with the content so you can quickly clip the best parts, and let you use templates to compose visually interesting layouts with additional assets, like logos or static text.
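To make the transcript-driven clipping idea concrete, here is a minimal sketch, not the production code. It assumes each transcribed word carries start and end timestamps in milliseconds; the `Word` shape, the `clip_range` name, and the `padding_ms` parameter are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """One transcribed word with its timing (hypothetical shape)."""
    text: str
    start_ms: int
    end_ms: int


def clip_range(words: Sequence[Word], first: int, last: int,
               padding_ms: int = 0) -> Tuple[int, int]:
    """Map a selection of transcript words to a (start, end) time range.

    `first` and `last` are inclusive indices into `words`, i.e. the
    words the user highlighted in the transcript.
    """
    if not 0 <= first <= last < len(words):
        raise ValueError("selection is outside the transcript")
    # Optional padding so the clip does not cut in mid-breath;
    # the start is clamped so it never goes below zero.
    start = max(0, words[first].start_ms - padding_ms)
    end = words[last].end_ms + padding_ms
    return start, end


transcript = [
    Word("Welcome", 0, 350),
    Word("to", 400, 500),
    Word("the", 520, 600),
    Word("webinar", 650, 1200),
]
print(clip_range(transcript, 1, 3))
```

Selecting text in the transcript then amounts to choosing two word indices, and the resulting time range is what a video trimmer would need.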

We made a drag-and-drop interface for creating short video clips with styled word-by-word captions. In a world where people often don't have their audio on, the timestamp information on a machine-generated transcript is perfect for creating interesting visual elements, such as captions styled one word at a time. This also makes content accessible by default. And because most webinars and Zoom recordings are visually similar, in the future we will be able to recommend which video templates are best suited to a user's uploaded content.
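Word-level timestamps are what make one-word-at-a-time styling possible. The sketch below is an illustration under assumptions, not the actual implementation: the `Word` and `Cue` shapes, the `caption_cues` name, and the `max_words_per_line` parameter are hypothetical. It groups words into short caption lines and emits one cue per word, recording which word in the line should be highlighted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """One transcribed word with its timing (hypothetical shape)."""
    text: str
    start_ms: int
    end_ms: int


@dataclass(frozen=True)
class Cue:
    """A caption line plus the index of the word to highlight."""
    start_ms: int
    end_ms: int
    line: Tuple[str, ...]
    active_index: int


def caption_cues(words: Sequence[Word],
                 max_words_per_line: int = 4) -> List[Cue]:
    """Build one cue per word, grouped into fixed-size caption lines."""
    if max_words_per_line < 1:
        raise ValueError("max_words_per_line must be at least 1")
    cues: List[Cue] = []
    for i in range(0, len(words), max_words_per_line):
        group = words[i:i + max_words_per_line]
        line = tuple(w.text for w in group)
        for j, word in enumerate(group):
            # Hold the highlight until the next word in the line starts,
            # so the caption does not flicker during short pauses.
            end = group[j + 1].start_ms if j + 1 < len(group) else word.end_ms
            cues.append(Cue(word.start_ms, end, line, j))
    return cues


words = [Word("we", 0, 200), Word("edit", 250, 600), Word("video", 650, 1000)]
for cue in caption_cues(words, max_words_per_line=2):
    print(cue)
```

A renderer could then draw `cue.line` for the duration of each cue and apply a different style to the word at `cue.active_index`.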

The frontend is a React app based on Redux Toolkit and Recoil.js. Our performant transcript interface is made possible by Slate.js. Our backend is a Ruby on Rails app and depends on a non-trivial number of serverless functions hosted on Google Cloud and AWS. Our speech-to-text provider is AssemblyAI, which we found to be cheaper, faster, and better than Google and Amazon.

We would love your feedback on the tool. We are spending a lot of time working directly with our first users, and would appreciate all of the input we can get. I’m also happy to go into detail about how any specific parts work! We’ll be in the comments and are eager to hear all your thoughts!


