Pointing things out over Zoom screenshare is time-consuming ("click the 4th checkbox on the right", "click the handle and drag"), and trying to figure out what users are doing over a live chat or phone call typically leads to a frustrating back-and-forth.
When COVID forced all of us into remote work, we found ourselves spending a lot of time in screensharing sessions. We were tired of choppy frame rates and blurry text, and realized that we could get around this by sharing the user’s screen in a different way. Rather than video streaming, which is how it’s usually done, we could send over diffs of their webpage’s DOM representation and reapply those in the viewer’s browser – this is similar to how virtual DOM frameworks like React work.
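To make the diff-and-reapply idea concrete, here is a minimal sketch of what the viewer side could look like. It is not our production code: the `VNode` shape, the `Patch` message types, and `applyPatch` are hypothetical names made up for illustration, and the tree is a plain object rather than the real browser DOM so the example runs anywhere.

```typescript
// Hypothetical, simplified node model; a real page would use the browser DOM.
type VNode = {
  id: number;
  tag: string;
  attrs: Record<string, string>;
  children: VNode[];
};

// Hypothetical patch messages (assumed shapes, not a real wire format).
type Patch =
  | { kind: "insert"; parentId: number; index: number; node: VNode }
  | { kind: "setAttr"; nodeId: number; name: string; value: string }
  | { kind: "remove"; nodeId: number };

// Depth-first search for the node with the given id.
function findNode(root: VNode, id: number): VNode | undefined {
  if (root.id === id) return root;
  for (const child of root.children) {
    const hit = findNode(child, id);
    if (hit) return hit;
  }
  return undefined;
}

// Depth-first search for the parent of the node with the given id.
function findParent(root: VNode, id: number): VNode | undefined {
  for (const child of root.children) {
    if (child.id === id) return root;
    const hit = findParent(child, id);
    if (hit) return hit;
  }
  return undefined;
}

// Apply one patch to the viewer's copy of the tree, mutating it in place.
function applyPatch(root: VNode, patch: Patch): void {
  switch (patch.kind) {
    case "insert": {
      const parent = findNode(root, patch.parentId);
      if (!parent) throw new Error(`unknown parent ${patch.parentId}`);
      parent.children.splice(patch.index, 0, patch.node);
      break;
    }
    case "setAttr": {
      const node = findNode(root, patch.nodeId);
      if (!node) throw new Error(`unknown node ${patch.nodeId}`);
      node.attrs[patch.name] = patch.value;
      break;
    }
    case "remove": {
      const parent = findParent(root, patch.nodeId);
      if (!parent) throw new Error(`unknown node ${patch.nodeId}`);
      parent.children = parent.children.filter((c) => c.id !== patch.nodeId);
      break;
    }
  }
}

// The viewer starts from a snapshot and replays patches as they arrive.
const viewerTree: VNode = { id: 1, tag: "form", attrs: {}, children: [] };
const incoming: Patch[] = [
  {
    kind: "insert",
    parentId: 1,
    index: 0,
    node: { id: 2, tag: "input", attrs: { type: "checkbox" }, children: [] },
  },
  { kind: "setAttr", nodeId: 2, name: "checked", value: "true" },
];
for (const patch of incoming) applyPatch(viewerTree, patch);
```

Each message carries only what changed, which is why a stream like this can be much smaller than re-encoding the whole screen as video frames.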
We first used this technique for an earlier project (https://news.ycombinator.com/item?id=23363250) that rendered React apps on the server (a Node equivalent of Phoenix LiveView). Rendering on the server reduces development complexity for web apps by completely eliminating the need for RPC layers (REST, GraphQL). For instance, you'd be able to write to the database directly from your React component and share state across sessions with a single hook. It works by sending DOM updates from the server (e.g. insert a node, change an attribute) in response to input actions sent from the client (e.g. click a button, type a character).
This approach uses significantly less bandwidth than traditional screen sharing solutions, and gives us a semantic understanding of the webpage (e.g. a button is sent over as a <button />, instead of a blob of bytes). As a result, we can selectively filter out sensitive content and allow viewers to scroll and type on the webpage without any perceptible latency.
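As a rough sketch of why the semantic representation makes filtering straightforward, the snippet below masks sensitive nodes before anything leaves the user's browser. The rules here are assumptions for illustration only: `PageNode`, `redact`, and the `data-sensitive` attribute are hypothetical names, and the actual filtering rules may differ.

```typescript
// Hypothetical, simplified page node; `text` is "" when the node has no text.
type PageNode = {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  children: PageNode[];
};

const MASK = "***";

// Assumed rules: password inputs and anything tagged with a (hypothetical)
// data-sensitive attribute are treated as sensitive.
function isSensitive(node: PageNode): boolean {
  const isPassword = node.tag === "input" && node.attrs["type"] === "password";
  return isPassword || "data-sensitive" in node.attrs;
}

// Return a redacted copy of the tree; the original is left untouched.
function redact(node: PageNode): PageNode {
  if (isSensitive(node)) {
    const attrs = { ...node.attrs };
    if ("value" in attrs) attrs["value"] = MASK;
    // Drop the subtree entirely so nested content cannot leak.
    return {
      tag: node.tag,
      attrs,
      text: node.text === "" ? "" : MASK,
      children: [],
    };
  }
  return {
    tag: node.tag,
    attrs: { ...node.attrs },
    text: node.text,
    children: node.children.map(redact),
  };
}

const examplePage: PageNode = {
  tag: "form",
  attrs: {},
  text: "",
  children: [
    { tag: "input", attrs: { type: "password", value: "hunter2" }, text: "", children: [] },
    { tag: "span", attrs: { "data-sensitive": "" }, text: "4111 1111", children: [] },
    { tag: "button", attrs: {}, text: "Save", children: [] },
  ],
};

// Only the redacted copy would be sent to the viewer.
const safePage = redact(examplePage);
```

With video streaming, the equivalent would mean detecting and blurring regions of pixels; here the decision is made per element, based on its tag and attributes.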
To solve our screen sharing problem, we initially built a Chrome extension that let users browse web pages collaboratively. During YC, we saw that our early adopters were primarily using this tool to walk through their own web apps with their customers, so we decided to refocus the product towards helping companies onboard and support their users.
Because we're focused on the real-time use case, we only record the DOM when a session is being viewed. This means that no data is sent to our servers unless Cohere is actively being used. Additionally, we don’t persist or retain any session data.
Thanks for reading our story – we'd love to get your thoughts, feedback, and ideas!