At my last job, it dawned on me how many production incidents and unhappy customers could have been avoided with more test automation: “an ounce of prevention is worth a pound of cure.” However, it wasn’t clear that you could get prevention for just an ounce. Our teams ramped up investment in testing, especially end-to-end tests (spinning up a headless browser in CI and testing as an end user), but it quickly became clear that these were incredibly expensive to build and maintain, especially as test suite and application complexity grew. When we asked engineering teams at other companies, we consistently heard how time-intensive test maintenance was.
The worst part of end-to-end tests is when they fail occasionally in CI but never locally: a heisenbug in your test code, usually referred to as a flaky test. The conventional way to debug such an issue is to replay a video of your CI’s test browser, stepping between frames to try to parse what could be happening under the hood. Otherwise, your CI is just a complete black box.
Anyone who finds this story familiar can probably attest to days spent trying to debug an issue like this, possibly losing some hair in the process, and “resolving” it in the end by deleting the test and regaining their sanity. Some teams even try to put front-end monitoring tools built for production into their CI process, only to find that they can’t handle recording hundreds of test actions executed by a machine over just a few seconds.
After realizing how painful debugging these tests could be, we started putting together a debugger that helps developers pinpoint issues in CI much like they would locally. Teams have told us there’s a night-and-day difference between trying to debug test failures with just a video and having a tool that can finally tell them what’s happening in their CI browser, with the same information they’re used to having in their browser’s devtools.
Our debugger lets you inspect DOM snapshots, network events, and console logs for any step of a Cypress test running in CI, giving you insight into why a particular test is failing. It’s like Fullstory/LogRocket, but for CI failures instead of production bugs. (We’re starting with Cypress tests, with plans to extend further.)
Our tool integrates with Cypress via their plugin API, so we’re able to plug in and record tests in CI with just an NPM install and two lines of code. From there we hook into Cypress/Mocha events to capture everything happening within the test runner (e.g. when a test starts, when a command is fired, when an element is found) and open a debugging protocol port on the browser to listen for network and console events. While a test suite is running, the debugger continuously collects what’s happening during the run and uploads the information (minus any events the user has configured to censor) after every test completes.
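To make that concrete, here’s a minimal sketch of how a recorder like this can wire itself in. The file layout, port number, and uploadRecording helper are hypothetical (this is not our actual package or API), but the Cypress plugin events (before:browser:launch, before:spec, after:spec), the runner events (command:start, command:end), and the chrome-remote-interface CDP calls are the real building blocks:

```typescript
// cypress/plugins/index.ts -- rough Node-side sketch (hypothetical names, not our actual API)
import CDP from 'chrome-remote-interface';

const DEBUG_PORT = 9222;      // arbitrary DevTools Protocol port
const events: unknown[] = []; // buffered network/console events for the current spec
let attached = false;

export default (on: Cypress.PluginEvents, config: Cypress.PluginConfigOptions) => {
  // Ask Chromium-family browsers to expose a DevTools Protocol port so we can
  // listen for network and console events while the tests run.
  on('before:browser:launch', (browser, launchOptions) => {
    if (browser.family === 'chromium') {
      launchOptions.args.push(`--remote-debugging-port=${DEBUG_PORT}`);
    }
    return launchOptions;
  });

  // Once the browser is up, attach over CDP and buffer events as they stream in.
  on('before:spec', async () => {
    if (attached) return;
    const client = await CDP({ port: DEBUG_PORT });
    await Promise.all([client.Network.enable(), client.Runtime.enable()]);
    client.Network.requestWillBeSent((params) => events.push({ type: 'network', params }));
    client.Runtime.consoleAPICalled((params) => events.push({ type: 'console', params }));
    attached = true;
  });

  // After each spec completes, ship what was collected and reset the buffer.
  on('after:spec', async (spec, results) => {
    // await uploadRecording(spec, results, events); // hypothetical upload helper
    events.length = 0;
  });

  return config;
};

// cypress/support/index.ts -- browser-side: hook the test runner's own events
Cypress.on('command:start', (command) => {
  // record the command name and args so the replay can be indexed by command
});
Cypress.on('command:end', (command) => {
  // a good moment to capture a DOM snapshot for this step
});
```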
While this may sound similar to shoving LogRocket/FullStory into your test suite, there are actually quite a few differences. The most practical one is that those tools typically have low rate limits that work well for human traffic interacting with web apps at human speeds, but break when dealing with parallelized test-runner traffic interacting with web apps at machine speeds. Other differences include associating replays with test metadata instead of user metadata, having browser-level access to every network request and console message emitted within a test, and indexing playback by test command rather than by timestamp (time is an unreliable concept in tests!).
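As an illustration of what indexing by command rather than by timestamp might look like, a recorded step could take roughly the shape below; the field names are hypothetical, not our actual schema:

```typescript
// Hypothetical shape of one recorded replay step: playback is keyed by the
// Cypress command's position in the test, not by wall-clock time.
interface ReplayStep {
  testId: string;           // which test this step belongs to
  commandIndex: number;     // nth Cypress command within the test
  commandName: string;      // e.g. "visit", "get", "click"
  domSnapshotHtml?: string; // serialized DOM captured around the command
  console: Array<{ level: string; text: string }>;                  // console output during the step
  network: Array<{ method: string; url: string; status?: number }>; // requests during the step
}
```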
Once a test fails, a GitHub PR comment is created and an engineer can immediately jump into our web app to start debugging the failure (or check our web dashboard directly). Instead of playing a video of the failure in slow motion to understand the issue, an engineer can step through the test command by command, inspect the DOM at any point with the browser’s element inspector, see which elements the test interacted with and whether any console messages were emitted during the action, and review every network request made, along with HTTP error codes or browser network error messages.
With this kind of information, engineers can typically tell quickly whether they’re looking at a network-based race condition, a console warning emitted by their frontend, a server-side bug, or a test failure from an edge case triggered by randomly generated test data.
We dream of a world where applications have minimal bugs and happy customers, built by engineering teams that don’t see testing as an expensive chore! Although the first pain we’re addressing is tests that fail in CI, we’re working on a bunch of things beyond that, including the second-biggest issue in testing: test runtime.
We have a free trial available for you to try out with your own tests, along with a few live demos of what our debugger looks like on an example test. You can get started here: https://deploysentinel.com/
We’re looking forward to hearing everyone else’s experiences with end-to-end tests, and what you think of what we’re doing!