Odigos (YC W23) – Instant distributed tracing for Kubernetes clusters

Hi HN! We’re Eden and Ari, co-founders of Odigos (https://github.com/keyval-dev/odigos). Odigos is an open-source project that lets you instantly generate distributed traces for your applications. It works alongside existing monitoring tools and does not require any code changes.

Our earlier experiences with monitoring tools were frustrating. While monitoring a distributed system with multiple microservices, we spent far too much time trying to locate the specific microservice at the root of a problem. For example, we once spent hours debugging an application that we suspected was causing high latency, only to find that the actual problem was in a completely different application.

Then we learned about distributed tracing, which solves exactly this problem. Unlike metrics or logs that capture a data point in time in a single application, a distributed trace follows a request as it propagates through a distributed environment by tagging it with a unique ID. This allows developers to understand the context of each request and how their distributed applications work.
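To make the "unique ID" idea concrete, here is a minimal sketch of how a trace ID can travel with a request. It assumes the W3C Trace Context `traceparent` header format (version, trace ID, span ID, flags), which OpenTelemetry uses by default; the function names are hypothetical and this is an illustration, not Odigos code.

```python
import secrets


def new_traceparent() -> str:
    """Start a new trace: fresh trace ID and span ID, sampled flag set."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(parent: str) -> str:
    """Keep the caller's trace ID and mint a new span ID for this hop."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    """Extract the trace ID field from a traceparent header value."""
    return header.split("-")[1]


# Service A receives a request with no header and starts a trace.
incoming = new_traceparent()
# Service A calls service B and forwards a header with the same trace ID.
outgoing = child_traceparent(incoming)

print(trace_id_of(incoming) == trace_id_of(outgoing))  # prints True
```

Every hop gets its own span ID, but the trace ID stays the same, which is what lets a backend stitch the spans from different services into one trace.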

The downside is that distributed tracing is difficult to implement. Unlike metrics or logs, its value is gained only after implementing it across multiple applications. If even one of your applications does not produce trace data and pass the trace context along, the context propagation is broken and the value of the traces drops significantly.
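The effect of a single uninstrumented service can be shown with a small simulation. This is a simplified model under stated assumptions: services are called in a straight chain, an uninstrumented service records no span and forwards no trace context, and the service names are made up.

```python
import secrets
from collections import defaultdict


def run_request(chain):
    """Simulate one request passing through `chain`.

    chain: list of (service_name, instrumented) tuples in call order.
    Returns {trace_id: [names of services that recorded a span]}.
    """
    traces = defaultdict(list)
    trace_id = None  # no trace context before the first service
    for name, instrumented in chain:
        if not instrumented:
            # No span is recorded and the context is not forwarded downstream.
            trace_id = None
            continue
        if trace_id is None:
            # Nothing arrived from upstream, so this service starts a new trace.
            trace_id = secrets.token_hex(16)
        traces[trace_id].append(name)
    return dict(traces)


full = run_request([("frontend", True), ("cart", True), ("payments", True)])
partial = run_request([("frontend", True), ("cart", False), ("payments", True)])

print(len(full), len(partial))  # prints: 1 2
```

With full coverage the request shows up as one trace with three spans. With `cart` uninstrumented, the same request is split into two unrelated traces, so the link between `frontend` and `payments` is lost.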

We manually implemented distributed tracing for multiple companies, but found it a challenge to coordinate all the development teams to instrument their applications in order to achieve a complete distributed trace. Once the implementation was finished, we saw great value and fixed production issues much faster. But partial implementation wasn’t worth much.

We set out to automate this process. We knew how to do most of it, but the trickiest part was how to automatically instrument programs written in compiled languages (like Go). If we could do that, we would be able to automate the entire process of generating distributed traces. While researching, we realized that eBPF—a technology that allows the Linux kernel to load external programs for execution within the kernel—could be used to develop automatic instrumentation for compiled languages. That was the final piece of the puzzle, and with it we were able to develop Odigos.

Odigos first scans and recognizes all your running applications, then recognizes the programming language of each one and auto-instruments it accordingly, using eBPF and OpenTelemetry. In addition, it deploys collectors that buffer, filter, and deliver data to your chosen monitoring tool, and autoscales them according to the amount of traffic. This automation lets developers get distributed traces within minutes, as opposed to a manual effort that can take months to implement.

Automatic instrumentation across programming languages is not a trivial task, especially when dealing with static binaries (like the ones produced by the Go compiler). We built multiple mechanisms to make sure we inject the relevant headers in a secure and stable way. We developed a system that tracks functions and structs across different versions of open-source libraries. In addition, we developed a system that performs userspace memory management in eBPF. As a result, Odigos is the only solution that is able to automatically generate distributed traces for compiled languages like Go and Rust. While other solutions require users to be experts in OpenTelemetry or eBPF, our solution does not require prior knowledge of observability technologies.

Our solution can be installed on any Kubernetes cluster by executing a single command. Once installed, we detect the programming language of every running application and apply the relevant instrumentation. For JIT languages (Java and .NET) or interpreted languages (JavaScript and Python) we deploy OpenTelemetry instrumentation. For compiled languages (Go, Rust, C) we deploy our eBPF-based instrumentation. All of this is abstracted from the user, who only has to: (1) select any or all of their target applications and (2) select a backend to send monitoring data to.
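The selection logic described above can be sketched roughly as follows. This is only an illustration of the language-to-instrumentation mapping stated in this post; the identifiers, the data shapes, and the `"jaeger"` backend value are assumptions made for the example and do not reflect Odigos internals or its configuration format.

```python
# Hypothetical labels for the two instrumentation approaches described above.
OTEL_INSTRUMENTATION = "opentelemetry-instrumentation"
EBPF_INSTRUMENTATION = "ebpf-instrumentation"

# Mapping taken from the post: JIT and interpreted languages use OpenTelemetry
# instrumentation, compiled languages use eBPF-based instrumentation.
STRATEGY_BY_LANGUAGE = {
    "java": OTEL_INSTRUMENTATION,
    "dotnet": OTEL_INSTRUMENTATION,
    "javascript": OTEL_INSTRUMENTATION,
    "python": OTEL_INSTRUMENTATION,
    "go": EBPF_INSTRUMENTATION,
    "rust": EBPF_INSTRUMENTATION,
    "c": EBPF_INSTRUMENTATION,
}


def plan(detected_languages, selected_apps, backend):
    """Build an instrumentation plan for the apps the user selected.

    detected_languages: {app_name: language} as found by language detection.
    selected_apps: app names chosen by the user (step 1).
    backend: destination chosen by the user (step 2).
    Apps with an unknown or unsupported language are skipped.
    """
    result = []
    for app in selected_apps:
        language = detected_languages.get(app)
        strategy = STRATEGY_BY_LANGUAGE.get(language)
        if strategy is None:
            continue
        result.append({"app": app, "instrumentation": strategy, "backend": backend})
    return result


detected = {"checkout": "go", "catalog": "java", "legacy": "cobol"}
for entry in plan(detected, ["checkout", "catalog", "legacy"], backend="jaeger"):
    print(entry["app"], "->", entry["instrumentation"])
```

The user only supplies the two inputs at the bottom (which apps, which backend); the choice of instrumentation per app follows from the detected language.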

In May 2022, we released our first open-source project: automatic instrumentation for Go applications, based on eBPF. We later donated this project to the OpenTelemetry community and it is currently being developed as part of the Go Automatic Instrumentation SIG.

We are big believers in open standards, so the instrumentation and collectors used by Odigos are all based on open-source projects developed by the OpenTelemetry community. This also enables us to be vendor-agnostic.

Currently we are focused on building our open-source project. There is no pricing and there are no paid features yet, but in the future we plan to offer a managed version of Odigos that will include enterprise features.

If you're interested in learning more, check out our docs (https://docs.odigos.io), watch a demo video (https://www.youtube.com/watch?v=9d36AmVtuGU), and visit our website (https://odigos.io).

We’d love to hear your experiences with tracing and monitoring distributed applications and anything else you’d like to share!
