Elementary (YC W22) – Open-source data observability

Hey HN! We’re Maayan and Or, and we are building Elementary (https://github.com/elementary-data/elementary), an open-source framework that continuously monitors your data and sends alerts when anomalies are detected.

Elementary can alert you, for example, when a table in Snowflake hasn't been updated as expected or when a revenue column has too many nulls. It also monitors operations in the data stack and supports both impact analysis and root-cause analysis. For example, it can alert you when your dbt runs or tests fail and show which dependencies are impacted. A data lineage graph visualizes the data flow and can be used to find the source of invalid data.
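
To make the two example alerts above concrete, here is a minimal sketch in Python of a freshness check and a null-rate anomaly check. It is an illustration only, not Elementary's implementation: the function names, the z-score rule, and the threshold of 3 standard deviations are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(last_updated, now, max_age):
    """Return True if the table was last updated longer ago than max_age."""
    return now - last_updated > max_age


def null_rate(values):
    """Fraction of values that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it is more than `threshold` standard deviations
    away from the mean of `history` (a simple z-score rule, assumed here)."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical daily null rates for a revenue column.
history = [0.01, 0.02, 0.015, 0.012, 0.018]
todays_rate = null_rate([None, None, 10.0, 20.0])  # half the rows are null

if is_anomalous(history, todays_rate):
    print(f"ALERT: null rate {todays_rate:.0%} is far outside the usual range")

if is_stale(datetime(2022, 1, 1), datetime(2022, 1, 3), timedelta(hours=24)):
    print("ALERT: table has not been updated in the last 24 hours")
```

In a real deployment the history and the last-updated timestamp would come from warehouse metadata rather than hard-coded values.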

We have both been working in the data space for over a decade, Maayan in analytics and Or in data engineering. Despite working at very different companies with different data stacks and use cases, we had the same reliability problems. Data is incomplete and inconsistent, and the abundance of technologies creates more complexity and inconsistency. Data reliability issues cause distrust, delays, and bad decisions. It's hard to achieve high data reliability, detect issues fast, understand their impact, and resolve them quickly.

We also found that we had built similar solutions, and as we talked to other developers, we learned that most data teams have their own version of the same thing. They usually don’t go for a commercial observability solution unless they have major incidents and technical debt. Until that point, they prefer to build for themselves, for two reasons. The first is to avoid the overhead of procurement and security compliance. The second is to customize the solution to their own stack, data sources, and business logic, and to keep all the metadata and metrics in their stack to support additional use cases.

We decided to build an open-source alternative, one that can be implemented easily, hosted yourself, and customized. This solves the compliance and data ownership problems. It also solves the build-your-own problem, because teams can deploy an extensive solution early on instead of waiting until major problems appear.

Elementary stores all the logs, metadata, and metrics it collects and generates in the data warehouse, so users can easily add their own detections and logic to it. Additionally, the solution is dbt native: it provides a dbt package that can be easily installed in a dbt environment and configured directly from a dbt project. Since it's part of the existing workflow and environment, it is convenient for data engineers, analytics engineers, and data analysts to enhance it and contribute.

Open source eliminates the need to pay for getting started or to grant access to a third party. A managed service and additional enterprise features will be available in the cloud in the future. Critically, though, the user's metadata will continue to reside in their environment, under their control, and they will still have full customization available.

Currently Elementary supports Snowflake, BigQuery and dbt. It collects metadata such as schemas, query logs and dbt artifacts. It generates and monitors data quality metrics, sends Slack alerts, and visualizes the data lineage. If this is your data stack, we’d love you to give it a try!
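
For readers wondering how a lineage graph helps with root cause and impact, here is a rough Python sketch of the idea. It is not Elementary's code: the table names, the dictionary representation, and the helper functions are hypothetical and chosen only for illustration. Walking the graph downstream from a broken table lists what is impacted; walking it upstream from a bad result lists the candidate sources.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first walk: every node reachable from `start` along graph edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph):
    """Reverse every edge, turning a downstream graph into an upstream one."""
    inverted = {}
    for src, dsts in graph.items():
        for dst in dsts:
            inverted.setdefault(dst, []).append(src)
    return inverted


# Hypothetical lineage; each edge points from a table to its downstream consumers.
lineage = {
    "raw_orders": ["stg_orders"],
    "raw_customers": ["stg_customers"],
    "stg_orders": ["fct_revenue"],
    "stg_customers": ["fct_revenue"],
    "fct_revenue": ["revenue_dashboard"],
}

# Impact: what is affected if stg_orders fails?
impacted = reachable(lineage, "stg_orders")

# Root cause: where could invalid data in fct_revenue have come from?
candidates = reachable(invert(lineage), "fct_revenue")

print(sorted(impacted))
print(sorted(candidates))
```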

We would love to hear your feedback, experiences and ideas from trying to solve data observability in your organizations.
