Airbyte (YC W20) – Open-Source ELT (Fivetran/Stitch Alternative)

Hi HN!

Michel here with John, Shrif, Jared, Charles, and Chris. We are building an open-source ELT platform that replicates data from any application, API, or database into your data warehouses, data lakes, or databases.

I’ve been in data engineering for 11 years. Before Airbyte, I was the head of integrations at LiveRamp, where we built and scaled over 1,000 data ingestion connectors to replicate 100TB worth of data every day. John, on the other hand, has already built 3 startups with 2 exits. His latest one didn’t work out, though. He spent almost a year building ETL pipelines for an engineering management platform, but he eventually ran out of money before reaching product-market fit.

By late 2019, we had known each other for 7 years and had always wanted to work together. When John’s third startup shut down, the timing was finally right for both of us. And we knew which problem we wanted to address: data integration, and ELT more specifically.

We started interviewing customers of Fivetran, Stitchdata, and Matillion to see whether the existing solutions were solving their problems. We learned that they all fell short, and always in the same ways.

Some of the limitations we identified stem from these products being closed source. This prevents them from addressing the long tail of integrations, because they will always have an ROI consideration when building and maintaining new connectors. A good example is Fivetran, which, after 8 years, offers around 150 connectors. That is not a lot when you consider the number of existing tools out there (more than 10,000). In fact, all of their customers that we talked to are building and maintaining their own connectors (along with orchestration, scheduling, monitoring, etc.) in-house, because the connectors they needed were either not supported at all or not supported in the way they needed.

Some of those customers also tried to leverage existing open-source solutions, but the quality of the existing connectors is inconsistent, as many haven't been updated in years. Plus, they are not usable out of the box.

That’s when we knew we wanted Airbyte to be open-source (MIT license), usable out of the box, and able to cover the long tail of integrations. By making it trivial to build new connectors on Airbyte in any language (they run as Docker containers), we hope the community will help us build and maintain the long tail of connectors. Open source enables us to address all use cases (including internal DBs and APIs), and it also lets us solve the problem inherent to cloud-based solutions: the security and privacy of your data. Companies don’t need to trust yet another 3rd-party vendor. And because it is self-hosted, it will disrupt the pricing of existing solutions.

Here’s a 2-minute demo video if you want to check out how it looks:

Airbyte can run on a single node without any external infrastructure. We also integrate with Kubernetes (alpha), and will soon integrate with Airflow so you can run replication tasks across your cluster.

Today, our early version supports about 41 sources and 6 destinations. We’re releasing new connectors every week (6 of them have already been contributed by the community). We bootstrapped some connectors using the highest-quality ones from Singer. Our connectors will always remain open-source.

Our goal is to solve data integration for as many companies as possible, and the success of Airbyte is predicated on the open-source project becoming loved and ubiquitous. For this reason, we will spend the entirety of 2021 strengthening the open-source edition; we are dedicated to making it amazing for all users. We will eventually create a paid edition (open core model) with enterprise-level features (support, SLA, hosting and management, privacy compliance, role and access management, SSO, etc.) to address the needs of our most demanding users.

Give it a spin and let us know what you think. This is our first time building an open-source technology, so we know we have a lot to learn!
