Castled Data (YC W22) – Open-Source Reverse ETL

Hi HN, we're Arun, Manish, Abhilash and Franklin from Castled Data (https://castled.io). Castled is an open-source reverse ETL solution. It helps you periodically sync the data in your database/warehouse (Snowflake, BigQuery, Redshift, etc.) into sales, marketing, or support apps (Salesforce, Hubspot, Intercom, etc.), or custom software, without needing an engineering team. Here’s a demo video: https://www.loom.com/share/71bf33acbb4a41cab7c96a3460a84e5f.

On average, mid-scale organizations use around 40 SaaS apps. These are powerful in functionality, but limited by the quality of the product/customer data fed into them. The data synced into these tools is often incomplete, suffers from quality issues, and depends on unreliable manual imports (e.g. from CSV).

Manish and I were founding engineers at Hevodata, an ETL company, when it went from 5 customers to around 300 customers. We started seeing the trend of more and more customers wanting to move the data out of their cloud data warehouse to feed their business tools. We built a prototype to solve this for our users, but when we went deep into their use cases, we found that there were a lot of unsolved problems in this space. We also realized that activating warehouse data reliably for operational purposes was emerging as the next big trend for data-driven companies.

We did some research and came across Census/Hightouch, which were early-stage Reverse ETL cloud solutions at the time. But from our previous experience working in the ETL space, we believed that any data pipeline solution needs to be open source to cover the long list of connectors that need to be built. So we set out to build our own open-source Reverse ETL solution.

With Castled, companies can create automated data pipelines to periodically sync the output of a warehouse transformation query or dbt models (in the works) to their sales, marketing, support and notification tools. By default we fetch only the incremental results on every pipeline run, which helps keep each sync within the rate limits and other constraints of the destination APIs. Users can also set a schedule to define how often the pipeline runs.
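To make "incremental results" concrete, here is a minimal sketch of one common way to detect changes when the source has no change log: fingerprint each row of the query result and compare it against the fingerprints saved after the previous run. This is an illustration only, not Castled's actual implementation; the function names, the `id` key column, and the sample rows are all assumptions made up for the example.

```python
import hashlib
import json


def row_fingerprint(row):
    """Stable hash of a row's contents (keys sorted so column order does not matter)."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def incremental_changes(rows, previous_fingerprints, key="id"):
    """Return (changed_rows, new_fingerprints).

    rows: iterable of dicts from the current query result.
    previous_fingerprints: {primary_key: fingerprint} saved after the last run.
    The key column name is a hypothetical default for this sketch.
    """
    changed = []
    current = {}
    for row in rows:
        pk = row[key]
        fp = row_fingerprint(row)
        current[pk] = fp
        # New rows have no previous fingerprint; updated rows have a different one.
        if previous_fingerprints.get(pk) != fp:
            changed.append(row)
    return changed, current


# First run: nothing has been synced yet, so every row counts as changed.
run1 = [
    {"id": 1, "email": "a@example.com", "plan": "free"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},
]
changed1, state = incremental_changes(run1, {})

# Second run: row 1 was updated, row 2 is unchanged, row 3 is new.
run2 = [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},
    {"id": 3, "email": "c@example.com", "plan": "free"},
]
changed2, state = incremental_changes(run2, state)
print([r["id"] for r in changed2])  # prints [1, 3]
```

Only rows 1 and 3 would be sent to the destination on the second run. The sketch ignores deletions, and a real system would keep the fingerprint state in the warehouse or another durable store rather than in memory.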

The technical challenges in building such a tool include: doing CDC (Change Data Capture) from data warehouses, which do not provide a typical write-ahead log; handling rate limits on destination APIs; handling deduplication of records on destination objects; and failure handling with automatic retries. But the biggest challenge is the sheer number of destination app integrations that need to be supported: we are talking about tens of thousands of connectors.
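As an illustration of the rate-limit and retry challenge, here is a minimal sketch of retrying a batch with exponential backoff. It is not taken from the Castled codebase: the `RateLimited` exception, the `send_with_retries` helper, and the fake destination are hypothetical names invented for this example, standing in for whatever a real destination client raises when it is told to slow down.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a destination client raises when it is rate limited."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the destination, if any


def send_with_retries(send, batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(batch), retrying on RateLimited with exponential backoff.

    Returns the number of attempts used; re-raises after max_attempts failures.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(batch)
            return attempt
        except RateLimited as err:
            if attempt == max_attempts:
                raise
            # Prefer the destination's hint if given, else wait 1s, 2s, 4s, ...
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)


# Usage with a fake destination that rejects the first two calls.
calls = {"n": 0}


def flaky_send(batch):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimited()


delays = []  # record the waits instead of actually sleeping
attempts = send_with_retries(flaky_send, [{"id": 1}], sleep=delays.append)
print(attempts, delays)  # prints 3 [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test. A production pipeline would also need jitter, per-destination limits, and a way to park batches that keep failing.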

Our major differentiator from Census/Hightouch is that we are open source. Our users can host Castled in their own private cloud and start operationalizing their data for free. We’ve observed that initially customers are inclined towards buying a cloud solution for their data integration needs. But once they scale up, they realize that their cloud vendor is unable to cope with the increasing number of apps getting used in the organization. They soon start building in-house data pipeline solutions or look for an open-source solution to solve their problems. Being open source, we provide the flexibility for our customers to build their own connectors rather than waiting for cloud vendors to fulfill their connector requests.

Compared with open-source alternatives (e.g. Grouparoo), we have built Castled so that our community can build new connectors in a few hours. One example of this is our Castled Form Language (CFL), which lets our users auto-generate extremely complex forms on the UI by writing a few Java annotations on the backend. This removes the need for a UI developer when building a new connector.

Our GitHub repo is here: https://github.com/castledio/castled. Most users can spin up the application on their desktop in a few minutes. In case you want a hosted solution, we also have our cloud platform at https://castled.io. It is a subscription-based hosted solution that provides additional security and management features such as single sign-on, authentication, user management, notifications, alerts, etc. You can sign up and try out the product for free, no credit card required.

This is the first time we are trying to build an open-source community around a project, and we're excited to hear any thoughts, insights, questions, encouragement and concerns in the comments below! We will be monitoring the thread over the course of today to answer any questions. Feel free to reach out to me by email at [email protected]


