Companies spend countless engineering hours manually transforming data for custom integrations, or pay consulting firms large sums to do it for them. Engineers have to work through massive data schemas and write hacky one-off scripts to transform data. Dynamic schemas from different clients or apps require custom integration pipelines, and many non-tech companies still exchange schemas as CSV and PDF files. Days, weeks, and even months are spent just building integrations.
We ran into this problem first-hand as engineers: Nebyou spent months manually building data transformations as an ML engineer at Opendoor, and Nicolas did the same during his time at Apple Health. Talking to other engineers, we learned this problem was everywhere. Because of the dynamic, one-off nature of data integrations, it has been a hard problem to automate. We believe recent improvements in LLMs (large language models) have made automation feasible, and that now is the right time to tackle it.
Lume solves this problem head-on by generating data transformations, making the integration process 10x faster. We provide this through a self-serve managed platform where engineers can create and manage data integrations.
How it works: you specify a data source and a data destination, each of which defines a desired data format, a.k.a. schema. Sources and destinations can be set up through our 300+ app connectors, or you can connect a custom schema either by giving us access to your data warehouse or by uploading a file (CSV, JSON, etc.) that describes your end schema. Lume, which combines AI and rule-based models, then creates the desired transformation under the hood by drafting the necessary SQL and deploying it to your destination.
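To make that concrete, here's the flavor of SQL a generated transformation might boil down to. This is purely illustrative: the schemas (source.raw_orders, destination.orders) and every column name are invented for this example, and the real SQL depends entirely on your data.

```sql
-- Illustrative only: table and column names are made up.
CREATE OR REPLACE VIEW destination.orders AS
SELECT
    o.order_id                    AS id,
    CAST(o.order_ts AS TIMESTAMP) AS created_at,   -- type coercion
    UPPER(o.currency_code)        AS currency,     -- value normalization
    o.total_cents / 100.0         AS total_amount  -- unit conversion
FROM source.raw_orders AS o
WHERE o.deleted_flag = FALSE;  -- drop soft-deleted rows
```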
At the same time, engineers don't want to rely on low- or no-code tools without visibility under the hood. So we also provide features to ensure visibility, confidence, and editability for each integration: Data Preview lets you view samples of the transformed data, and SQL Editor lets you see the SQL behind the transformation and change the assumptions made by Lume's model if needed (most of the time, you don't!). In addition, Lineage Graph (launching soon) will show the dependencies of each new integration, giving you more visibility for maintenance.
Our clients have two primary use cases. The first is transforming one or more data sources into a single unified ontology: for example, creating a unified schema across Salesforce, Hubspot, Quickbooks, and Pipedrive in your data warehouse (see the sketch below). The second is creating data integrations between external apps, such as custom syncs between your SaaS tools: for example, an integration directly between your CRM and your BI tool.
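As a flavor of that first use case, a unified contacts view over two CRMs might look roughly like this. The source tables and columns here are hypothetical, not the actual connector schemas:

```sql
-- Illustrative unified "contacts" ontology; source tables/columns are hypothetical.
CREATE OR REPLACE VIEW unified.contacts AS
SELECT
    CAST(id AS VARCHAR) AS contact_id,
    email               AS email,
    full_name           AS name,
    'salesforce'        AS source_system
FROM salesforce.contact
UNION ALL
SELECT
    CAST(vid AS VARCHAR)               AS contact_id,
    property_email                     AS email,
    CONCAT(first_name, ' ', last_name) AS name,
    'hubspot'                          AS source_system
FROM hubspot.contacts;
```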
The most important thing about our solution is our generative system: our model ingests and understands your schemas, and uses that understanding to generate transformations that map one schema onto another. Other integration tools, such as Mulesoft and Informatica, ask users to manually map columns between schemas, which takes a long time. Data transformation tools like dbt have improved the data engineering process significantly (we love dbt!), but they still require extensive manual work to understand the data and write the code. We abstract all of this away and perform the transformations for our customers under the hood, which cuts the time to manually map and engineer these integrations from days or weeks to minutes. Our solution handles the truly dynamic nature of data integrations.
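To give a sense of what "mapping one schema onto another" involves beyond renaming columns, here are a few of the inferences a generated transformation can encode (Postgres-flavored SQL, with invented field names and status codes):

```sql
-- Illustrative inferences only; fields and codes are invented.
SELECT
    customer_ref                  AS customer_id,  -- simple rename
    SPLIT_PART(full_name, ' ', 1) AS first_name,   -- split one field into two
    SPLIT_PART(full_name, ' ', 2) AS last_name,
    CASE status_code                               -- decode an integer enum
        WHEN 1 THEN 'active'
        WHEN 2 THEN 'churned'
        ELSE 'unknown'
    END AS status
FROM source.customers;
```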
We don’t have a public self-serve option yet (sorry!) because we’re at the early stage of working closely with specific customers to get their use cases into production. If you’re interested in becoming one of those, we’d love to hear from you at https://lume.ai. Once the core feature set has stabilized, we’ll build out the public product. In the meantime, our demo video shows it in action: https://www.loom.com/share/bed137eb38884270a2619c71cebc1213.
We currently charge a flat monthly fee that varies with the number of data integrations. In the future, we plan to move to more transparent pricing made up of a fixed platform fee plus compute-based charges. To avoid surprise charges, we currently run the compute in your own data warehouse.
We’re looking forward to hearing any of your comments, questions, ideas, experiences, and feedback!