Our founding team comes from Optimizely & Google, where we built similar predictive tools for our marketing teams. At each company, we kept building the same components of a predictive pipeline: JavaScript snippets to collect data, ETL jobs to transform that data, and cron jobs to run a regression. We were spending hours a week maintaining these pipelines, but the time-consuming part wasn't the algorithms (they're open source); it was the transformations.
So with ClearBrain we decided to automate the data transformation steps. We built our system with Spark ML (Scala), Data Pipeline, and Go. Instead of instrumenting yet another JavaScript snippet, we use existing data in Segment (YC S11) and Heap (YC W13) through standard integrations. And because every Segment/Heap dataset has the same schema, our system can process it with the same transformations into a machine-readable feature matrix. When a customer selects a user action tracked in Segment/Heap to predict, the transformed matrix is run through a logistic regression in Spark ML, which outputs, for each user, a probabilistic score of performing that action, based on the users who performed it in the past.
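For the curious, here's a rough sketch of what that scoring step might look like in Spark ML (Scala). The column names, paths, and feature set here are made up for illustration; the real pipeline derives the feature matrix automatically from the Segment/Heap schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.classification.LogisticRegression

    object ConversionScoring {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("conversion-scoring").getOrCreate()

        // Hypothetical feature matrix: one row per user, per-event count columns,
        // plus a label column marking whether the user performed the target action.
        val features = spark.read.parquet("s3://example-bucket/feature_matrix/")

        // Collapse the per-event columns into a single feature vector.
        val assembler = new VectorAssembler()
          .setInputCols(Array("pageviews", "signups", "emails_opened"))
          .setOutputCol("features")
        val assembled = assembler.transform(features)

        // Fit a logistic regression on users whose outcome is already known.
        val lr = new LogisticRegression()
          .setLabelCol("performed_action")
          .setFeaturesCol("features")
        val model = lr.fit(assembled.filter("performed_action IS NOT NULL"))

        // Score every user; the probability column holds P(won't perform), P(will perform).
        val scored = model.transform(assembled).select("user_id", "probability")
        scored.write.parquet("s3://example-bucket/user_scores/")

        spark.stop()
      }
    }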
This distills the predictive modeling process down to a simple UI that identifies high-probability users in minutes. We've built the tool with marketers in mind, to help them identify which users may convert or churn and export those users to marketing tools like Facebook Ads, HubSpot, etc. We've also found good reception among startups that have marketing objectives but lack the resources to deploy ML-driven campaigns themselves.
We look forward to feedback from the HN community! :)
Bilal