ML models tend to perform poorly when presented with new and previously unseen cases as well as their performance deteriorates over time due to evolving real-world environments, which can lead to the degradation of business metrics. In fact, one of our customers (a social media platform with 150 million MAU) was tired of discovering model issues via customer complaints (and increased churn) and wanted an observability solution to identify them proactively.
UpTrain monitors the difference between the dataset the model was trained on and the real-world data it encounters during production (the wild!). This "difference" can be custom statistical measures designed by ML practitioners based on their use case. That last point regarding customization is important because, in most cases, there’s no “ground truth” to check if a model’s output is correct or not. Instead, you need to use statistical measures to figure out drift or performance degradation issues, and those require domain expertise and differ from case to case. For example, in a text summarization model, you want to monitor drift in the input text sentiment, but for a human pose estimation model, you want to add integrity checks on the predicted body length.
Additionally, we monitor for edge cases defined as rule-based smart signals on the model input. Whenever UpTrain sees a distribution shift or an increased frequency of edge cases, it raises an alert while identifying the subset of data that experienced these issues. Finally, it retrains the model on that data, improving its performance in the wild.
Before UpTrain, we explored many observability tools at previous companies (Bytedance, Meta, and Bosch), but always got stuck figuring out what issues our models were facing in production. We used to go through user reviews, find patterns around model failures and manually retrain our models. This was time-consuming and opaque. Customizing our monitoring metrics and having a solution built specifically for ML models was a big need that wasn’t fulfilled.
Additionally, many ML models operate on user-sensitive data, and we didn’t want to send users’ private data to third parties. From a privacy perspective, relying on third-party hosted solutions just felt wrong, and motivated us to create an open-source self-hosted alternative for the same.
We are building UpTrain to make model monitoring effortless. With a single-line integration, our toolkit allows you to detect dips in model performance using real-time dashboards, sends you Slack alerts, helps to pinpoint poor-performing cohorts, and many more. UpTrain is built specifically for ML use cases, providing tools to monitor data distribution shifts, identify production data points with low representation in training data, and visualization/drift detection for embeddings. For more about our key features, see https://docs.uptrain.ai/docs/key-features
Our tool is available as a Python package that can be installed on top of your deployment infrastructure (AWS, GCP, Azure). Since ML models operate on user-sensitive data, and sharing it with external servers is often a barrier to using third-party tools, we focus on deploying to your own cloud.
We’ve launched this repo under an Apache 2.0 license to make it easy for individual developers to integrate it into their production app. For monetization, we plan to build enterprise-level integrations that will include managed service and support. In the next few months, we plan to add more advanced observability measures for large language models and generative AI, as well as make UpTrain easier to integrate with other tools like Weights and Biases, Databricks, Kubernetes, and Airflow.
We would love for you to try out our GitHub repo and give your feedback, and we look forward to all of your comments!