Credal.ai (YC W23) – Data Safety for Enterprise AI


Hi Hacker News! We’re Ravin and Jack, the founders of Credal.ai (https://www.credal.ai/). We provide a Chat UI and APIs that enforce PII redaction, audit logging, and data access controls for companies that want to use LLMs with their corporate data from Google Docs, Slack, or Confluence. There’s a demo video here: https://www.loom.com/share/2b5409fd64464dc9b5b6277f2be4e90f?....

One big thing enterprises are worried about with LLMs is “what’s happening to my data?” The way we see it, there are three big security and privacy barriers companies need to solve:

1. Controlling what data goes to whom: the basics are controls around customer and employee PII, but it gets trickier when you also want controls around business secrets, so that companies can ensure the Coca-Cola recipe doesn’t accidentally leave the company.

2. Visibility: Enterprise IT wants to know exactly what data was shared, by whom, and when, and what the model responded with (not to mention how much the request cost!). Each provider gives you a piece of the puzzle in their dashboard, but getting all this visibility per request from either of the main providers currently requires writing code yourself.

3. Access Controls: Enterprises have lots of documents that, for whatever reason, cannot be shared with everyone internally. So how do you make sure employees can use AI with these documents without compromising the sensitivity of the data?
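As a rough illustration of the first barrier, here is a minimal sketch of redacting PII from a prompt before it leaves the company. It is a simplified example under stated assumptions, not a description of how the Credal product works: the two regex patterns and the `redact` helper are hypothetical, and real PII detection would more likely use a trained model than regexes.

```python
import re

# Hypothetical patterns for illustration only; production PII detection
# typically relies on a trained model rather than a couple of regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each PII match with a placeholder.

    Returns the redacted text plus a count of redactions per label,
    which could later be recorded in an audit log.
    """
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Calling `redact("Email jane@example.com, SSN 123-45-6789")` would send only the placeholders `[EMAIL]` and `[SSN]` on to the model, while the counts tell you what was removed.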

This pain is typically felt most acutely by Enterprise IT, but also by the developers and business people who get told not to build the great stuff they can envision. We think it’s critical to solve these issues: the more visibility and control we can give Enterprise IT over how data is used, the more we can actually build on top of these APIs and start applying the awesome capabilities of the foundation models to every business problem.

You can easily grab data from sources like Google Docs via their APIs, but for production use cases, you have to respect the permissions on each Google Doc, Confluence page, Slack channel, etc. This gets tricky when these systems combine permissions defined entirely inside their product with permissions inherited from the company’s SSO provider (often Okta or Azure AD). Respecting all these permissions becomes both hard and vital as the number of employees and tools accessing the data grows.
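To make the permissions problem concrete, here is a minimal sketch of checking access when a document has both grants defined inside the source tool and grants inherited through SSO groups. The `Document` shape, `can_read`, and `visible_docs` are hypothetical names for illustration; real systems expose richer models (link sharing, nested groups, channel membership) that this deliberately ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    # Grants defined inside the source tool (e.g. a doc shared with one person).
    allowed_users: set = field(default_factory=set)
    # Grants inherited from SSO groups (assumed to be synced from the IdP).
    allowed_groups: set = field(default_factory=set)


def can_read(user_email, user_groups, doc):
    """A user may read a doc via a direct grant or via any shared group."""
    if user_email in doc.allowed_users:
        return True
    return bool(doc.allowed_groups & set(user_groups))


def visible_docs(user_email, user_groups, docs):
    """Filter to the docs this user may see, before anything reaches the LLM."""
    return [d for d in docs if can_read(user_email, user_groups, d)]
```

The point of filtering before retrieval is that the model never sees text the requesting user could not have opened themselves.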

The current state of the art is to use a vector database like Pinecone, Milvus, or Chroma, integrate your internal data with those systems, and then when a user asks a question, dynamically figure out which bits are relevant to the user’s question and send those to the AI as part of the prompt. We handle all this automatically for you (using Milvus for now, which we host ourselves), including point-and-click connectors for your data (Google Docs/Sheets, Slack, and Confluence, with many more coming soon). You can use that data through our UI already, and we’re in the process of adding this search functionality to the API as well.
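The retrieval step described above can be sketched in a few lines. This is a toy version under stated assumptions: it ranks chunks by cosine similarity in plain Python instead of calling a vector database, the embeddings are assumed to be precomputed, and `cosine`, `top_k`, and `build_prompt` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query.

    chunks is a list of (text, embedding) pairs; a vector database
    would perform this ranking at scale with an approximate index.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Put the retrieved chunks in front of the user's question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In practice the chunk list passed to `top_k` would already be restricted to documents the asking user is permitted to read.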

There’s other schlep work that devs would rather not worry about: building out request-level audit logs, staying on top of the rapidly changing API formats from these providers, implementing failover for when these heavily overburdened APIs go down, etc. We think individual devs should not have to do these themselves, but the foundation model providers are unlikely to provide consistent, customer-centric approaches for them. The PII detection piece is in some ways the easiest: there are a lot of good open source models for doing this, and companies using Azure OpenAI and AWS Bedrock seem less concerned with it anyway. We expect that the emphasis companies place on the redactions we provide may actually go down over time, while the emphasis on unified, consistent audit logging and data access controls will increase.
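For a sense of what the request-level audit logging and failover work involves, here is a minimal sketch. Everything in it is an assumption for illustration: `call_with_failover` is a hypothetical helper, each provider is modeled as a plain callable rather than a real SDK client, and the audit log is an in-memory list where a real system would use durable storage.

```python
import time


def call_with_failover(user, prompt, providers, audit_log):
    """Try each (name, send) provider in order, logging every attempt.

    send is any callable that takes a prompt and returns a response string,
    raising an exception when the provider is unavailable.
    """
    for name, send in providers:
        started = time.time()
        try:
            response, status = send(prompt), "ok"
        except Exception as exc:  # any provider failure triggers failover
            response, status = None, f"error: {exc}"
        # One audit record per attempt: who sent what, where, and what came back.
        audit_log.append({
            "user": user,
            "provider": name,
            "prompt": prompt,
            "response": response,
            "status": status,
            "latency_s": round(time.time() - started, 3),
        })
        if status == "ok":
            return response
    raise RuntimeError("all providers failed")
```

Logging failed attempts as well as successful ones means the log also shows when and how often failover happened.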

Right now we have three plans: a free tier (which is admittedly very limited but intended to give you a feel for the product); the business plan, which starts at $500 per month and gets you access to the data integration as well as the most powerful models like GPT-4 32k, Anthropic 100k, etc.; and an enterprise plan, which starts at $5000 per month, is a scaled-up version of the business tier, and lets you go on-prem (more details on each plan are on the website). You can try the free tier self-serve, but we haven’t yet built out fully self-service onboarding for the paid plans, so for now it is a “book a meeting” button, apologies! (But it only takes 5 minutes, and if you want, we can fully onboard you in the meeting itself.)

When Jack and I started Credal, we actually set out to solve a different problem: an ‘AI Chief of Staff’ that could read your documents and task trackers, and guide your strategic decision making. We knew that data security was going to be a critical problem for enterprises. Jack and I were both deep in the Enterprise Data Security + AI space before Credal, so we naturally took a security-first approach to building out our AI Chief of Staff. But when we started showing the product to customers, we learned pretty fast that the ‘Chief of Staff’ features were at best nice to have, and the security features were what they were actually excited by. So we stripped the product back to basics, and built out the thing our customers actually needed. Since then we’ve signed a bunch of customers and thousands of users, which has been really exciting.

Now that our product is concretely helping a bunch of people at work, is SOC 2 Type 1 compliant, and is ready for anyone to just walk up and use, we’re super excited to share it with the Hacker News community, which Jack and I have been avid readers of for a decade now. It’s still a very early product (the private beta opened in March), but we can’t wait to get your feedback and see how we can make it even better!
