Andi (YC W22) – Q&A based, ad-free, anti-spam search engine


Hi HN, we're Angie and Jed, and we're building Andi (https://andisearch.com), a new type of search engine with an AI assistant that answers complex questions, and gives you tools to fight spam and ad tech.

There has been a lot of discussion on HN recently about how Google search is dying. If you do a Google search in a category like finance or health, the results are overwhelmed with spam, clickbait and ads. That's what we're working to fix.

For us, the problem is personal. I'm a programmer who also trained and worked as a journalist and built a media SaaS startup. I watched first-hand as the media industry not-so-slowly starved, my startup failed, and my friends lost jobs and businesses. Google took all the revenue as media turned to trash, and ad tech, clickbait and content marketing took over the Internet, and made using the web awful.

With Andi, we already apply spam blacklists server-side, and we're adding tools to blacklist and report spam locally. Andi is free from ads and tracking. It also protects your privacy more than other search engines, because searches are sent as encrypted POST requests and so never show up in your browser history. It's free and anonymous, and there are no usage limits. You don't need to register, or install an extension or app.
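To illustrate the browser-history point: a GET search carries the query in the URL, which is what browsers record, while a POST carries it in the request body. Below is a minimal Python sketch that only builds the two requests and sends nothing. The endpoint and the `q` field name are made up for illustration and are not Andi's actual API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and field name, for illustration only.
ENDPOINT = "https://search.example/api/search"
query = {"q": "symptoms of vitamin d deficiency"}

# GET: the query is part of the URL, so it can land in browser history.
get_req = Request(f"{ENDPOINT}?{urlencode(query)}")

# POST: the URL stays generic; the query travels in the request body,
# which TLS encrypts in transit when the scheme is https.
post_req = Request(ENDPOINT, data=urlencode(query).encode("utf-8"), method="POST")

# Nothing is sent over the network; we only inspect the request objects.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

The first printed URL contains the search terms; the second does not.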

Andi is not just another copy of Google. The UX is radically different, like messaging with a smart friend who answers questions and sends you useful links. It shows results in a cleaner, more visual way (or you can change view to a simple list). You can preview content from the web safely using a proxied Reader View with no ads or clutter.

Our AI assistant uses a conversational interface to answer complex questions, explain topics, and find key information. We call these "deep answers". It is a significant breakthrough, as you'll see if you try it out for yourself. Try something factual and current, like "How many Ukrainian refugees will the US accept, and what humanitarian aid will it provide?", "What were the demands of the cybercriminals who breached Nvidia?", or "Why is elon musk considering creating a new social media platform?"

We're doing much more here than GPT-based text generation. You've probably seen examples from GPT writing assistants that look impressive but make no sense. Large language models on their own generate plausible-sounding text that is often plain wrong or dangerous. That's because they predict the next word in a sequence based on training data. They have no understanding of factual correctness, or moral right or wrong. In that sense they resemble human linguistic intuition.

Our approach works more like humans do, combining large language models (both commercial and open source GPT-based models) with reasoning (directed logic and classifiers) and common sense (heuristics). We answer many questions using APIs or knowledge graphs, or quoting extracted text. When the question is appropriate for complex question answering, we use the new approach. It works by finding the best sources, and extracting the content with the relevant facts. We then combine GPT-based models with the results to compose a conversational answer that is also factually correct, presented alongside the full search results.
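The "find sources, extract the relevant facts, then compose" flow can be sketched roughly as follows. This is a toy under stated assumptions, not Andi's code: the `Source` type, `extract_facts` and `compose_answer` are hypothetical names, term overlap stands in for real relevance ranking, and plain string joining stands in for the language-model composition step.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Source:
    url: str
    text: str


def terms(s: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def extract_facts(question: str, sources: List[Source], top_k: int = 2) -> List[Tuple[int, str, str]]:
    """Score every sentence by term overlap with the question; keep the best."""
    q = terms(question)
    scored = []
    for src in sources:
        for sentence in re.split(r"(?<=[.!?])\s+", src.text):
            overlap = len(q & terms(sentence))
            if overlap:
                scored.append((overlap, sentence.strip(), src.url))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def compose_answer(facts: List[Tuple[int, str, str]]) -> str:
    # Stand-in for the language-model step: quote each fact with its source,
    # so every statement in the answer is traceable to retrieved text.
    return " ".join(f"{sentence} [{url}]" for _, sentence, url in facts)


sources = [
    Source("https://example.org/a", "The telescope launched in December 2021. It was built over many years."),
    Source("https://example.org/b", "Launch sites near the equator give rockets extra velocity. Tickets are sold out."),
]
facts = extract_facts("Why launch a telescope near the equator?", sources)
print(compose_answer(facts))
```

The point of the design is that generation is constrained to retrieved text rather than left to the model's own recall.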

The way Andi searches is also different. We use classifiers and NLP to understand question intent, entities and topics, and predict the best sources for an answer. Then we query APIs and vertical searches directly, and retrieve content in real-time, before ranking and filtering the results. The content you see in search results is retrieved directly from each site in real-time.
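As a rough illustration of routing a query by intent to the sources most likely to answer it, here is a small Python sketch. The intent labels, keyword sets and source names are all invented for the example, and keyword matching stands in for the trained classifiers and NLP described above.

```python
# Hypothetical intents, keywords and source names; keyword rules stand in
# for trained classifiers.
INTENT_KEYWORDS = {
    "finance": {"gdp", "stock", "price", "inflation"},
    "news": {"today", "latest", "breach", "announced"},
    "navigation": {"go", "open"},
}

SOURCES_BY_INTENT = {
    "finance": ["economic-data-api", "knowledge-graph"],
    "news": ["news-vertical"],
    "navigation": ["direct-link"],
    "general": ["web-search"],
}


def classify_intent(query):
    """Pick the intent whose keyword set overlaps the query most."""
    words = set(query.lower().split())
    best, best_hits = "general", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best


def route(query):
    """Return the predicted intent and the sources to query for it."""
    intent = classify_intent(query)
    return intent, SOURCES_BY_INTENT[intent]


for q in ["what is the gdp per capita of china vs new zealand",
          "go hn google search dying",
          "how do owls hunt"]:
    print(q, "->", route(q))
```

A real system would then query each predicted source and rank and filter what comes back.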

When we can't find good results, we fall back as an agent to legacy web search (Google, Bing and others - about 50% of the time now). Andi does best with natural language queries. We've trained classifiers for content quality and spam detection, and blacklist and downrank known bad sources and copycat sites (for non-political content). You can disable these.
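The blacklist, downrank and fallback behavior described here could look something like the sketch below. The host names, the downrank multiplier, the `MIN_SCORE` threshold and the function names are assumptions made for the example; the `apply_filters` flag mirrors the option to disable filtering.

```python
from urllib.parse import urlparse

# Hypothetical lists and threshold, for illustration only.
BLACKLIST = {"spam.example"}
DOWNRANK = {"copycat.example": 0.5}   # score multipliers per host
MIN_SCORE = 0.4                       # below this, results count as "not good"


def filter_and_rank(results, apply_filters=True):
    """Drop blacklisted hosts, scale downranked ones, sort by score."""
    ranked = []
    for url, score in results:
        host = urlparse(url).hostname or ""
        if apply_filters:
            if host in BLACKLIST:
                continue
            score *= DOWNRANK.get(host, 1.0)
        ranked.append((url, score))
    return sorted(ranked, key=lambda r: r[1], reverse=True)


def search(query, primary, fallback, apply_filters=True):
    """Try the primary sources first; fall back to web search if they are weak."""
    results = filter_and_rank(primary(query), apply_filters)
    if not results or results[0][1] < MIN_SCORE:
        results = filter_and_rank(fallback(query), apply_filters)
    return results


sample = [
    ("https://spam.example/x", 0.9),
    ("https://copycat.example/y", 0.8),
    ("https://good.example/z", 0.6),
]
print(filter_and_rank(sample))
print(filter_and_rank(sample, apply_filters=False))
```

Here `primary` and `fallback` are any callables that take a query and return `(url, score)` pairs, so the same filtering applies whichever path produced the results.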

The stack is a serverless application hosted on AWS, using Lambda and Kubernetes, with inference moving to SageMaker to improve speed. We use PyTorch, spaCy, GPT-based models (GPT-2, GPT-J/NeoX and commercial providers) and Hugging Face, BERT-style transformers, plus AWS Lex for some initial intent routing. Classifiers are trained on custom-labeled public search data and content examples. We have a database of 30k+ top sites. We're building some custom vertical searches. Services are written in Python and Node. The front end is a Progressive Web App written in React.

Some fun features to try include recent events ("Why was the James Webb telescope launched from French Guiana?"), direct navigation ("go hn google search dying" and allow pop-ups), or question answering ("what is the gdp per capita of china vs new zealand"). You can also "Change View" on results between a visual feed, grid of cards, or simple list, or even like Hacker News or early Google. Also try View in Reader for a proxied ad-free way to read articles, including many behind paywalls like the NYT or Economist.

Andi is a fairly stable alpha and still experimental (it sometimes misunderstands or gets things wrong). We plan to have a freemium model with some paid features and API use. We're a small team with two full-time founders and some help from friends. We've been live for a few weeks, and we're iterating fast based on feedback. We'd love to hear what you think about search and how to fix it, and answer any questions you have about what we're making.

