We’ve found that most technical searches fall into a few categories: ad-hoc how-tos, understanding an API, recalling forgotten details, research, or troubleshooting. Google is too broad and shallow a tool to be good at these. Even after sifting through the deluge of spammy, SEO-stuffed sites, you still have to dig through discussion boards or documentation to find your answer yourself. Its “featured snippet” approach works for simple factoid queries but quickly falls apart when a question requires reasoning about information across multiple webpages.
Our approach is narrow and deep — to retrieve detailed information for topics relevant to developers. When you submit a query, we pull raw page data from Bing, rerank the results, and extract explanations and code snippets with our proprietary large language models. We then use seq-to-seq transformer models to generate a final explanation from all of this input.
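To make the shape of that pipeline concrete, here’s a minimal sketch of the retrieve-and-rerank step. It is an illustration, not our production stack: `fetch_bing_results` is a hypothetical helper standing in for the Bing retrieval call, and the public cross-encoder stands in for our proprietary reranking models.

```python
from sentence_transformers import CrossEncoder

def top_passages(query: str, k: int = 5) -> list[str]:
    # Hypothetical helper: returns raw page text for the query from Bing.
    passages = fetch_bing_results(query)

    # Public stand-in for our proprietary rerankers: score each
    # (query, passage) pair and keep the k highest-scoring passages.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, p) for p in passages])
    ranked = sorted(zip(scores, passages), key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked[:k]]
```

The top passages then become the input that the generation model conditions on.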
For our honors theses at UT Austin, we researched prototypes of large generative language models that answer complex questions by combining information from multiple sources. We found that GPT-3, GPT-Neo/J/X, and similar autoregressive language models that predict text from left to right are prone to “hallucinating”: generating text inconsistent with the “ground truth” documents. Training a sequence-to-sequence language model (a T5 derivative) on our custom dataset designed for factual generation yielded much better results with less hallucination.
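The difference is easy to see in code: a seq-to-seq model generates from an explicit input sequence, so you can condition it directly on the retrieved documents. Below is a minimal sketch of that conditioning, using stock t5-small as a stand-in for our custom-trained model (the “question: … context: …” format is T5’s standard QA prompt, not our training format).

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Stock t5-small as a stand-in for our custom-trained T5 derivative.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def grounded_answer(question: str, documents: list[str]) -> str:
    # The model conditions on the retrieved documents, not just on its
    # weights, which is what keeps generation tied to the source text.
    prompt = f"question: {question} context: {' '.join(documents)}"
    inputs = tokenizer(prompt, return_tensors="pt",
                       truncation=True, max_length=512)
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```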
After creating this prototype, we started actively developing Hello with the idea that searching should be just like talking to a smart friend. We want to build an engine that explains complex topics clearly and concisely, and lets users ask follow-up questions using the context of their previous searches.
For example, when asked “what type of semaphore can function as a mutex?”, Hello pulls in the raw text from all five search results linked on the search page to generate: “A binary semaphore can be used as a mutex. Mutexes and semaphores are two different types of synchronization mechanisms. A mutex is a lock that prevents two threads from accessing the same resource at the same time. A semaphore is used to signal that a resource has become available.” We're biased, of course, but we think that the ability to reason abstractly about information from multiple web pages is a cool thing in a search engine!
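That answer also translates directly into code. Here’s a self-contained Python illustration of the concept (ours, for this post, not Hello’s output): a semaphore initialized with a count of one — a binary semaphore — behaving as a mutex around a shared counter.

```python
import threading

counter = 0
binary_sem = threading.Semaphore(1)  # count of 1 -> behaves like a mutex

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with binary_sem:   # acquire before the critical section
            counter += 1   # only one thread mutates the counter at a time

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- no lost updates
```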
We use BERT-based models to extract and rank code snippets when they’re relevant to the query. Our search engine currently does well at answering practical how-to questions such as “Sort a list of tuples by the second element”, “Set a response cookie in FastAPI”, “Get value of input in React”, and “How to implement Dijkstra's algorithm.” Exclusively using our own models has also freed us from dependence on OpenAI.
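For the first of those queries, the canonical Python answer is the `sorted` built-in with a key function — shown here as an illustration of the kind of snippet we aim to surface, not Hello’s literal output:

```python
pairs = [("a", 3), ("b", 1), ("c", 2)]

# Sort a list of tuples by the second element.
pairs_sorted = sorted(pairs, key=lambda t: t[1])
print(pairs_sorted)  # [('b', 1), ('c', 2), ('a', 3)]
```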
Hello is and will always be free for individual devs. We haven’t rolled out any paid plans yet, but we plan to charge teams per user per month for search over internal data scattered across wikis, documentation, Slack, and email.
We started Hello Cognition to scratch our own itch, but now we hope to improve the state of information retrieval for the greater developer community. If you'd like to be part of our product feedback and iteration process, we'd love to have you—please contact us at [email protected].
We're looking forward to hearing your ideas, feedback, comments, and what would be helpful for you when navigating technical problems!