Show HN: Sentience – Semantic Visual Grounding for AI Agents (WASM and ONNX)

Hi HN, I'm the solo founder behind SentienceAPI. I spent this past December building a browser automation runtime designed specifically for LLM agents.

The Problem: Building reliable web agents is painful. You essentially have two bad choices:

Raw DOM: Dumping document.body.innerHTML is cheap/fast but overwhelms the context window (100k+ tokens) and lacks spatial context (agents try to click hidden or off-screen elements).

Vision Models (GPT-4o): Sending screenshots is robust but slow (3-10s latency) and expensive (~$0.01/step). Worse, they often hallucinate coordinates, missing buttons by 10 pixels.

The Solution: Semantic Geometry

Sentience is a "Visual Cortex" for agents. It sits between the browser and your LLM, turning noisy websites into clean, ranked, coordinate-aware JSON.

How it works (The Stack):

Client (WASM): A Chrome Extension injects a Rust/WASM module that prunes 95% of the DOM (scripts, tracking pixels, invisible wrappers) directly in the browser process. It handles Shadow DOM, nested iframes ("Frame Stitching"), and computed styles (visibility/z-index) in <50ms.
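As a rough mental model of the pruning rules (Python for brevity; the real implementation is Rust compiled to WASM, and the field names below are illustrative, not the actual schema):

    # Toy version of the pruning predicate: keep an element only if it can
    # actually be seen or interacted with. Field names are illustrative.
    def keep(el: dict) -> bool:
        style = el.get("computed_style", {})
        if el.get("tag") in ("script", "style", "meta", "link"):
            return False                      # non-visual nodes
        if style.get("display") == "none" or style.get("visibility") == "hidden":
            return False                      # invisible wrappers
        if el.get("width", 0) == 0 or el.get("height", 0) == 0:
            return False                      # tracking pixels, spacers
        return True

    dom = [
        {"tag": "button", "width": 96, "height": 40, "computed_style": {}},
        {"tag": "script", "width": 0, "height": 0, "computed_style": {}},
    ]
    pruned = [el for el in dom if keep(el)]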

Gateway (Rust/Axum): The pruned tree is sent to a Rust gateway that applies heuristic importance scoring using simple visual cues (e.g., is_primary).
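To give a feel for that stage, here's a toy Python sketch of heuristic scoring over visual cues (the field names and weights are illustrative, not the actual gateway logic, which lives in Rust):

    # Illustrative only: a toy importance heuristic over pruned DOM elements.
    # Field names and weights are assumptions, not Sentience's real scoring.
    def importance_score(el: dict) -> float:
        score = 0.0
        if el.get("is_primary"):                          # looks like the main CTA
            score += 3.0
        if el.get("role") in ("button", "link", "textbox"):
            score += 2.0
        if el.get("visible") and el.get("in_viewport"):   # actually clickable
            score += 1.5
        score += min(len(el.get("text", "")), 40) / 40    # has a readable label
        return score

    elements = [
        {"role": "button", "text": "Search", "is_primary": True,
         "visible": True, "in_viewport": True},
        {"role": "div", "text": "", "visible": False, "in_viewport": False},
    ]
    ranked = sorted(elements, key=importance_score, reverse=True)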

Brain (ONNX): A server-side ML layer (running ms-marco-MiniLM via ort) semantically re-ranks the elements based on the user’s goal (e.g., "Search for shoes").
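The server does this through ONNX Runtime; as an illustration of the same idea, here's the equivalent re-ranking in Python using the public ms-marco-MiniLM cross-encoder (how elements are serialized into text below is just an example):

    # Goal-conditioned re-ranking with a cross-encoder (illustration only;
    # the server runs the model via ONNX Runtime, not sentence-transformers).
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    goal = "Search for shoes"
    candidates = ["button: Search", "link: Men's Shoes", "link: Privacy Policy"]

    scores = model.predict([(goal, c) for c in candidates])
    reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
    print(reranked)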

Result: Your agent gets the top 50 most relevant interactable elements, each with exact (x, y) coordinates, an importance score, and visual cues, so the LLM can decide what to click or type next.
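To make that concrete, a single ranked element looks roughly like this (the field names are illustrative, not the documented schema; see the docs for the real format):

    # Illustrative shape of one ranked element; field names are an example.
    element = {
        "role": "button",
        "text": "Search",
        "x": 912, "y": 64,            # click point in page coordinates
        "width": 96, "height": 40,
        "importance": 0.93,           # heuristic score + semantic re-rank
        "is_primary": True,           # visual cue: looks like the main CTA
        "frame": "main",              # which (i)frame it was stitched from
    }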

Performance:

Cost: ~$0.001 per step (vs. $0.01+ for Vision)

Latency: ~400ms (vs. 5s+ for Vision)

Payload: ~1400 tokens (vs. 100k for Raw HTML)

Developer Experience (The "Cool" Stuff): I hated debugging text logs, so I built Sentience Studio, a "Time-Travel Debugger." It records every step (DOM snapshot + screenshot) into a .jsonl trace. You can scrub through the timeline like a video editor to see exactly what the agent saw vs. what it hallucinated.
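Because the trace is plain JSON Lines, it's also easy to post-process outside the Studio UI; for example (the file name and record fields below are assumptions about the format, not a spec):

    # Minimal sketch of replaying a trace offline. The file name and the
    # "step" / "action" / "elements" fields are assumptions, not the spec.
    import json

    with open("agent-run.trace.jsonl") as f:
        steps = [json.loads(line) for line in f if line.strip()]

    for step in steps:
        print(step.get("step"), step.get("action"), len(step.get("elements", [])))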

Links:

Docs & SDK: https://www.sentienceapi.com/docs

Python SDK: https://github.com/SentienceAPI/sentience-python

TypeScript SDK: https://github.com/SentienceAPI/sentience-ts

Studio Demo: https://www.sentienceapi.com/docs/studio

Build a web agent: https://www.sentienceapi.com/docs/sdk/agent-quick-start

Screenshots with importance labels (gold stars): https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co... 2026-01-06 at 7.19.41 AM.png


I'm handling the backend in Rust and the SDKs in Python/TypeScript. The project is now in beta; I would love feedback on the architecture or the ranking logic!


