Show HN: Sentience – Semantic Visual Grounding for AI Agents (WASM and ONNX)

Hi HN, I'm the solo founder behind SentienceAPI. I spent this past December building a browser automation runtime designed specifically for LLM agents.

The Problem: Building reliable web agents is painful. You essentially have two bad choices:

Raw DOM: Dumping document.body.innerHTML is cheap/fast but overwhelms the context window (100k+ tokens) and lacks spatial context (agents try to click hidden or off-screen elements).

Vision Models (GPT-4o): Sending screenshots is robust but slow (3-10s latency) and expensive (~$0.01/step). Worse, they often hallucinate coordinates, missing buttons by 10 pixels.

The Solution: Semantic Geometry

Sentience is a "Visual Cortex" for agents. It sits between the browser and your LLM, turning noisy websites into clean, ranked, coordinate-aware JSON.

How it works (The Stack):

Client (WASM): A Chrome Extension injects a Rust/WASM module that prunes 95% of the DOM (scripts, tracking pixels, invisible wrappers) directly in the browser process. It handles Shadow DOM, nested iframes ("Frame Stitching"), and computed styles (visibility/z-index) in <50ms.
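As a rough mental model of the pruning rules (Python for brevity; the real implementation is Rust compiled to WASM, and the field names below are illustrative, not the actual schema):

    # Toy version of the pruning predicate: keep an element only if it can
    # actually be seen or interacted with. Field names are illustrative.
    def keep(el: dict) -> bool:
        style = el.get("computed_style", {})
        if el.get("tag") in ("script", "style", "meta", "link"):
            return False                      # non-visual nodes
        if style.get("display") == "none" or style.get("visibility") == "hidden":
            return False                      # invisible wrappers
        if el.get("width", 0) == 0 or el.get("height", 0) == 0:
            return False                      # tracking pixels, spacers
        return True

    dom = [
        {"tag": "button", "width": 96, "height": 40, "computed_style": {}},
        {"tag": "script", "width": 0, "height": 0, "computed_style": {}},
    ]
    pruned = [el for el in dom if keep(el)]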

Gateway (Rust/Axum): The pruned tree is sent to a Rust gateway that applies heuristic importance scoring using simple visual cues (e.g., is_primary).
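To give a feel for that stage, here's a toy Python sketch of heuristic scoring over visual cues (the field names and weights are illustrative, not the actual gateway logic, which lives in Rust):

    # Illustrative only: a toy importance heuristic over pruned DOM elements.
    # Field names and weights are assumptions, not Sentience's real scoring.
    def importance_score(el: dict) -> float:
        score = 0.0
        if el.get("is_primary"):                          # looks like the main CTA
            score += 3.0
        if el.get("role") in ("button", "link", "textbox"):
            score += 2.0
        if el.get("visible") and el.get("in_viewport"):   # actually clickable
            score += 1.5
        score += min(len(el.get("text", "")), 40) / 40    # has a readable label
        return score

    elements = [
        {"role": "button", "text": "Search", "is_primary": True,
         "visible": True, "in_viewport": True},
        {"role": "div", "text": "", "visible": False, "in_viewport": False},
    ]
    ranked = sorted(elements, key=importance_score, reverse=True)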

Brain (ONNX): A server-side ML layer (running ms-marco-MiniLM via ort) semantically re-ranks the elements based on the user’s goal (e.g., "Search for shoes").
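The server does this through ONNX Runtime; as an illustration of the same idea, here's the equivalent re-ranking in Python using the public ms-marco-MiniLM cross-encoder (how elements are serialized into text below is just an example):

    # Goal-conditioned re-ranking with a cross-encoder (illustration only;
    # the server runs the model via ONNX Runtime, not sentence-transformers).
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    goal = "Search for shoes"
    candidates = ["button: Search", "link: Men's Shoes", "link: Privacy Policy"]

    scores = model.predict([(goal, c) for c in candidates])
    reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
    print(reranked)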

Result: Your agent gets the top 50 most relevant interactable elements, each with exact (x, y) coordinates, an importance score, and visual cues, so the LLM can decide what to click or type next.
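To make that concrete, a single ranked element looks roughly like this (the field names are illustrative, not the documented schema; see the docs for the real format):

    # Illustrative shape of one ranked element; field names are an example.
    element = {
        "role": "button",
        "text": "Search",
        "x": 912, "y": 64,            # click point in page coordinates
        "width": 96, "height": 40,
        "importance": 0.93,           # heuristic score + semantic re-rank
        "is_primary": True,           # visual cue: looks like the main CTA
        "frame": "main",              # which (i)frame it was stitched from
    }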

Performance:

Cost: ~$0.001 per step (vs. $0.01+ for Vision)

Latency: ~400ms (vs. 5s+ for Vision)

Payload: ~1400 tokens (vs. 100k for Raw HTML)

Developer Experience (The "Cool" Stuff): I hated debugging text logs, so I built Sentience Studio, a "Time-Travel Debugger." It records every step (DOM snapshot + screenshot) into a .jsonl trace. You can scrub through the timeline like a video editor to see exactly what the agent saw vs. what it hallucinated.
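Because the trace is plain JSON Lines, it's also easy to post-process outside the Studio UI; for example (the file name and record fields below are assumptions about the format, not a spec):

    # Minimal sketch of replaying a trace offline. The file name and the
    # "step" / "action" / "elements" fields are assumptions, not the spec.
    import json

    with open("agent-run.trace.jsonl") as f:
        steps = [json.loads(line) for line in f if line.strip()]

    for step in steps:
        print(step.get("step"), step.get("action"), len(step.get("elements", [])))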

Links:

Docs & SDK: https://www.sentienceapi.com/docs

Python SDK: https://github.com/SentienceAPI/sentience-python

TypeScript SDK: https://github.com/SentienceAPI/sentience-ts

Studio Demo: https://www.sentienceapi.com/docs/studio

Build a web agent: https://www.sentienceapi.com/docs/sdk/agent-quick-start

Screenshots with importance labels (gold stars): https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co... 2026-01-06 at 7.19.41 AM.png


I'm handling the backend in Rust and the SDKs in Python/TypeScript. The project is now in beta; I would love feedback on the architecture or the ranking logic!


