TERX — browser agent memory

// real benchmark — Groq API, openai/gpt-oss-120b, measured

182.7x

aggregate speedup on warm runs (total cold time / total warm time)

100%

token savings on cache hits

10/10

cache hit rate across task suite

task	llm steps	agent (cold)	terx (warm)	speedup	tokens	cost saved	hit
User Login Flow	4	2.93s · $0.0076	0.078s · $0.000	37.7x	2,090	100%	✓
Search and Filter	5	3.99s · $0.0136	0.101s · $0.000	39.7x	3,533	100%	✓
Multi-step Signup	2	1.54s · $0.0045	0.062s · $0.000	25.0x	1,035	100%	✓
E-commerce Product	5	25.97s · $0.0161	0.091s · $0.000	286x	3,811	100%	✓
Settings Toggles	4	16.01s · $0.0095	0.103s · $0.000	155.9x	2,451	100%	✓
Data Table Pagination	12	90.84s · $0.0567	0.259s · $0.000	350.4x	12,479	100%	✓
Support Ticket	4	15.40s · $0.0078	0.101s · $0.000	152.2x	2,135	100%	✓
Fuzzy Search	3	11.85s · $0.0062	0.089s · $0.000	132.9x	1,459	100%	✓
Profile Update	3	12.35s · $0.0086	0.094s · $0.000	131.2x	1,854	100%	✓
Complex Form	4	17.74s · $0.0082	0.110s · $0.000	161.5x	2,146	100%	✓
total / average	—	198.62s · $0.1388	1.087s · $0.000	182.7x	32,993	100%	10/10
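The 182.7x figure in the last row matches total cold time divided by total warm time, not the mean of the per-task speedups. A small check of that arithmetic, using the cold and warm seconds copied from the table (the function names are illustrative, and summing the rounded warm times gives 1.088s rather than the reported 1.087s, so the last digit can differ):

```python
# (cold seconds, warm seconds) per task, copied from the table above
runs = {
    "User Login Flow": (2.93, 0.078),
    "Search and Filter": (3.99, 0.101),
    "Multi-step Signup": (1.54, 0.062),
    "E-commerce Product": (25.97, 0.091),
    "Settings Toggles": (16.01, 0.103),
    "Data Table Pagination": (90.84, 0.259),
    "Support Ticket": (15.40, 0.101),
    "Fuzzy Search": (11.85, 0.089),
    "Profile Update": (12.35, 0.094),
    "Complex Form": (17.74, 0.110),
}


def aggregate_speedup(runs):
    """Total cold time divided by total warm time."""
    total_cold = sum(cold for cold, _ in runs.values())
    total_warm = sum(warm for _, warm in runs.values())
    return total_cold / total_warm


def mean_speedup(runs):
    """Unweighted mean of the per-task cold/warm ratios."""
    ratios = [cold / warm for cold, warm in runs.values()]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    print(f"aggregate: {aggregate_speedup(runs):.1f}x")
    print(f"per-task mean: {mean_speedup(runs):.1f}x")
```

The aggregate lands within rounding of the reported 182.7x, while the unweighted per-task mean is closer to 147x. The long Data Table Pagination run dominates the aggregate.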

Reproduce: GROQ_API_KEY=... python -m terx.benchmarks.real_agent

// quickstart

MCP server

Works with Claude Desktop, Cursor, Windsurf

Start Chrome with debugging enabled:

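# close all Chrome windows first
# note: Chrome 136+ ignores --remote-debugging-port for the default profile;
# if port 9222 stays closed, also pass --user-data-dir=<separate profile dir>
google-chrome --remote-debugging-port=9222 --no-first-run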
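To confirm Chrome is actually listening before going further, you can query the DevTools HTTP endpoint `/json/version`, which Chrome serves on the debugging port. A minimal sketch using only the standard library; `parse_version_payload` and `probe_chrome` are illustrative helper names, not part of TERX:

```python
import json
import urllib.request


def parse_version_payload(body: str) -> dict:
    """Pick the browser name and WebSocket URL out of a /json/version response."""
    data = json.loads(body)
    return {
        "browser": data.get("Browser", "unknown"),
        "ws_url": data["webSocketDebuggerUrl"],
    }


def probe_chrome(port: int = 9222, timeout: float = 2.0) -> dict:
    """Fetch /json/version from a locally running Chrome."""
    url = f"http://127.0.0.1:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_version_payload(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(probe_chrome())
    except (OSError, ValueError, KeyError) as exc:  # URLError subclasses OSError
        print(f"Chrome is not reachable on port 9222: {exc}")
```

If the probe fails, Chrome was most likely already running without the flag, or the port is taken by another process.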

Start TERX:

pip install terx
terx-server

Add to your MCP config:

// claude_desktop_config.json
{
  "mcpServers": {
    "terx": { "command": "terx-server" }
  }
}
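If your MCP config already lists other servers, the `terx` entry has to be merged in rather than pasted over them. A hedged sketch of that merge; `add_terx_server` is a hypothetical helper, and the config file's location depends on your client and OS, so the path is left to the caller:

```python
import json
from pathlib import Path


def add_terx_server(config_path: Path) -> dict:
    """Add the "terx" entry to an MCP config file, preserving other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["terx"] = {"command": "terx-server"}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path to your client's config file, for example `add_terx_server(Path("claude_desktop_config.json"))`, then restart the client so it picks up the new server.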

Every browser task your agent runs is now cached. The first run executes normally and is recorded. Every repeat that hits the cache replays without calling the LLM, so it costs no tokens.

Python library: wrap your existing agent in 3 lines

import asyncio

from terx.cdp.session import BrowserSession
from terx.cache.cache import MemoryCache, session_for


async def main():
    cache = MemoryCache()

    async with BrowserSession() as session:
        bridge = session.bridge()

        # the task string is part of the cache key
        async with session_for(cache, bridge, "login to salesforce") as ctx:
            if ctx.hit:
                await ctx.replay()          # cache hit: 0 tokens, ~80ms
            else:
                # your_agent is a placeholder for the agent you already have
                await your_agent.run(...)   # cache miss: agent runs, TERX records


asyncio.run(main())

Run the real LLM benchmark yourself

# free key at console.groq.com
pip install "terx[benchmark]"
cp .env.example .env   # fill in GROQ_API_KEY
python -m terx.benchmarks.real_agent

Runs 10 real tasks. Prints the table above with your measured numbers.

// how it works

  ┌─────────────────────────────────────────────────┐
  │                  your agent                      │
  │          (browser-use / Claude / anything)        │
  └───────────────────────┬─────────────────────────┘
                          │
  ┌───────────────────────▼─────────────────────────┐
  │              TERX session_for()                │
  │  on miss → records CDP commands                  │
  │  on hit  → replays them directly                 │
  └───────────────────────┬─────────────────────────┘
                          │ raw WebSocket
  ┌───────────────────────▼─────────────────────────┐
  │                  Chrome                          │
  │           (remote debugging port 9222)            │
  └─────────────────────────────────────────────────┘
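The record-on-miss, replay-on-hit flow in the diagram can be sketched with a toy async context manager. This is not TERX's implementation: `toy_session_for`, `ToyContext`, the dict-based cache, and the `send` callable standing in for the CDP bridge are all assumptions made for illustration.

```python
import asyncio
from contextlib import asynccontextmanager


class ToyContext:
    def __init__(self, recorded, send):
        self.hit = recorded is not None
        self.commands = list(recorded or [])
        self._send = send

    async def send(self, method, params):
        """Miss path: forward one command to the browser and remember it."""
        self.commands.append((method, params))
        await self._send(method, params)

    async def replay(self):
        """Hit path: re-send the stored commands without involving the agent."""
        for method, params in self.commands:
            await self._send(method, params)


@asynccontextmanager
async def toy_session_for(cache, send, task):
    ctx = ToyContext(cache.get(task), send)
    yield ctx
    # Reached only if the body did not raise, so failed runs are never stored.
    if not ctx.hit:
        cache[task] = ctx.commands


async def demo():
    sent = []

    async def fake_send(method, params):
        sent.append(method)

    cache = {}
    for _ in range(2):  # first pass records, second pass replays
        async with toy_session_for(cache, fake_send, "login") as ctx:
            if ctx.hit:
                await ctx.replay()
            else:
                await ctx.send("Page.navigate", {"url": "https://example.com/login"})
                await ctx.send("Input.insertText", {"text": "alice"})
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both passes send the same two commands to the browser, but only the first one needs the agent to decide what they are.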

CDP Bridge

Raw asyncio WebSocket to Chrome. No Playwright subprocess. <50ms startup, ~2MB RAM.
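Talking to Chrome over a raw WebSocket mostly comes down to framing JSON messages: each request carries an `id`, responses echo that `id`, and frames without an `id` are events. A minimal sketch of that bookkeeping, with the socket itself left out; `CdpFramer` is an illustrative name, not a TERX class:

```python
import itertools
import json


class CdpFramer:
    """Builds CDP request frames and matches responses to requests by id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    def request(self, method, params=None):
        """Return the JSON text to write to the WebSocket for one command."""
        msg_id = next(self._ids)
        self._pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params or {}})

    def handle(self, raw):
        """Classify an incoming frame as a result, an error, or an event."""
        msg = json.loads(raw)
        if "id" not in msg:
            return ("event", msg["method"], msg.get("params", {}))
        method = self._pending.pop(msg["id"])
        if "error" in msg:
            return ("error", method, msg["error"])
        return ("result", method, msg.get("result", {}))
```

A real bridge would pair each pending id with an `asyncio.Future` so callers can await the response. The sketch returns tuples to stay free of network code.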

DOM Extractor

Reads Chrome's Accessibility Tree (not raw HTML). Computes a fuzzy structural hash of that tree, which tolerates CSS changes and A/B test variants, so they do not break cache hits.
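The exact hashing scheme is not described here, so the following is only one plausible way to get a hash that ignores styling: keep the (role, label) pairs of interactive nodes, normalize the labels, sort, and hash. The role list, the digit masking, and the flat dict shape of the nodes are all assumptions (real CDP accessibility nodes nest role and name inside value objects).

```python
import hashlib
import re

# Roles treated as structurally significant (an illustrative choice)
INTERACTIVE_ROLES = {"button", "textbox", "link", "checkbox", "combobox"}


def normalize_label(name: str) -> str:
    """Lowercase, mask digits, and collapse whitespace so counters don't change the hash."""
    name = re.sub(r"\d+", "#", name.lower())
    return re.sub(r"\s+", " ", name).strip()


def structural_hash(ax_nodes) -> str:
    """Hash the sorted (role, label) pairs of the interactive nodes."""
    signature = sorted(
        (node["role"], normalize_label(node.get("name", "")))
        for node in ax_nodes
        if node["role"] in INTERACTIVE_ROLES
    )
    payload = "\n".join(f"{role}|{label}" for role, label in signature)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

Because styling, node order, and non-interactive content never enter the signature, two renders of the same form hash identically, while adding or removing a control changes the hash.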

Muscle Memory Cache

SQLite. On success: stores CDP command sequence keyed by (domain, dom_hash, task). On future runs: replays directly. INSERT OR IGNORE — first successful recording is canonical.
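The keying and the first-write-wins behaviour can be reproduced with the standard `sqlite3` module. The table name, column layout, and JSON serialization below are assumptions for illustration. Only the composite key (domain, dom_hash, task) and the use of `INSERT OR IGNORE` come from the description above.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache database with a composite primary key."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS recordings (
               domain   TEXT NOT NULL,
               dom_hash TEXT NOT NULL,
               task     TEXT NOT NULL,
               commands TEXT NOT NULL,
               PRIMARY KEY (domain, dom_hash, task)
           )"""
    )
    return conn


def store(conn, domain, dom_hash, task, commands):
    """Insert a recording; a later one for the same key is silently ignored."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO recordings VALUES (?, ?, ?, ?)",
            (domain, dom_hash, task, json.dumps(commands)),
        )


def lookup(conn, domain, dom_hash, task):
    """Return the stored command list for a key, or None on a cache miss."""
    row = conn.execute(
        "SELECT commands FROM recordings WHERE domain = ? AND dom_hash = ? AND task = ?",
        (domain, dom_hash, task),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Storing twice under the same key leaves the first recording in place, which is what makes it canonical.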

Dynamic node ID translation

Chrome assigns new backendNodeIds each session. TERX re-snapshots on replay and maps old IDs to current equivalents by role + label matching.
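Role-plus-label matching can be sketched as a lookup from (role, label) to the IDs in the fresh snapshot. `translate_node_ids` and the flat node dicts are illustrative assumptions, and how TERX handles ambiguous or missing matches is not stated here.

```python
def translate_node_ids(recorded_nodes, current_nodes):
    """Map recorded backendNodeIds to current ones by (role, label)."""
    current_by_key = {}
    for node in current_nodes:
        key = (node["role"], node["name"])
        current_by_key.setdefault(key, []).append(node["backendNodeId"])

    mapping = {}
    for node in recorded_nodes:
        candidates = current_by_key.get((node["role"], node["name"]), [])
        if candidates:
            # Consume matches in snapshot order so duplicate labels pair up one-to-one
            mapping[node["backendNodeId"]] = candidates.pop(0)
    return mapping
```

Recorded nodes with no counterpart are simply absent from the mapping. A caller would presumably treat a replay step that targets such a node as a cache miss and fall back to the agent.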

// mcp tools

tool	what it does
`browser_get_state`	AX tree snapshot — stable element IDs, no hallucination-prone HTML
`browser_navigate`	Navigate to URL (scheme-validated; blocks javascript:, data:, and file: URLs)
`browser_click`	Click element by stable ID
`browser_type`	Type into input — fires native setter (React/Vue/Svelte safe)
`browser_screenshot`	Returns hash ref, not base64 — no context window poisoning
`browser_scroll`	Scroll up/down
`browser_new_tab`	Open new tab
`cache_stats`	Hit rate, total savings, unique domains cached
`cache_invalidate`	Clear cache for a domain when the UI ships a redesign
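The scheme check behind `browser_navigate` can be approximated with `urllib.parse`. This sketch uses an allowlist of `http` and `https`, which is an assumption: the table only says that `javascript:`, `data:`, and `file:` are blocked. `validate_navigation_url` is a hypothetical name.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: allowlist rather than blocklist


def validate_navigation_url(url: str) -> str:
    """Return the trimmed URL if its scheme is allowed, else raise ValueError."""
    url = url.strip()
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError(f"blocked URL scheme: {parts.scheme or '(none)'}")
    if not parts.netloc:
        raise ValueError("URL has no host")
    return url
```

An allowlist fails closed: any scheme not listed is rejected without having to be named in advance.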

// memory layer for browser agents