TERX records what your browser agent did the first time it ran a task. Every run after that replays in milliseconds: no LLM, no tokens, no waiting.
| task | LLM steps | agent, cold (time · cost) | TERX, warm (time · cost) | speedup | tokens saved | cost saved | cache hit |
|---|---|---|---|---|---|---|---|
| User Login Flow | 4 | 2.93s · $0.0076 | 0.078s · $0.000 | 37.7x | 2,090 | 100% | ✓ |
| Search and Filter | 5 | 3.99s · $0.0136 | 0.101s · $0.000 | 39.7x | 3,533 | 100% | ✓ |
| Multi-step Signup | 2 | 1.54s · $0.0045 | 0.062s · $0.000 | 25.0x | 1,035 | 100% | ✓ |
| E-commerce Product | 5 | 25.97s · $0.0161 | 0.091s · $0.000 | 286x | 3,811 | 100% | ✓ |
| Settings Toggles | 4 | 16.01s · $0.0095 | 0.103s · $0.000 | 155.9x | 2,451 | 100% | ✓ |
| Data Table Pagination | 12 | 90.84s · $0.0567 | 0.259s · $0.000 | 350.4x | 12,479 | 100% | ✓ |
| Support Ticket | 4 | 15.40s · $0.0078 | 0.101s · $0.000 | 152.2x | 2,135 | 100% | ✓ |
| Fuzzy Search | 3 | 11.85s · $0.0062 | 0.089s · $0.000 | 132.9x | 1,459 | 100% | ✓ |
| Profile Update | 3 | 12.35s · $0.0086 | 0.094s · $0.000 | 131.2x | 1,854 | 100% | ✓ |
| Complex Form | 4 | 17.74s · $0.0082 | 0.110s · $0.000 | 161.5x | 2,146 | 100% | ✓ |
| total / average | — | 198.62s · $0.1388 | 1.087s · $0.000 | 182.7x | 32,993 | 100% | 10/10 |
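The speedup column appears to be cold agent time divided by warm replay time; per-row values differ slightly from a division of the rounded times shown. A quick check against the totals row, assuming that definition:

```python
# Per-task cold run times and token counts, copied from the benchmark table.
cold_seconds = [2.93, 3.99, 1.54, 25.97, 16.01, 90.84, 15.40, 11.85, 12.35, 17.74]
tokens = [2090, 3533, 1035, 3811, 2451, 12479, 2135, 1459, 1854, 2146]
warm_total = 1.087  # total warm replay time, from the "total / average" row

cold_total = sum(cold_seconds)
# Assumed definition: speedup = total cold time / total warm time.
speedup = cold_total / warm_total

print(f"cold total: {cold_total:.2f}s")  # cold total: 198.62s
print(f"speedup: {speedup:.1f}x")        # speedup: 182.7x
print(f"tokens: {sum(tokens):,}")        # tokens: 32,993
```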
Reproduce: `GROQ_API_KEY=... python -m terx.benchmarks.real_agent`
Start Chrome with debugging enabled:
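```bash
# close all Chrome windows first
# (Chrome 136+ ignores --remote-debugging-port on the default profile;
#  if the port stays closed, also pass --user-data-dir=<a separate profile dir>)
google-chrome --remote-debugging-port=9222 --no-first-run
```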
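Before starting TERX, you can confirm Chrome is listening by asking the DevTools HTTP endpoint on port 9222 for its browser-level WebSocket URL. A standard-library sketch; the helper names are illustrative and not part of TERX:

```python
import json
import urllib.request


def parse_version(payload: bytes) -> str:
    """Extract the browser WebSocket URL from a /json/version response body."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def websocket_debugger_url(host: str = "127.0.0.1", port: int = 9222) -> str:
    """Ask Chrome's DevTools HTTP endpoint where its CDP WebSocket lives."""
    url = f"http://{host}:{port}/json/version"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return parse_version(resp.read())
```

`websocket_debugger_url()` returns a `ws://.../devtools/browser/...` URL, or raises an `OSError` (typically `urllib.error.URLError`) if nothing is listening on the port.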
Start TERX:
```bash
pip install terx
terx-server
```
Add to your MCP config (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "terx": { "command": "terx-server" }
  }
}
```
Every browser task your agent completes is now cached. The first run costs what it normally would; every repeat is free.
```python
import asyncio

from terx.cdp.session import BrowserSession
from terx.cache.cache import MemoryCache, session_for

cache = MemoryCache()


async def main():
    async with BrowserSession() as session:
        bridge = session.bridge()
        async with session_for(cache, bridge, "login to salesforce") as ctx:
            if ctx.hit:
                await ctx.replay()  # 0 tokens, ~80ms
            else:
                await your_agent.run(...)  # placeholder for your agent; TERX records


asyncio.run(main())
```
Run the benchmark yourself:

```bash
# free key at console.groq.com
pip install "terx[benchmark]"
cp .env.example .env   # fill in GROQ_API_KEY
python -m terx.benchmarks.real_agent
```
This runs the 10 benchmark tasks against a real agent and prints the table above with your own measured numbers.
```
┌─────────────────────────────────────────────────┐
│                   your agent                    │
│        (browser-use / Claude / anything)        │
└───────────────────────┬─────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────┐
│               TERX session_for()                │
│         on miss → records CDP commands          │
│         on hit  → replays them directly         │
└───────────────────────┬─────────────────────────┘
                        │ raw WebSocket
┌───────────────────────▼─────────────────────────┐
│                     Chrome                      │
│          (remote debugging port 9222)           │
└─────────────────────────────────────────────────┘
```
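The record-on-miss / replay-on-hit flow in the diagram can be sketched as a thin wrapper around whatever sends CDP commands. Everything here (`RecordingBridge`, `run_task`, a plain dict keyed by task string) is hypothetical and far simpler than TERX's `session_for()`:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# A recorded CDP command: (method, params), e.g. ("Page.navigate", {"url": "..."}).
Command = Tuple[str, Dict[str, Any]]
# Whatever actually delivers a CDP command to Chrome and returns its result.
Send = Callable[[str, Dict[str, Any]], Awaitable[Any]]


class RecordingBridge:
    """Hypothetical pass-through that records every CDP command sent through it."""

    def __init__(self, send: Send) -> None:
        self._send = send
        self.recorded: List[Command] = []

    async def send(self, method: str, params: Dict[str, Any]) -> Any:
        self.recorded.append((method, params))
        return await self._send(method, params)


Agent = Callable[[RecordingBridge], Awaitable[None]]


async def run_task(cache: Dict[str, List[Command]], task: str, send: Send, agent: Agent) -> bool:
    """Replay on a hit, otherwise run the agent and record. Returns True on a hit."""
    if task in cache:
        for method, params in cache[task]:
            await send(method, params)  # straight to Chrome, no LLM call
        return True
    bridge = RecordingBridge(send)
    await agent(bridge)  # the LLM-driven agent issues its commands via the bridge
    # Only reached if the agent did not raise, so only successful runs are stored.
    cache[task] = bridge.recorded
    return False


async def _demo() -> None:
    sent: List[str] = []

    async def fake_send(method: str, params: Dict[str, Any]) -> Any:
        sent.append(method)  # stand-in for a real CDP WebSocket
        return {}

    async def agent(bridge: RecordingBridge) -> None:
        await bridge.send("Page.navigate", {"url": "https://example.com"})

    cache: Dict[str, List[Command]] = {}
    print(await run_task(cache, "open example", fake_send, agent))  # False (miss)
    print(await run_task(cache, "open example", fake_send, agent))  # True (hit)
    print(sent)  # ['Page.navigate', 'Page.navigate']


if __name__ == "__main__":
    asyncio.run(_demo())
```

A real replay also has to cope with stale node IDs (see the `backendNodeId` remapping below), which this sketch ignores.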
TERX talks to Chrome over a raw asyncio WebSocket, with no Playwright subprocess: <50ms startup, ~2MB RAM.
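Speaking CDP directly mostly comes down to JSON messages with an `id`: each command gets a fresh id, Chrome's reply carries the same id back, and events carry `method` and `params` but no `id`. A minimal transport-agnostic sketch of that correlation; this is not TERX's client, and `send_text` stands in for whatever asyncio WebSocket library you use:

```python
import asyncio
import itertools
import json
from typing import Any, Awaitable, Callable, Dict, List, Optional


class CDPClient:
    """Correlates CDP commands with their responses by message id.

    `send_text` writes one text frame on the WebSocket (an assumption; plug in
    your library). Every incoming frame must be passed to `feed()`.
    """

    def __init__(self, send_text: Callable[[str], Awaitable[None]]) -> None:
        self._send_text = send_text
        self._ids = itertools.count(1)
        self._pending: Dict[int, "asyncio.Future[Any]"] = {}
        self.events: List[Dict[str, Any]] = []

    async def call(self, method: str, params: Optional[Dict[str, Any]] = None) -> Any:
        msg_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        try:
            await self._send_text(json.dumps({"id": msg_id, "method": method, "params": params or {}}))
        except BaseException:
            self._pending.pop(msg_id, None)  # don't leak the slot if sending fails
            raise
        return await future  # resolved by feed() when the matching reply arrives

    def feed(self, frame: str) -> None:
        msg = json.loads(frame)
        if "id" not in msg:
            self.events.append(msg)  # an event, e.g. Page.loadEventFired
            return
        future = self._pending.pop(msg["id"], None)
        if future is None or future.done():
            return  # reply to an unknown or cancelled call
        if "error" in msg:
            future.set_exception(RuntimeError(msg["error"].get("message", "CDP error")))
        else:
            future.set_result(msg.get("result", {}))
```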
TERX reads Chrome's Accessibility Tree rather than raw HTML and computes a fuzzy structural hash of it, so CSS changes and A/B tests don't break cache hits.
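One way to build a hash that ignores cosmetic changes is to keep only roles and nesting, drop accessible names, and flatten layout-only wrappers. This sketch assumes the AX tree has already been converted from CDP's flat `Accessibility.getFullAXTree` node list into nested dicts; the role list and the choice to ignore names are assumptions, not TERX's actual algorithm:

```python
import hashlib
from typing import Any, Dict, List

# Roles Chrome gives to layout-only wrappers (an assumption; tune per site).
LAYOUT_ROLES = {"generic", "none", "presentation"}


def _skeleton(node: Dict[str, Any]) -> List[str]:
    """Serialize a node as nested role tokens, ignoring text and layout wrappers."""
    children = node.get("children", [])
    if node.get("role") in LAYOUT_ROLES:
        # Hoist the wrapper's children so an added or removed <div> changes nothing.
        return [tok for child in children for tok in _skeleton(child)]
    tokens = ["(" + node.get("role", "")]
    for child in children:
        tokens.extend(_skeleton(child))
    tokens.append(")")
    return tokens


def structural_hash(root: Dict[str, Any]) -> str:
    """Hash roles and nesting only; accessible names (button text etc.) are ignored."""
    canonical = " ".join(_skeleton(root))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

With this scheme, renaming a "Sign in" button to "Log in" or wrapping the form in another `<div>` leaves the hash unchanged, while adding a form field changes it. The trade-off is that structurally identical pages collide, which is tolerable here because the cache key also includes the domain and task.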
The cache is SQLite. On success, TERX stores the CDP command sequence keyed by `(domain, dom_hash, task)`, and later runs with the same key replay it directly. Writes use `INSERT OR IGNORE`, so the first successful recording is canonical.
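A minimal sketch of such a cache with Python's built-in `sqlite3`. The table and column layout is an assumption; only the `(domain, dom_hash, task)` key and the `INSERT OR IGNORE` behaviour come from the description above:

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional

# Hypothetical schema: one row per (domain, dom_hash, task) key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    domain   TEXT NOT NULL,
    dom_hash TEXT NOT NULL,
    task     TEXT NOT NULL,
    commands TEXT NOT NULL,  -- JSON-encoded list of CDP commands
    PRIMARY KEY (domain, dom_hash, task)
)
"""

Commands = List[Dict[str, Any]]


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, domain: str, dom_hash: str, task: str,
          commands: Commands) -> bool:
    """Save a successful recording. Returns False if the key was already taken."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO recordings (domain, dom_hash, task, commands) "
        "VALUES (?, ?, ?, ?)",
        (domain, dom_hash, task, json.dumps(commands)),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when an earlier recording already owns the key


def lookup(conn: sqlite3.Connection, domain: str, dom_hash: str,
           task: str) -> Optional[Commands]:
    row = conn.execute(
        "SELECT commands FROM recordings "
        "WHERE domain = ? AND dom_hash = ? AND task = ?",
        (domain, dom_hash, task),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the primary key rejects duplicates and `INSERT OR IGNORE` swallows the conflict, a second successful run of the same task never overwrites the first recording.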
Chrome assigns new `backendNodeId`s in every session, so recorded IDs go stale. On replay, TERX re-snapshots the page and maps each old ID to its current equivalent by matching role and label.
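A minimal sketch of role + label matching, assuming each snapshot has been reduced to a mapping from `backendNodeId` to `(role, label)` in document order. The snapshot shape and tie-breaking rule are assumptions; TERX's real format may differ:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot shape: backendNodeId -> (role, label), in document order.
Snapshot = Dict[int, Tuple[str, str]]


def remap_ids(recorded: Snapshot, current: Snapshot) -> Dict[int, Optional[int]]:
    """Map each recorded backendNodeId to a current node with the same role and label.

    Duplicates (e.g. two "Delete" buttons) are paired up in document order.
    A recorded node with no match in the current snapshot maps to None.
    """
    free: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for node_id, key in current.items():
        free[key].append(node_id)

    mapping: Dict[int, Optional[int]] = {}
    for old_id, key in recorded.items():
        candidates = free.get(key)  # .get() does not create empty entries
        mapping[old_id] = candidates.pop(0) if candidates else None
    return mapping
```

An unmatched node (`None`) signals that the page changed too much for a safe replay; falling back to the agent in that case is a natural policy, though it is an assumption here.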
MCP tools exposed by `terx-server`:

| tool | what it does |
|---|---|
| `browser_get_state` | AX tree snapshot with stable element IDs, instead of hallucination-prone HTML |
| `browser_navigate` | Navigate to a URL (scheme-validated; blocks `javascript:`, `data:`, `file:`) |
| `browser_click` | Click an element by stable ID |
| `browser_type` | Type into an input; fires the native setter, so it is React/Vue/Svelte safe |
| `browser_screenshot` | Returns a hash reference instead of base64, so screenshots don't poison the context window |
| `browser_scroll` | Scroll up or down |
| `browser_new_tab` | Open a new tab |
| `cache_stats` | Hit rate, total savings, unique domains cached |
| `cache_invalidate` | Clear the cache for a domain, e.g. when the UI ships a redesign |
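`browser_navigate` validates the URL scheme before navigating. A sketch of that check with the standard library's `urllib.parse`; pairing the documented block-list with an `http`/`https` allow-list is an assumption, not necessarily what TERX does:

```python
from urllib.parse import urlsplit

# Schemes named in the browser_navigate description.
BLOCKED_SCHEMES = {"javascript", "data", "file"}
# Assumption: only plain web URLs are allowed through at all.
ALLOWED_SCHEMES = {"http", "https"}


def validate_url(url: str) -> str:
    """Return the URL unchanged if it is safe to navigate to, else raise ValueError."""
    scheme = urlsplit(url.strip()).scheme.lower()  # urlsplit already lowercases; kept explicit
    if scheme in BLOCKED_SCHEMES:
        raise ValueError(f"blocked scheme: {scheme}:")
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme or '(none)'}")
    return url
```

The allow-list matters because a block-list alone misses anything it didn't anticipate: a bare `example.com` or a URL with odd leading characters parses with an empty scheme and is rejected rather than passed to Chrome.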