Agent VCR — Time-Travel Debugging for AI Agents

The Problem

Debugging agents is a nightmare.

When your LangGraph or CrewAI agent fails on step 8 out of 10, existing tools only tell you what went wrong. To fix it, you re-run all 10 steps from scratch.

Every logic error costs minutes of wall time and dollars in wasted LLM tokens. At scale, this kills iteration speed entirely.

Re-run entire chain for every small fix
Can't inspect intermediate state without breakpoints
LangSmith / LangFuse are read-only observers
No way to test prompt changes mid-chain

The Solution

Rewind. Fix. Resume.

Agent VCR records your agent's complete state at every step into a local JSONL file. When something breaks, you jump straight to the failing frame.

Edit the state — fix a bad prompt, inject corrected context, patch a tool output — then resume execution forward from that exact point.

Jump to any frame instantly
Full state snapshot at every step
Edit state and resume mid-chain
Fork runs to compare variants

Quick Start

From zero to time-travel in under a minute.

Install

terminal

pip install ai-agent-vcr

Record your agent

agent.py

from agent_vcr import VCRRecorder

recorder = VCRRecorder()
recorder.start_session("debug_run")

# Your existing agent code — unchanged
result = my_agent.run(query)

recorder.save()  # → .vcr/debug_run.vcr

Time-travel & fix

debug.py

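from agent_vcr import VCRPlayer, ResumeConfig

player = VCRPlayer.load(".vcr/debug_run.vcr")

# Jump to the failing step
state = player.goto_frame(7)
print(state)          # Inspect full state

# Fix the bad state and resume from frame 7 forward.
# resume() takes a ResumeConfig (see API Reference); importing it from
# the package root is an assumption.
config = ResumeConfig(
    from_frame=7,
    state_overrides={"prompt": "Corrected prompt"},
)
player.resume(agent_callable, config)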

Everything you need to debug agents.

Built for the reality of multi-step agentic systems.

⏮

Time Travel

Jump to any frame in a session instantly. Full input/output state snapshot at every node.

✏

State Injection

Mutate the state at any frame — fix a prompt, patch tool output, inject context — then resume.

🌿

Session Forking

Fork from any frame to create parallel runs. Compare how different fixes change downstream behavior.
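
As a rough illustration of what forking from a frame could look like, the sketch below copies the recorded frames up to a chosen index and applies a state override, leaving the original run untouched. The `fork_frames` helper and the frame layout (`node`, `state`) are assumptions made for this example, not Agent VCR's internal format.

```python
import copy


def fork_frames(frames, from_frame, overrides=None):
    """Return a new frame list ending at `from_frame`, with overrides applied to its state."""
    if not 0 <= from_frame < len(frames):
        raise IndexError(f"frame {from_frame} out of range")
    # Deep-copy so edits in the fork never leak back into the original run.
    forked = copy.deepcopy(frames[: from_frame + 1])
    forked[-1]["state"].update(overrides or {})
    return forked


original = [
    {"node": "planner", "state": {"prompt": "plan it"}},
    {"node": "coder", "state": {"prompt": "bad prompt"}},
    {"node": "tester", "state": {"prompt": "test it"}},
]

# Two parallel variants branching from the same frame.
variant_a = fork_frames(original, 1, {"prompt": "fix A"})
variant_b = fork_frames(original, 1, {"prompt": "fix B"})
```

Each variant can then be resumed independently and the downstream frames compared.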

📡

Live WebSocket Feed

Stream agent execution in real-time via the built-in FastAPI server. Watch every step as it happens.

🗂

JSONL Storage

Sessions stored as plain JSONL. Human-readable, git-diffable, append-only, parseable line-by-line.
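
To make the storage properties concrete, here is a minimal sketch of append-only JSONL writing and line-by-line reading using only the standard library. The helper names and the frame fields (`index`, `node`, `state`) are hypothetical; the actual on-disk schema is not specified here.

```python
import json
import tempfile
from pathlib import Path


def append_frame(path, frame):
    # One JSON object per line; "a" mode only adds to the end of the file.
    # sort_keys keeps key order stable, which makes diffs smaller.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(frame, sort_keys=True) + "\n")


def iter_frames(path):
    # Yield frames one at a time instead of parsing the file as a single document.
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp) / "demo.jsonl"
    append_frame(session, {"index": 0, "node": "planner", "state": {"query": "hi"}})
    append_frame(session, {"index": 1, "node": "coder", "state": {"query": "hi", "code": "..."}})
    frames = list(iter_frames(session))
    line_count = len(session.read_text(encoding="utf-8").splitlines())
```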

⚡

<5ms Overhead

P99 recording latency under 5ms. Benchmarked continuously in CI. Safe for production use.
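
A latency claim like this can be checked locally. The sketch below shows one way to measure a P99 figure with the standard library; `fake_record_step` is a stand-in that only serializes a small dict, so swap in a real recorder call to get a meaningful number.

```python
import json
import statistics
import time


def p99_ms(fn, runs=1000):
    """Call `fn` repeatedly and return the 99th-percentile latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
    return statistics.quantiles(samples, n=100)[-1]


def fake_record_step():
    # Stand-in for a recorder call: serialize a small state dict.
    json.dumps({"node": "coder", "state": {"prompt": "x" * 256}})


latency = p99_ms(fake_record_step)
```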

🔄

Async Native

Full AsyncVCRRecorder and AsyncVCRPlayer. Zero blocking I/O, built for modern asyncio agents.

🖥

Terminal TUI

Ships with a Textual TUI debugger. Run vcr in your terminal to browse sessions interactively.

🔌

Framework Agnostic

Native integrations for LangGraph and CrewAI. Decorator API for raw Python — no framework required.

🔒

ACID Transactions

BEGIN, SAVEPOINT, ROLLBACK, COMMIT — backed by git. Rollback physically reverts files on disk, not just in-memory state.

⭐

Golden Run Cache

Save successful runs as replayable golden paths. Same task next time? Zero tokens, zero cost, instant.

🛡️

Sentinel Guardian

Real-time code quality analysis. Catches duplicate functions, complexity spikes, and file bloat before the agent moves on.

Integrations

Drop into any framework in one line.

langgraph_agent.py

from typing import TypedDict

from langgraph.graph import StateGraph
from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import VCRLangGraph


class AgentState(TypedDict):
    query: str


# StateGraph requires a state schema
graph = StateGraph(AgentState)
graph.add_node("planner", planner_node)
graph.add_node("coder", coder_node)
# (edges and entry point omitted for brevity)

# Add VCR in one line
recorder = VCRRecorder()
graph = VCRLangGraph(recorder).wrap_graph(graph)

result = graph.invoke({"query": "Build a todo app"})
recorder.save()

openhands_sentinel.py

from openhands_sentinel import Sentinel
from agent_vcr import VCRRecorder

recorder = VCRRecorder()
sentinel = Sentinel(recorder=recorder)

# 3 lines — auto-intercepts every file write
sentinel.attach(runtime.event_stream)

# Or scan any directory standalone
# $ sentinel scan ./my-project

# Every check recorded as a VCR frame
# Full audit trail in .vcr/

crew_agent.py

from crewai import Crew
from agent_vcr import VCRRecorder
from agent_vcr.integrations.crewai import VCRCrewAI

recorder = VCRRecorder()
recorder.start_session("crew_run")

crew = Crew(agents=[researcher, writer], tasks=[...])

# Wrap and run — recording is automatic
vcr_crew = VCRCrewAI(recorder)
result = vcr_crew.kickoff(crew)

recorder.save()

agent.py

from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import vcr_record

recorder = VCRRecorder()

# Decorate any function
@vcr_record(recorder, node_name="my_step")
def my_step(data: dict) -> dict:
    return process(data)

# Each call is automatically recorded
result = my_step({"key": "value"})

async_agent.py

import asyncio

from agent_vcr import AsyncVCRRecorder, AsyncVCRPlayer


async def main():
    recorder = AsyncVCRRecorder()
    await recorder.start_session("async_run")

    # Fully non-blocking recording
    await recorder.record_step(
        node_name="fetch_context",
        input_state=query_state,
        output_state=result_state,
    )

    path = await recorder.save()

    # Async time-travel (only one frame was recorded above)
    player = await AsyncVCRPlayer.load(path)
    state = await player.goto_frame(0)


# await is only valid inside a coroutine, so run everything from main()
asyncio.run(main())

How does it compare?

Agent VCR is the only tool that lets you change what happened.

Feature	Agent VCR	LangSmith	LangFuse	Arize Phoenix
Record execution traces	✓	✓	✓	✓
Time-travel to any step	✓	✗	✗	✗
Edit state & resume	✓	✗	✗	✗
Fork from any frame	✓	✗	✗	✗
ACID transactions	✓	✗	✗	✗
Golden Run Cache	✓	✗	✗	✗
Real-time code guardian	✓ Sentinel	✗	✗	✗
Self-hosted / local-first	✓	Cloud only	✓	✓
Git-friendly format	✓ JSONL	✗	✗	✗
Setup lines of code	3	~15	~10	~10

API Reference

Minimal, predictable interfaces.

VCRRecorder Core

# Start a recording session
recorder.start_session(
    session_id: str = None,
    metadata: dict = None,
    tags: list[str] = None,
) -> Session

# Record one agent step
recorder.record_step(
    node_name: str,
    input_state: dict,
    output_state: dict,
    metadata: FrameMetadata = None,
) -> Frame

# Convenience recorders
recorder.record_llm_call(...)
recorder.record_tool_call(...)
recorder.record_error(...)

# Save & fork
recorder.save() -> Path
recorder.fork(from_frame: int) -> VCRRecorder

VCRPlayer Core

# Load a saved session
player = VCRPlayer.load(filepath: str)

# Time-travel
player.goto_frame(index: int) -> dict
player.get_frame(index: int) -> Frame

# Inspect
player.list_nodes() -> list[str]
player.get_errors() -> list[Frame]
player.compare_frames(a: int, b: int) -> dict

# Resume execution
player.resume(
    agent_callable: Callable,
    config: ResumeConfig,
) -> str

# Export
player.export_state(frame_index: int) -> dict

ResumeConfig Config

# Configure how replay works
ResumeConfig(
    from_frame: int,

    # Optional: override state before resume
    state_overrides: dict = {},

    # FORK: new session from this point
    # REPLAY: re-run same inputs
    # MOCK: use injected mock values
    mode: ResumeMode = FORK,

    # Skip specific nodes
    skip_nodes: list[str] = [],

    # Inject mocks for tool calls
    inject_mocks: dict = {},
)
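
The signature above is written as documentation, with `{}` and `[]` as defaults. If you model a config like this yourself, note that mutable defaults are shared between instances in plain Python. The sketch below is a hypothetical stand-in (the `Sketch` names are not part of the library) showing the same fields as a dataclass with per-instance defaults.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResumeModeSketch(Enum):
    FORK = "fork"      # new session from this point
    REPLAY = "replay"  # re-run same inputs
    MOCK = "mock"      # use injected mock values


@dataclass
class ResumeConfigSketch:
    from_frame: int
    # default_factory gives every instance its own dict or list.
    state_overrides: dict = field(default_factory=dict)
    mode: ResumeModeSketch = ResumeModeSketch.FORK
    skip_nodes: list[str] = field(default_factory=list)
    inject_mocks: dict = field(default_factory=dict)


a = ResumeConfigSketch(from_frame=7)
b = ResumeConfigSketch(from_frame=3, state_overrides={"prompt": "Corrected prompt"})
a.skip_nodes.append("planner")  # does not affect b
```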

ACID Transactions for Agents

Databases solved partial failure 40 years ago. Agents have the same problem.

Without ACID

Agent fails, filesystem is polluted.

Your agent hallucinated bad code on step 5. You roll back the state object, but the files are still on disk. Half-written modules, bad imports, broken configs — all still there.

State rolled back, but files remain
Parallel agents clobber each other's work
No atomic "undo" for the filesystem

With ACID

Rollback reverts everything.

Each agent session runs on an isolated git branch. SAVEPOINT checkpoints both state and filesystem together. ROLLBACK runs git reset --hard — files are gone from disk, not just hidden.

BEGIN creates an isolated branch
SAVEPOINT = state + filesystem checkpoint
ROLLBACK = physical file revert
COMMIT = clean merge into main
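
The BEGIN / SAVEPOINT / ROLLBACK steps listed above can be approximated with plain git commands. This is a minimal sketch under stated assumptions, not how ACIDWorkspace is implemented: the helper names are hypothetical, and because `git reset --hard` only restores tracked files, the sketch also runs `git clean -fd` to delete untracked ones.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway identity so commits work without any global git config.
GIT_IDENTITY = ["-c", "user.name=sketch", "-c", "user.email=sketch@example.invalid"]


def git(workspace, *args):
    result = subprocess.run(
        ["git", *GIT_IDENTITY, *args],
        cwd=workspace, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def begin(workspace, branch):
    git(workspace, "checkout", "-b", branch)  # isolated branch for the session


def savepoint(workspace, label):
    git(workspace, "add", "-A")
    git(workspace, "commit", "--allow-empty", "-m", f"savepoint: {label}")
    return git(workspace, "rev-parse", "HEAD")


def rollback(workspace, commit):
    git(workspace, "reset", "--hard", commit)  # restore tracked files
    git(workspace, "clean", "-fd")             # delete untracked files and directories


def demo():
    if shutil.which("git") is None:
        return None  # git not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        git(ws, "init")
        (ws / "README.md").write_text("base\n")
        savepoint(ws, "initial")
        begin(ws, "agent/task-001")
        (ws / "good.py").write_text("x = 1\n")
        good = savepoint(ws, "coder")
        (ws / "bad.py").write_text("import nothing\n")  # hallucinated file
        (ws / "good.py").write_text("x = broken\n")      # bad edit
        rollback(ws, good)
        names = sorted(p.name for p in ws.iterdir() if p.name != ".git")
        return names, (ws / "good.py").read_text()


if __name__ == "__main__":
    print(demo())
```

After the rollback, the stray file is gone and the edited file is back to its savepoint contents.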

acid_demo.py

from agent_vcr import VCRRecorder
from agent_vcr.integrations.openhands import ACIDWorkspace

recorder = VCRRecorder()
acid = ACIDWorkspace("/my/workspace", recorder=recorder)

acid.begin(session_id="task-001")
acid.savepoint(state, node_name="coder")
acid.rollback(to_frame_index=3)  # files physically reverted
acid.commit()                    # clean merge

Golden Run Cache — Never Pay Twice

When your agent succeeds, save the run. Next time, replay it at zero LLM cost.

golden_cache.py

from agent_vcr.golden_cache import GoldenRunCache

cache = GoldenRunCache()

# After a successful run
cache.save_golden_run("Build a REST API", recorder)

# Next time — instant, $0.00
outputs, ledger = cache.replay("Build a REST API")
print(ledger)
# CostLedger(saved=100% | $0.0123 | 4100 tokens | 2349ms)

Deterministic Fingerprinting

Tasks are hashed with SHA-256 for reliable cache lookups. Same task string always maps to the same golden run.
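
A minimal sketch of SHA-256 task fingerprinting with an in-memory lookup table follows. The function names and the dict-based store are assumptions for illustration. Note that hashing the raw string means lookups are exact: a task that differs by case or whitespace produces a different key.

```python
import hashlib


def task_fingerprint(task: str) -> str:
    # Hash the UTF-8 bytes of the task string; same input always gives the same digest.
    return hashlib.sha256(task.encode("utf-8")).hexdigest()


golden_runs = {}  # fingerprint -> stored outputs (hypothetical in-memory store)


def save_golden_run(task, outputs):
    golden_runs[task_fingerprint(task)] = outputs


def lookup(task):
    return golden_runs.get(task_fingerprint(task))


save_golden_run("Build a REST API", {"status": "ok"})
hit = lookup("Build a REST API")
miss = lookup("build a rest api")  # different string, different fingerprint
```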

Cost Ledger

Tracks original tokens vs replay tokens, dollars saved, milliseconds saved, and percentage reduction.
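
The arithmetic behind such a ledger is simple. The sketch below is a hypothetical stand-in (`LedgerSketch` is not the library's class) that derives the savings figures from original and replay measurements, using the numbers from the example output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerSketch:
    original_tokens: int
    replay_tokens: int
    original_cost_usd: float
    replay_cost_usd: float
    original_ms: float
    replay_ms: float

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.replay_tokens

    @property
    def dollars_saved(self) -> float:
        return self.original_cost_usd - self.replay_cost_usd

    @property
    def ms_saved(self) -> float:
        return self.original_ms - self.replay_ms

    @property
    def percent_saved(self) -> float:
        # Percentage reduction in tokens; guard against an empty original run.
        if self.original_tokens == 0:
            return 0.0
        return 100.0 * self.tokens_saved / self.original_tokens


ledger = LedgerSketch(
    original_tokens=4100, replay_tokens=0,
    original_cost_usd=0.0123, replay_cost_usd=0.0,
    original_ms=2349.0, replay_ms=0.0,
)
```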

Cache Invalidation

Call cache.invalidate(task) when the underlying codebase changes and the golden path is no longer valid.

🛡️ OpenHands Sentinel

Real-time code quality guardian for AI agents. Watches every file write, catches violations instantly, warns the agent to self-correct.

The Codebase Monster

Agents write bad code at scale.

AI agents duplicate functions across files, create 200-line monolithic handlers, and ignore existing abstractions. The codebase degrades with every task.

Duplicated functions across files
Monolithic functions with 10+ params
Cyclomatic complexity through the roof
No human around to catch it in real-time

Sentinel Catches It

Self-correcting agents.

Sentinel hooks into the OpenHands EventStream and runs instant AST analysis on every file write. When violations are detected, it warns the agent — which self-corrects in the same session.

Cross-file duplicate function detection
Complexity & length spike alerts
Agent self-corrects automatically
Full audit trail via agent-vcr
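
To give a feel for the kind of AST analysis described above, here is a minimal sketch that flags overlong functions and long parameter lists using Python's built-in `ast` module. The `check_source` helper is hypothetical, and the limits (40 lines, 5 parameters) are borrowed from the demo output on this page rather than from Sentinel's actual configuration.

```python
import ast

MAX_LINES = 40
MAX_PARAMS = 5


def check_source(source, filename="<agent-write>"):
    """Return a list of human-readable violations for one file's source."""
    violations = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        length = node.end_lineno - node.lineno + 1
        args = node.args
        # Counts positional-only, regular, and keyword-only parameters (not *args/**kwargs).
        n_params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        if length > MAX_LINES:
            violations.append(f"{node.name}() is {length} lines (max {MAX_LINES})")
        if n_params > MAX_PARAMS:
            violations.append(f"{node.name}() has {n_params} parameters (max {MAX_PARAMS})")
    return violations


clean = "def add(a, b):\n    return a + b\n"
bloated = "def handle(a, b, c, d, e, f, g, h, i):\n" + "    x = 1\n" * 50

clean_report = check_source(clean)
bloated_report = check_source(bloated)
```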

sentinel demo output

STEP 1: Agent writes auth/utils.py
🛡️ SENTINEL: auth/utils.py — CLEAN ✓

STEP 2: Agent writes handlers.py (massive monolithic function)
🛡️ SENTINEL: VIOLATIONS DETECTED!
  CRITICAL  hash_password() already exists in auth/utils.py:8
  CRITICAL  handle_auth_request() is 109 lines (max 40)
  CRITICAL  Cyclomatic complexity 32 (max 8)
  WARNING   9 parameters (max 5)

→ Sentinel warns agent. Agent self-corrects.

STEP 3: Agent rewrites handlers.py
🛡️ SENTINEL: handlers.py — CLEAN ✓ All issues resolved!

📼 Audit trail saved to .vcr/sentinel-demo.vcr

Zero Dependencies

Uses only Python's built-in ast module. No API keys, no cloud calls, no external services. Your code never leaves your machine.

Trajectory-Aware

Unlike standard linters, Sentinel tracks function definitions across the entire session. It detects duplicates that span multiple files.
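
Session-wide tracking could be sketched as follows: a small tracker remembers where each top-level function was first defined and warns when the same name shows up in a different file. `DuplicateTracker` is a hypothetical name, and matching by function name alone is a simplifying assumption; Sentinel's real matching rules are not described here.

```python
import ast


class DuplicateTracker:
    """Remembers where each top-level function was first defined during a session."""

    def __init__(self):
        self.seen = {}  # function name -> (filename, line number)

    def observe(self, filename, source):
        """Record definitions in `source`; return warnings for names first seen in other files."""
        warnings = []
        for node in ast.parse(source, filename=filename).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                first = self.seen.setdefault(node.name, (filename, node.lineno))
                if first[0] != filename:
                    warnings.append(
                        f"{node.name}() already exists in {first[0]}:{first[1]}"
                    )
        return warnings


tracker = DuplicateTracker()
first_write = tracker.observe(
    "auth/utils.py",
    "import hashlib\n\ndef hash_password(pw):\n    return hashlib.sha256(pw.encode()).hexdigest()\n",
)
second_write = tracker.observe(
    "handlers.py",
    "def hash_password(pw):\n    return pw[::-1]\n",
)
```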

Frame Size Guardrails

Detects and warns about oversized VCR frames (the OpenHands issue #7402 pattern) before they pollute the recording.
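
A size guardrail of this kind can be as simple as measuring the serialized frame before it is written. The sketch below is an assumption-laden illustration: the helper names and the 256 KiB threshold are hypothetical, not Sentinel's defaults.

```python
import json

MAX_FRAME_BYTES = 256 * 1024  # hypothetical threshold


def frame_size_bytes(frame):
    # Size of the frame as it would be written: UTF-8 bytes of its JSON encoding.
    return len(json.dumps(frame).encode("utf-8"))


def check_frame(frame, limit=MAX_FRAME_BYTES):
    """Return a warning string if the frame exceeds `limit`, else None."""
    size = frame_size_bytes(frame)
    if size > limit:
        return f"frame is {size} bytes (limit {limit})"
    return None


small = check_frame({"node": "planner", "state": {"query": "hi"}})
large = check_frame({"state": {"blob": "x" * 1000}}, limit=500)
```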
