v0.6.0 · Open Source · MIT License · 🛡️ Now with Sentinel

Time-travel debugging
for AI agents.

Your agent failed on step 8 of 10. LangSmith shows you what happened.
Agent VCR lets you rewind, fix it, and resume from step 8.

  • <5ms overhead per step
  • JSONL git-friendly format
  • 0 vendor lock-in
  • 3 lines to integrate

The Problem

Debugging agents is a nightmare.

When your LangGraph or CrewAI agent fails on step 8 out of 10, existing tools only tell you what went wrong. To fix it, you re-run all 10 steps from scratch.

Every logic error costs minutes of wall time and dollars in wasted LLM tokens. At scale, this kills iteration speed entirely.

  • Re-run entire chain for every small fix
  • Can't inspect intermediate state without breakpoints
  • LangSmith / LangFuse are read-only observers
  • No way to test prompt changes mid-chain

The Solution

Rewind. Fix. Resume.

Agent VCR records your agent's complete state at every step into a local JSONL file. When something breaks, you jump straight to the failing frame.

Edit the state (fix a bad prompt, inject corrected context, patch a tool output), then resume execution forward from that exact point.

  • Jump to any frame instantly
  • Full state snapshot at every step
  • Edit state and resume mid-chain
  • Fork runs to compare variants
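
The rewind, edit, resume loop can be illustrated without the library. The sketch below is a stand-in, not Agent VCR's implementation: `run_chain`, `frames`, and the three toy steps are hypothetical names, and each frame is assumed to be a deep copy of the state after a step.

```python
import copy

def run_chain(steps, state, frames, start=0):
    """Run steps[start:] on state, appending a snapshot after each step."""
    for index in range(start, len(steps)):
        state = steps[index](dict(state))
        frames.append({"index": index, "state": copy.deepcopy(state)})
    return state

# Three toy steps standing in for agent nodes (hypothetical).
def plan(state):
    state["plan"] = f"plan for {state['query']}"
    return state

def draft(state):
    state["draft"] = state["prompt"].upper()
    return state

def review(state):
    state["approved"] = "BAD" not in state["draft"]
    return state

steps = [plan, draft, review]

# First run: the prompt is wrong, so the final step rejects the draft.
frames = []
first = run_chain(steps, {"query": "todo app", "prompt": "bad prompt"}, frames)

# Rewind to the snapshot taken after step 0, fix the prompt, resume from step 1.
rewound = copy.deepcopy(frames[0]["state"])
rewound["prompt"] = "corrected prompt"
forked_frames = frames[:1]  # keep history up to the rewind point
second = run_chain(steps, rewound, forked_frames, start=1)
```

Only steps 1 and 2 run again; step 0 is reused from the snapshot, and the original `frames` list is left intact so the two runs can be compared.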

Quick Start

From zero to time-travel in under a minute.

01

Install

terminal
pip install ai-agent-vcr

02

Record your agent

agent.py
from agent_vcr import VCRRecorder

recorder = VCRRecorder()
recorder.start_session("debug_run")

# Your existing agent code, unchanged
result = my_agent.run(query)

recorder.save()  # → .vcr/debug_run.vcr

03

Time-travel & fix

debug.py
from agent_vcr import VCRPlayer

player = VCRPlayer.load(".vcr/debug_run.vcr")

# Jump to the failing step
state = player.goto_frame(7)
print(state)          # Inspect full state

# Fix the bad state
state["prompt"] = "Corrected prompt"

# Resume from frame 7 forward
player.resume(agent_callable, from_frame=7)

Everything you need to debug agents.

Built for the reality of multi-step agentic systems.

⏮

Time Travel

Jump to any frame in a session instantly. Full input/output state snapshot at every node.

✏️

State Injection

Mutate the state at any frame (fix a prompt, patch tool output, inject context), then resume.

🌿

Session Forking

Fork from any frame to create parallel runs. Compare how different fixes change downstream behavior.

📡

Live WebSocket Feed

Stream agent execution in real-time via the built-in FastAPI server. Watch every step as it happens.

🗂

JSONL Storage

Sessions stored as plain JSONL. Human-readable, git-diffable, append-only, parseable line-by-line.
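
As a rough illustration of why JSONL suits this, the sketch below appends one JSON object per line and reads the file back line by line. The frame fields (`index`, `node`, `output`) are assumptions made for the example, not the library's actual schema.

```python
import json
import os
import tempfile

def append_frame(path, frame):
    """Append one frame as a single JSON line (append-only write)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(frame, sort_keys=True) + "\n")

def read_frames(path):
    """Parse the file line by line; each non-empty line is one frame."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]

# Hypothetical frame shape, chosen only for illustration.
session_path = os.path.join(tempfile.mkdtemp(), "demo_session.jsonl")
append_frame(session_path, {"index": 0, "node": "planner", "output": {"plan": "a"}})
append_frame(session_path, {"index": 1, "node": "coder", "output": {"code": "b"}})

frames = read_frames(session_path)
```

Because every step adds exactly one line, a diff between two sessions shows one changed line per changed frame; sorting keys keeps that diff stable.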

⚡

<5ms Overhead

P99 recording latency under 5ms. Benchmarked continuously in CI. Safe for production use.

🔄

Async Native

Full AsyncVCRRecorder and AsyncVCRPlayer. Zero blocking I/O, built for modern asyncio agents.

🖥

Terminal TUI

Ships with a Textual TUI debugger. Run vcr in your terminal to browse sessions interactively.

🔌

Framework Agnostic

Native integrations for LangGraph and CrewAI. A decorator API covers raw Python, no framework required.

🔒

ACID Transactions

BEGIN, SAVEPOINT, ROLLBACK, and COMMIT, backed by git. Rollback physically reverts files on disk, not just in-memory state.

⭐

Golden Run Cache

Save successful runs as replayable golden paths. Same task next time? Zero tokens, zero cost, instant.

🛡️

Sentinel Guardian

Real-time code quality analysis. Catches duplicate functions, complexity spikes, and file bloat before the agent moves on.

Integrations

Drop into any framework in one line.

langgraph_agent.py
from typing import TypedDict

from langgraph.graph import StateGraph
from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import VCRLangGraph

# StateGraph requires a state schema
class AgentState(TypedDict):
    query: str

graph = StateGraph(AgentState)
graph.add_node("planner", planner_node)  # planner_node, coder_node: your node functions
graph.add_node("coder", coder_node)

# Add VCR in one line
recorder = VCRRecorder()
graph = VCRLangGraph(recorder).wrap_graph(graph)

result = graph.invoke({"query": "Build a todo app"})
recorder.save()

openhands_sentinel.py
from openhands_sentinel import Sentinel
from agent_vcr import VCRRecorder

recorder = VCRRecorder()
sentinel = Sentinel(recorder=recorder)

# 3 lines: auto-intercepts every file write
sentinel.attach(runtime.event_stream)  # runtime is defined elsewhere (your OpenHands runtime)

# Or scan any directory standalone
# $ sentinel scan ./my-project

# Every check recorded as a VCR frame
# Full audit trail in .vcr/

crew_agent.py
from crewai import Crew
from agent_vcr import VCRRecorder
from agent_vcr.integrations.crewai import VCRCrewAI

recorder = VCRRecorder()
recorder.start_session("crew_run")

crew = Crew(agents=[researcher, writer], tasks=[...])

# Wrap and run; recording is automatic
vcr_crew = VCRCrewAI(recorder)
result = vcr_crew.kickoff(crew)

recorder.save()

agent.py
from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import vcr_record

recorder = VCRRecorder()

# Decorate any function
@vcr_record(recorder, node_name="my_step")
def my_step(data: dict) -> dict:
    return process(data)

# Each call is automatically recorded
result = my_step({"key": "value"})

async_agent.py
import asyncio

from agent_vcr import AsyncVCRRecorder, AsyncVCRPlayer

async def main():
    recorder = AsyncVCRRecorder()
    await recorder.start_session("async_run")

    # Fully non-blocking recording
    # (query_state and result_state are your step's input and output dicts)
    await recorder.record_step(
        node_name="fetch_context",
        input_state=query_state,
        output_state=result_state,
    )

    path = await recorder.save()

    # Async time-travel
    player = await AsyncVCRPlayer.load(path)
    state = await player.goto_frame(3)

asyncio.run(main())

How does it compare?

Agent VCR is the only tool that lets you change what happened.

| Feature | Agent VCR | LangSmith | LangFuse | Arize Phoenix |
|---|---|---|---|---|
| Record execution traces | ✓ | ✓ | ✓ | ✓ |
| Time-travel to any step | ✓ | ✗ | ✗ | ✗ |
| Edit state & resume | ✓ | ✗ | ✗ | ✗ |
| Fork from any frame | ✓ | ✗ | ✗ | ✗ |
| ACID transactions | ✓ | ✗ | ✗ | ✗ |
| Golden Run Cache | ✓ | ✗ | ✗ | ✗ |
| Real-time code guardian | ✓ Sentinel | ✗ | ✗ | ✗ |
| Self-hosted / local-first | ✓ | Cloud only | ✓ | ✓ |
| Git-friendly format | ✓ JSONL | ✗ | ✗ | ✗ |
| Setup lines of code | 3 | ~15 | ~10 | ~10 |

API Reference

Minimal, predictable interfaces.

VCRRecorder Core
# Start a recording session
recorder.start_session(
    session_id: str = None,
    metadata: dict = None,
    tags: list[str] = None,
) -> Session

# Record one agent step
recorder.record_step(
    node_name: str,
    input_state: dict,
    output_state: dict,
    metadata: FrameMetadata = None,
) -> Frame

# Convenience recorders
recorder.record_llm_call(...)
recorder.record_tool_call(...)
recorder.record_error(...)

# Save & fork
recorder.save() -> Path
recorder.fork(from_frame: int) -> VCRRecorder

VCRPlayer Core
# Load a saved session
player = VCRPlayer.load(filepath: str)

# Time-travel
player.goto_frame(index: int) -> dict
player.get_frame(index: int) -> Frame

# Inspect
player.list_nodes() -> list[str]
player.get_errors() -> list[Frame]
player.compare_frames(a: int, b: int) -> dict

# Resume execution
player.resume(
    agent_callable: Callable,
    config: ResumeConfig,
) -> str

# Export
player.export_state(frame_index: int) -> dict

ResumeConfig Config
# Configure how replay works
ResumeConfig(
    from_frame: int,

    # Optional: override state before resume
    state_overrides: dict = {},

    # FORK: new session from this point
    # REPLAY: re-run same inputs
    # MOCK: use injected mock values
    mode: ResumeMode = FORK,

    # Skip specific nodes
    skip_nodes: list[str] = [],

    # Inject mocks for tool calls
    inject_mocks: dict = {},
)
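
To make the fields above concrete, here is a minimal sketch of how `from_frame`, `state_overrides`, and `skip_nodes` might combine. `ResumeConfigSketch` and `plan_resume` are hypothetical names, the frame layout is invented for the example, and the semantics shown (overrides merged over the recorded state, skipped nodes dropped from the remaining plan) are assumptions rather than the library's documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum

class ResumeMode(Enum):
    FORK = "fork"
    REPLAY = "replay"
    MOCK = "mock"

@dataclass
class ResumeConfigSketch:
    from_frame: int
    state_overrides: dict = field(default_factory=dict)
    mode: ResumeMode = ResumeMode.FORK
    skip_nodes: list = field(default_factory=list)
    inject_mocks: dict = field(default_factory=dict)

def plan_resume(frames, config):
    """Return (starting state, names of nodes still to run)."""
    state = {**frames[config.from_frame]["state"], **config.state_overrides}
    remaining = [
        frame["node"]
        for frame in frames[config.from_frame:]
        if frame["node"] not in config.skip_nodes
    ]
    return state, remaining

# Hypothetical recorded session with one state snapshot per node.
frames = [
    {"node": "planner", "state": {"prompt": "v1"}},
    {"node": "coder", "state": {"prompt": "v1", "code": "..."}},
    {"node": "tester", "state": {"prompt": "v1", "code": "...", "ok": False}},
]
config = ResumeConfigSketch(
    from_frame=1,
    state_overrides={"prompt": "v2"},
    skip_nodes=["tester"],
)
state, remaining = plan_resume(frames, config)
```

The sketch uses `default_factory` for the dict and list fields so that each config gets its own containers; a literal `{}` or `[]` default in a plain Python function signature would be shared across calls.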

ACID Transactions for Agents

Databases solved partial failure 40 years ago. Agents have the same problem.

Without ACID

Agent fails, filesystem is polluted.

Your agent hallucinated bad code on step 5. You roll back the state object, but the files are still on disk. Half-written modules, bad imports, and broken configs are all still there.

  • State rolled back, but files remain
  • Parallel agents clobber each other's work
  • No atomic "undo" for the filesystem

With ACID

Rollback reverts everything.

Each agent session runs on an isolated git branch. SAVEPOINT checkpoints both state and filesystem together. ROLLBACK runs git reset --hard, so files are gone from disk, not just hidden.

  • BEGIN creates an isolated branch
  • SAVEPOINT = state + filesystem checkpoint
  • ROLLBACK = physical file revert
  • COMMIT = clean merge into main

acid_demo.py
from agent_vcr import VCRRecorder
from agent_vcr.integrations.openhands import ACIDWorkspace

recorder = VCRRecorder()
acid = ACIDWorkspace("/my/workspace", recorder=recorder)

acid.begin(session_id="task-001")
acid.savepoint(state, node_name="coder")
acid.rollback(to_frame_index=3)  # files physically reverted
acid.commit()                        # clean merge
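
The filesystem half of SAVEPOINT and ROLLBACK can be approximated with plain git commands. The sketch below is an assumption about the general technique, not ACIDWorkspace's code: `savepoint` and `rollback` are hypothetical helpers, branch isolation and COMMIT are left out, and it needs the `git` executable on PATH.

```python
import pathlib
import subprocess
import tempfile

def git(workdir, *args):
    """Run a git command in workdir and return its trimmed stdout."""
    completed = subprocess.run(
        ["git", "-c", "user.name=sketch", "-c", "user.email=sketch@example.invalid",
         "-c", "commit.gpgsign=false", *args],
        cwd=workdir, check=True, capture_output=True, text=True,
    )
    return completed.stdout.strip()

def savepoint(workdir, label):
    """Commit every file in the workspace and return the commit hash."""
    git(workdir, "add", "-A")
    git(workdir, "commit", "--quiet", "--allow-empty", "-m", label)
    return git(workdir, "rev-parse", "HEAD")

def rollback(workdir, commit):
    """Restore tracked files to `commit` and delete files created since."""
    git(workdir, "reset", "--hard", "--quiet", commit)
    git(workdir, "clean", "-fd", "--quiet")

workspace = pathlib.Path(tempfile.mkdtemp())
git(workspace, "init", "--quiet")
(workspace / "app.py").write_text("print('ok')\n")
checkpoint = savepoint(workspace, "savepoint: coder")

# The agent now damages one file and leaves a stray one behind.
(workspace / "app.py").write_text("broken(\n")
(workspace / "junk.py").write_text("import nothing\n")

rollback(workspace, checkpoint)
```

`git reset --hard` alone restores tracked files only; the extra `git clean -fd` is what removes files the agent created after the savepoint.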

Golden Run Cache: Never Pay Twice

When your agent succeeds, save the run. Next time, replay it at zero LLM cost.

golden_cache.py
from agent_vcr.golden_cache import GoldenRunCache

cache = GoldenRunCache()

# After a successful run
cache.save_golden_run("Build a REST API", recorder)

# Next time: instant, $0.00
outputs, ledger = cache.replay("Build a REST API")
print(ledger)
# CostLedger(saved=100% | $0.0123 | 4100 tokens | 2349ms)

Deterministic Fingerprinting

Tasks are hashed with SHA-256 for reliable cache lookups. Same task string always maps to the same golden run.
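
A minimal sketch of that lookup, assuming the fingerprint is the SHA-256 hex digest of the UTF-8 task string; `task_fingerprint` and the in-memory `golden_runs` dict are hypothetical stand-ins for the real cache.

```python
import hashlib

def task_fingerprint(task: str) -> str:
    """Return the SHA-256 hex digest of the task string (UTF-8 encoded)."""
    return hashlib.sha256(task.encode("utf-8")).hexdigest()

golden_runs = {}  # fingerprint -> recorded outputs (in-memory stand-in)
golden_runs[task_fingerprint("Build a REST API")] = ["step outputs go here"]

hit = golden_runs.get(task_fingerprint("Build a REST API"))
miss = golden_runs.get(task_fingerprint("build a rest api"))
```

Hashing the raw string means a change in case or whitespace produces a different key, so `miss` is `None` here. Whether the library normalizes task strings before hashing is not stated in this page.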

Cost Ledger

Tracks original tokens vs replay tokens, dollars saved, milliseconds saved, and percentage reduction.
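
The arithmetic behind such a ledger is simple subtraction plus one ratio. `LedgerSketch` below is a hypothetical structure, not the library's CostLedger; the token and dollar figures echo the sample output above, and the 1 ms replay time is invented for the example.

```python
from dataclasses import dataclass

@dataclass
class LedgerSketch:
    original_tokens: int
    replay_tokens: int
    original_cost_usd: float
    replay_cost_usd: float
    original_ms: float
    replay_ms: float

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.replay_tokens

    @property
    def dollars_saved(self) -> float:
        return self.original_cost_usd - self.replay_cost_usd

    @property
    def ms_saved(self) -> float:
        return self.original_ms - self.replay_ms

    @property
    def percent_saved(self) -> float:
        """Token reduction as a percentage of the original run."""
        if self.original_tokens == 0:
            return 0.0
        return 100.0 * self.tokens_saved / self.original_tokens

# Illustrative numbers: a replay that makes no LLM calls at all.
ledger = LedgerSketch(4100, 0, 0.0123, 0.0, 2350.0, 1.0)
```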

Cache Invalidation

Call cache.invalidate(task) when the underlying codebase changes and the golden path is no longer valid.

🛡️ OpenHands Sentinel

Real-time code quality guardian for AI agents. It watches every file write, catches violations instantly, and warns the agent so it can self-correct.

The Codebase Monster

Agents write bad code at scale.

AI agents duplicate functions across files, create 200-line monolithic handlers, and ignore existing abstractions. The codebase degrades with every task.

  • Duplicated functions across files
  • Monolithic functions with 10+ params
  • Cyclomatic complexity through the roof
  • No human around to catch it in real-time

Sentinel Catches It

Self-correcting agents.

Sentinel hooks into the OpenHands EventStream and runs instant AST analysis on every file write. When violations are detected, it warns the agent, which self-corrects in the same session.

  • Cross-file duplicate function detection
  • Complexity & length spike alerts
  • Agent self-corrects automatically
  • Full audit trail via agent-vcr

sentinel demo output
STEP 1: Agent writes auth/utils.py
🛡️ SENTINEL: auth/utils.py - CLEAN ✓

STEP 2: Agent writes handlers.py (massive monolithic function)
🛡️ SENTINEL: VIOLATIONS DETECTED!
  CRITICAL  hash_password() already exists in auth/utils.py:8
  CRITICAL  handle_auth_request() is 109 lines (max 40)
  CRITICAL  Cyclomatic complexity 32 (max 8)
  WARNING   9 parameters (max 5)

→ Sentinel warns agent. Agent self-corrects.

STEP 3: Agent rewrites handlers.py
🛡️ SENTINEL: handlers.py - CLEAN ✓ All issues resolved!

📼 Audit trail saved to .vcr/sentinel-demo.vcr

Zero Dependencies

Uses only Python's built-in ast module. No API keys, no cloud calls, no external services. Your code never leaves your machine.
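
As a sketch of what an `ast`-only check could look like, the code below measures function length, positional parameter count, and a rough complexity estimate. The helper names are hypothetical, the limits (40 lines, 5 parameters, complexity 8) are borrowed from the demo output above, and the complexity formula (1 plus the number of branching nodes) is a common approximation, not necessarily the one Sentinel uses.

```python
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)

def function_metrics(source: str) -> dict:
    """Map each function name to its length, parameter count, and complexity."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
            metrics[node.name] = {
                "lines": node.end_lineno - node.lineno + 1,
                "params": len(node.args.args),  # positional parameters only
                "complexity": 1 + branches,     # rough cyclomatic estimate
            }
    return metrics

def violations(source: str, max_lines=40, max_params=5, max_complexity=8) -> list:
    """Return human-readable messages for functions that exceed a limit."""
    found = []
    for name, m in function_metrics(source).items():
        if m["lines"] > max_lines:
            found.append(f"{name}() is {m['lines']} lines (max {max_lines})")
        if m["params"] > max_params:
            found.append(f"{name}() has {m['params']} parameters (max {max_params})")
        if m["complexity"] > max_complexity:
            found.append(f"{name}() has complexity {m['complexity']} (max {max_complexity})")
    return found

sample = '''
def handle(a, b, c, d, e, f):
    if a and b:
        return c
    for item in d:
        if item:
            return e
    return f
'''
report = violations(sample)
```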

Trajectory-Aware

Unlike standard linters, Sentinel tracks function definitions across the entire session. It detects duplicates that span multiple files.
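
One way to get that session-wide view is to keep a registry of definitions that outlives any single file check. `DefinitionTracker` is a hypothetical sketch that matches functions by name only; comparing function bodies, as a real duplicate detector might, is left out.

```python
import ast

class DefinitionTracker:
    """Remember where each function name was first defined in a session."""

    def __init__(self):
        self.first_seen = {}  # function name -> (path, line number)

    def check_write(self, path: str, source: str) -> list:
        """Record definitions in `source`; report names already defined elsewhere."""
        duplicates = []
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            known = self.first_seen.get(node.name)
            if known is not None and known[0] != path:
                duplicates.append(f"{node.name}() already exists in {known[0]}:{known[1]}")
            else:
                self.first_seen[node.name] = (path, node.lineno)
        return duplicates

tracker = DefinitionTracker()
first = tracker.check_write("auth/utils.py", "def hash_password(pw):\n    return pw[::-1]\n")
second = tracker.check_write("handlers.py", "def hash_password(pw):\n    return pw\n")
```

A per-file linter sees each write in isolation, so it has no way to produce the second result; the tracker can because its state spans the whole session.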

Frame Size Guardrails

Detects and warns about oversized VCR frames (the OpenHands issue #7402 pattern) before they pollute the recording.
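
A size guard of this kind can be as small as measuring the serialized frame before it is written. The helper names and the 64 KiB limit below are assumptions for illustration; the actual threshold Sentinel applies is not given here.

```python
import json

def frame_size_bytes(frame: dict) -> int:
    """Size of the frame once serialized as one UTF-8 JSON line."""
    return len(json.dumps(frame).encode("utf-8")) + 1  # +1 for the newline

def check_frame_size(frame: dict, limit_bytes: int = 64 * 1024):
    """Return a warning string if the frame exceeds the limit, else None."""
    size = frame_size_bytes(frame)
    if size > limit_bytes:
        return f"frame is {size} bytes (limit {limit_bytes})"
    return None

small = check_frame_size({"node": "planner", "output": "ok"})
large = check_frame_size({"node": "coder", "output": "x" * 100_000})
```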

Stop re-running your whole chain.

Install Agent VCR and start debugging from the exact frame that broke.

pip install ai-agent-vcr

MIT License · No signup required · Works offline