Agent Sandbox Runtime - Documentation

🔄 How It Works

The Reflexion Loop: Generate → Execute → Learn → Improve

📝 Your Task: "Calculate fibonacci(10)"
→ 🤖 Generate: LLM writes code
→ 🐳 Execute: Run in Docker
→ ✅ Success? Check result
↩️ If failed: 🔍 Critique: Analyze error
↩️ Retry (up to 3x)
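
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: `reflexion_loop`, `ExecutionResult`, and the three callables are hypothetical names, the stand-in executor uses an in-process `exec` rather than Docker, and counting the 3 retries in addition to the first attempt is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_RETRIES = 3  # "up to 3x" per the docs; treated here as retries after the first attempt (assumption)


@dataclass
class ExecutionResult:
    """Hypothetical result record: did the run succeed, and what did it print or raise?"""
    success: bool
    output: str = ""
    error: str = ""


def reflexion_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], ExecutionResult],
    critique: Callable[[str, str], str],
    max_retries: int = MAX_RETRIES,
) -> ExecutionResult:
    """Generate -> Execute -> Critique -> Retry, stopping at the first success."""
    feedback: Optional[str] = None
    result = ExecutionResult(success=False, error="not run")
    for _ in range(1 + max_retries):
        code = generate(task, feedback)   # LLM writes code (stubbed below)
        result = execute(code)            # run it (Docker in the real tool)
        if result.success:
            return result
        feedback = critique(code, result.error)  # analyze the error for the next attempt
    return result


# --- Stand-ins so the sketch runs without an LLM or Docker ---
def fake_generate(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "result = fib(10)"  # deliberate bug: fib is undefined
    return (
        "def fib(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a\n"
        "result = fib(10)"
    )


def fake_execute(code: str) -> ExecutionResult:
    scope: dict = {}
    try:
        exec(code, scope)  # stand-in only: this is NOT a sandbox
        return ExecutionResult(True, output=str(scope["result"]))
    except Exception as exc:
        return ExecutionResult(False, error=f"{type(exc).__name__}: {exc}")


def fake_critique(code: str, error: str) -> str:
    return f"Previous attempt failed with: {error}"


final = reflexion_loop("Calculate fibonacci(10)", fake_generate, fake_execute, fake_critique)
print(final.success, final.output)
```

In this stubbed run the first attempt fails with a `NameError`, the critique is passed back to the generator, and the second attempt succeeds.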

✨ Features

What makes Agent Sandbox Runtime different

🔄

Self-Correction Loop

Automatically detects failures, analyzes the error, and regenerates the code. Retries up to 3 times, feeding each critique into the next attempt.

🐝

Swarm Intelligence

5 specialist AI agents (Architect, Coder, Critic, Optimizer, Security) collaborate and vote on solutions.

🐳

Docker Sandbox

Code runs in isolated containers with memory limits, no network, and automatic cleanup. Safe by default.

🔌

6 LLM Providers

Groq, OpenRouter, Anthropic, Google Gemini, OpenAI, and Ollama (local). Switch with one config change.

⚡

Fast Inference

~750ms average response time with Groq's LPU, roughly 4x faster than GPT-4 Code Interpreter (3.2s average).

💰

Free to Run

Use Groq's free tier or run locally with Ollama. No cloud costs required.
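
As an illustration of the Docker Sandbox feature above, the three stated guarantees (memory limits, no network, automatic cleanup) map onto standard `docker run` flags. The sketch below only builds the command line; the function name, image, mount path, and the 256m limit are assumptions for illustration, not the values the project actually uses.

```python
from typing import List


def build_sandbox_command(
    script_path: str,
    image: str = "python:3.12-slim",  # assumed image, not confirmed by the docs
    memory: str = "256m",             # assumed limit, not confirmed by the docs
) -> List[str]:
    """Return a `docker run` argument list for executing one script in isolation."""
    return [
        "docker", "run",
        "--rm",                 # automatic cleanup: remove the container on exit
        "--network", "none",    # no network access inside the container
        "--memory", memory,     # hard memory limit
        "-v", f"{script_path}:/sandbox/main.py:ro",  # mount the generated code read-only
        image,
        "python", "/sandbox/main.py",
    ]


command = build_sandbox_command("/tmp/generated.py")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command, capture_output=True, text=True, timeout=...)` on a machine with Docker installed; the host path must be absolute for the `-v` bind mount.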

🚀 Quick Start

Get running in under 2 minutes

One-line Docker run (bash)

docker run -e GROQ_API_KEY=your_key ghcr.io/ixchio/agent-sandbox-runtime

Local installation (bash)

# Clone and install
git clone https://github.com/ixchio/agent-sandbox-runtime.git
cd agent-sandbox-runtime
pip install -e .

# Configure
cp .env.example .env
# Edit .env and add GROQ_API_KEY

# Run
agent-sandbox run "Calculate fibonacci(10)"

Start API server (bash)

# Start server
agent-sandbox serve

# POST a request
curl -X POST http://localhost:8000/execute \
  -H "Content-Type: application/json" \
  -d '{"task": "Check if 17 is prime"}'
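
The same request as the curl call above can be built from Python with only the standard library. This sketch mirrors the URL, header, and JSON body from the curl example; `build_execute_request` is a hypothetical helper name, and the shape of the server's response is not documented here, so the sketch simply prints the raw body.

```python
import json
import urllib.request


def build_execute_request(task: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /execute request carrying the task as a JSON body."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request("Check if 17 is prime")

# Sending requires a running server (`agent-sandbox serve`):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```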

🎯 What It Can Solve

Benchmark-validated capabilities

Capability	Example	Status
Algorithm implementation	Fibonacci, binary search, sorting	✅ 100%
Data parsing	JSON extraction, CSV processing	✅ 100%
String manipulation	Regex, formatting, validation	✅ 100%
Math operations	Statistics, calculations	✅ 100%
Data structures	Trees, graphs, lists	✅ 92%
Network/file access	HTTP requests, file I/O	⚠️ Sandboxed

📊 Benchmarks

Performance compared to alternatives

Tool	Success Rate	Avg Speed	Self-Correct	Sandbox	Cost
Agent Sandbox	92% ⭐	743ms ⚡	✅	✅	Free
GPT-4 Code Interpreter	87%	3.2s	✅	✅	$0.03/1K
Claude 3.5 Sonnet	89%	2.1s	❌	❌	$0.015/1K
Devin	85%	45s	✅	✅	$500/mo
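
The "Avg Speed" column can be turned into relative figures with simple division. The sketch below uses only the latencies from the table above and makes no claim about how they were measured.

```python
# Average latencies from the benchmark table, in milliseconds.
avg_latency_ms = {
    "Agent Sandbox": 743,
    "GPT-4 Code Interpreter": 3200,
    "Claude 3.5 Sonnet": 2100,
    "Devin": 45000,
}

baseline = avg_latency_ms["Agent Sandbox"]

# Ratio of each tool's latency to Agent Sandbox's, rounded to one decimal place.
slowdowns = {
    tool: round(ms / baseline, 1)
    for tool, ms in avg_latency_ms.items()
    if tool != "Agent Sandbox"
}

for tool, ratio in slowdowns.items():
    print(f"{tool}: {ratio}x the latency of Agent Sandbox")
```

The GPT-4 Code Interpreter ratio comes out at about 4.3, which is where the "4x faster" figure in the Features section comes from.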
