AIO Sandbox unifies the browser, shell, and filesystem so teams can run agent workflows faster and more securely.
Use this open-source AI agent runtime guide to set up a single, secure workspace where agents can browse, code, and manage files. Learn how Agent-Infra’s AIO Sandbox unifies a Chromium browser, shell, Python/Node runtimes, and an MCP layer so LLMs act reliably from plan to execution.
Modern AI agents do not fail because they think poorly. They fail because their tools do not work together. Agents need a browser to fetch data, a shell to run commands, runtimes to execute code, and a place to store files. If these live in separate containers, each handoff adds delay and risk. Agent-Infra’s AIO Sandbox solves this by putting the browser, shell, Python, Node.js, and shared storage in one container. It is open source and built for fast, isolated, and observable execution.
Why the runtime now decides agent success
Old way: split tools, fragile wiring
Most teams start with a browser container, a compute container, and a shared volume. They add glue code to move files. They open ports for remote control. They sync logs across services. Each step can fail. Each API adds latency. Debugging across services is slow.
New way: one container, one workspace
The AIO Sandbox provides a single runtime that includes:
A Chromium browser you control through the Chrome DevTools Protocol (CDP) with Playwright support.
Python and Node.js runtimes ready for code execution.
A Bash shell with the same view of the filesystem as the browser and runtimes.
VSCode Server and Jupyter Notebook for live editing, monitoring, and testing.
This one-stop setup cuts complexity and makes agent actions predictable.
What you get inside the AIO Sandbox
Browser you can drive like a robot
The Chromium instance exposes CDP. Your agent can click buttons, fill forms, scroll, capture screenshots, and read the DOM. Playwright support gives you higher-level actions if you prefer scripts over raw CDP calls. You can also observe the browser visually through an integrated VNC view when you need to verify steps.
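If you want to see what raw CDP calls look like before reaching for Playwright, each command is a small JSON message sent over the browser's DevTools WebSocket endpoint. A minimal sketch of two such messages (the target URL is illustrative):

```python
import json

def cdp_message(msg_id: int, method: str, params: dict) -> str:
    # A CDP command is a JSON object with an id, a method, and params,
    # sent over the browser's DevTools WebSocket endpoint.
    return json.dumps({"id": msg_id, "method": method, "params": params})

# Navigate the page, then capture a screenshot of the result.
navigate = cdp_message(1, "Page.navigate", {"url": "https://example.com"})
screenshot = cdp_message(2, "Page.captureScreenshot", {"format": "png"})

print(navigate)
```

Playwright wraps these messages in higher-level actions like page.goto() and page.screenshot(), which is why scripts are usually the easier path.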
Prebuilt compute for instant execution
Python and Node.js come pre-configured. Your agent can install packages, run scripts, and handle data without extra setup. You avoid runtime drift and version mismatches between services.
Unified file system across all tools
The browser downloads a file. The shell sees it right away. Python reads it without copying. This shared storage layer removes file transfer code and prevents sync bugs. It keeps state simple across the full task.
Developer interfaces for tight feedback loops
Open VSCode Server to inspect folders, view logs, and edit scripts. Launch Jupyter to prototype data steps and test agent prompts. These tools shorten the path from error to fix.
The unified file system in action
Imagine your agent must pull a CSV from a web portal, clean it, and export a report.
The browser logs in and downloads the CSV.
Python reads the CSV from the same folder. No copy is needed.
Python cleans the data and saves a chart image beside the CSV.
The shell zips the results and moves them to an output directory.
Everything happens in one workspace. The agent avoids fragile network transfers and redundant temp storage. Latency drops. Reliability rises.
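Because every tool sees the same paths, the clean-and-package steps reduce to ordinary local file operations. A minimal sketch, with stand-in file names and contents in place of the real download:

```python
import csv
import zipfile
from pathlib import Path

workspace = Path("workspace")   # shared folder the browser, shell, and Python all see
workspace.mkdir(exist_ok=True)
raw = workspace / "report.csv"
# Stand-in for the CSV the browser downloaded into the shared folder.
raw.write_text("name,amount\nwidget, 10 \ngadget, 5 \n")

# Clean: strip the stray whitespace the portal left in the values.
with raw.open() as src:
    rows = [{k: v.strip() for k, v in row.items()} for row in csv.DictReader(src)]

clean = workspace / "report_clean.csv"
with clean.open("w", newline="") as dst:
    writer = csv.DictWriter(dst, fieldnames=["name", "amount"])
    writer.writeheader()
    writer.writerows(rows)

# Package the results in place: no copy between services, no network hop.
with zipfile.ZipFile(workspace / "output.zip", "w") as zf:
    zf.write(clean, clean.name)
```

In a real run the shell step could do the zipping instead; the point is that every step reads and writes the same directory.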
MCP integration: standard tools for smarter agents
The Model Context Protocol (MCP) gives models a clear way to discover and call tools. The Sandbox ships with MCP servers that map common actions to safe functions:
Browser MCP: Navigate, extract content, take screenshots, and pull links.
File MCP: List, read, write, move, and delete files in the unified filesystem.
Shell MCP: Execute commands with controlled access to the environment.
Markitdown MCP: Convert documents to Markdown so LLMs parse them cleanly.
With MCP, your prompts do not need ad hoc glue code. The agent learns a fixed toolset and uses it consistently across tasks.
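Under the hood, MCP is JSON-RPC 2.0, and a tool invocation is a "tools/call" request naming the tool and its arguments. The tool name and path below are illustrative, not taken from the Sandbox's actual File MCP server:

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    # Shape of an MCP tools/call request per the MCP specification.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Hypothetical file-read call against a File MCP server.
request = mcp_tool_call(1, "read_file", {"path": "/workspace/report.csv"})
print(json.dumps(request, indent=2))
```

The model never constructs these by hand; the MCP client library does, which is exactly why a fixed toolset beats ad hoc glue code.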
Setup and first run
You can start quickly if you already use Docker.
Install Docker on your machine or server.
Pull the Sandbox container image from the project repository.
Run the container with a named volume for persistence.
Expose ports for the API, VSCode Server, and the browser’s VNC if needed.
Open the SDK or API client to create a session and send simple actions.
Use VSCode Server to watch files and logs as the agent works.
As you verify your first flow, keep this open-source AI agent runtime guide close. Check that downloads land in the shared folder. Confirm Python can import libraries. Validate that the browser and shell see the same paths.
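The Docker steps above can be sketched as a single docker run invocation. The image name and port numbers below are placeholders, since the real values come from the project repository:

```python
import subprocess

# Placeholder image reference; substitute the real one from the project repo.
IMAGE = "ghcr.io/example/aio-sandbox:latest"

cmd = [
    "docker", "run", "-d",
    "--name", "aio-sandbox",
    "-v", "aio-workspace:/workspace",   # named volume for persistence
    "-p", "8080:8080",                  # API (port is a placeholder)
    "-p", "8443:8443",                  # VSCode Server (placeholder)
    "-p", "5900:5900",                  # VNC (placeholder)
    IMAGE,
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment once the image name is correct
```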
Build end-to-end agent flows
Research to report
Browser MCP gathers URLs and scrapes tables.
File MCP saves raw HTML and CSV assets.
Python cleans data and generates charts.
Markitdown turns notes and results into a clean Markdown report.
Shell MCP zips the package for delivery.
Portal automation
Use Playwright to log in, navigate tabs, and apply filters.
Download invoices or statements to the shared filesystem.
Run a Python script to rename files and extract totals.
Save a summary JSON and a monthly CSV for downstream tools.
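The rename-and-extract step of this flow is a short stdlib script once downloads land in the shared filesystem. The file layout and column names here are assumptions:

```python
import csv
import json
from pathlib import Path

downloads = Path("downloads")
downloads.mkdir(exist_ok=True)
# Stand-in for a statement the browser downloaded from the portal.
(downloads / "statement (3).csv").write_text(
    "date,total\n2025-01-31,120.50\n2025-02-28,99.00\n"
)

summary = {}
for path in sorted(downloads.glob("*.csv")):
    # Normalize the messy names portals generate for repeated downloads.
    tidy = downloads / path.name.replace(" ", "_").replace("(", "").replace(")", "")
    path.rename(tidy)
    with tidy.open() as f:
        summary[tidy.name] = sum(float(r["total"]) for r in csv.DictReader(f))

# Summary JSON for downstream tools, written to the same shared folder.
(downloads / "summary.json").write_text(json.dumps(summary, indent=2))
```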
Document preparation for LLMs
Browser MCP fetches PDFs or DOCX files.
Markitdown converts them to Markdown.
Python chunks the text and removes boilerplate.
File MCP writes cleaned context bundles for later prompts.
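The chunking step is ordinary text processing. A minimal sketch that packs paragraphs into fixed-size chunks without splitting them mid-paragraph (the size limit is arbitrary):

```python
def chunk_text(text: str, max_chars: int = 800) -> list[str]:
    # Split on blank lines so chunks end at paragraph boundaries,
    # then greedily pack paragraphs up to the size limit.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "Intro paragraph.\n\n" + "Body text. " * 100 + "\n\nClosing note."
pieces = chunk_text(doc, max_chars=300)
```

A paragraph larger than the limit stays whole here; production chunkers usually add a sentence-level fallback for that case.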
Observability and debugging you can trust
Work like a developer, not a detective
Open VSCode Server to step through scripts and inspect variables.
Use Jupyter to test data transforms before your agent runs them.
Capture CDP traces to replay browser actions.
Write logs and artifacts to the shared folder for quick review.
Keep screenshots and console output to prove what happened.
These tools turn opaque agent runs into traceable sessions you can audit.
Security and isolation without friction
Each Sandbox runs in its own container. It separates the agent’s generated code from the host. You can set CPU and memory limits. You can restrict network access. You can avoid mounting sensitive host paths. Combine this with standard Linux permissions, non-root users, and read-only mounts where possible.
Control what the agent can do
Use Kubernetes network policies to block unwanted egress.
Scope credentials to the task and rotate them often.
Whitelist shell commands or add policy checks in the SDK.
Keep package installs cached but audited.
Reset the workspace after the job unless you need persistence.
This approach keeps power high and blast radius low.
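A shell-command whitelist can live in a few lines at the SDK layer. The allowed set below is an example policy, not a Sandbox default:

```python
import shlex

ALLOWED = {"ls", "cat", "python3", "zip", "head"}  # example policy

def check_command(command: str) -> bool:
    # Approve only if every stage of a pipeline starts with an allowed binary.
    for stage in command.split("|"):
        tokens = shlex.split(stage)
        if not tokens or tokens[0] not in ALLOWED:
            return False
    return True

check_command("cat data.csv | head -n 5")  # True: both stages allowed
check_command("rm -rf /")                  # False: rm is not whitelisted
```

Real policies also need to handle subshells, redirection, and command substitution; a parse-and-reject approach like this is the floor, not the ceiling.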
How to scale on Kubernetes
You can deploy many Sandboxes side by side for high throughput.
Use a Deployment for long-lived agent services or a Job for batch runs.
Attach a PersistentVolumeClaim if you need state across restarts.
Set resource requests and limits to prevent noisy neighbors.
Autoscale based on CPU or custom metrics like active sessions.
Pre-warm a small pool of ready pods to cut cold start time.
Tag pods by team or project for cost tracking and cleanup.
Session routing is simple because each agent has one container. No cross-service sync is required.
Traditional Docker vs the Sandbox: make the right call
Pick the Sandbox when
Your agent needs a browser, code execution, and file I/O in one place.
You want fast handoffs with minimal glue code.
You prefer standard MCP servers over custom RPCs.
You need live visibility with VSCode Server and Jupyter.
Stay with a multi-container stack when
You already run separate, specialized services with strict isolation.
You need different OS images or GPU stacks per tool.
Your pipeline is stable and observability is solved elsewhere.
The Sandbox optimizes for agent speed and simplicity. Use it to cut time-to-value and remove integration pain.
Cost and performance tips
Right-size CPU and memory. Browser actions need bursts, not huge reservations.
Run Chromium headless when you do not need a live view.
Cache pip and npm directories on a volume to avoid repeated downloads.
Batch small tasks into one session to reuse the warm environment.
Clean up large artifacts after upload to save disk.
Throttle crawling rates to avoid waste and site bans.
Risk management for LLM-driven shells
LLMs can overreach. Add guardrails.
Filter commands. Block destructive actions like rm -rf or unrestricted curl.
Enforce working directories. Keep writes inside a safe path.
Limit runtime. Kill long processes and cap log sizes.
Review diffs. Require approval before high-impact changes.
Record provenance. Store prompts, tool calls, and outputs together.
These checks keep autonomy useful and safe.
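Two of these guardrails, enforced working directories and runtime limits, map directly onto standard subprocess options. A sketch, assuming a safe root path of your choosing:

```python
import subprocess
import sys
from pathlib import Path

SAFE_ROOT = Path("/tmp/agent-workspace")  # assumed safe write path

def run_guarded(cmd: list[str], cwd: Path, timeout: float = 30.0) -> str:
    # Refuse to run outside the safe root; kill anything that overruns the limit.
    cwd = cwd.resolve()
    if not cwd.is_relative_to(SAFE_ROOT.resolve()):
        raise PermissionError(f"{cwd} is outside {SAFE_ROOT}")
    result = subprocess.run(
        cmd, cwd=cwd, timeout=timeout,
        capture_output=True, text=True, check=True,
    )
    return result.stdout[:10_000]  # cap captured log size

SAFE_ROOT.mkdir(parents=True, exist_ok=True)
out = run_guarded([sys.executable, "-c", "print('ok')"], cwd=SAFE_ROOT)
```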
API and SDK: operate Sandboxes like a platform
You can manage sessions, run commands, and fetch artifacts through an API and SDK. This enables:
Self-service environments per request or user.
Automated cleanup after runs.
Scheduled jobs that use the same browser and compute stack.
A central controller that routes tasks to available Sandboxes.
Treat the Sandbox as a building block. Compose it into your agent platform without reinventing orchestration.
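One pattern worth building into such a controller is guaranteed cleanup. A sketch with a hypothetical client whose method names are assumptions, not the real SDK:

```python
from contextlib import contextmanager

class SandboxClient:
    # Hypothetical SDK surface; method names are illustrative only.
    def __init__(self):
        self.sessions = []

    def create_session(self) -> str:
        sid = f"session-{len(self.sessions) + 1}"
        self.sessions.append(sid)
        return sid

    def delete_session(self, sid: str) -> None:
        self.sessions.remove(sid)

@contextmanager
def sandbox_session(client: SandboxClient):
    # Guarantee cleanup even if the agent task raises mid-run.
    sid = client.create_session()
    try:
        yield sid
    finally:
        client.delete_session(sid)

client = SandboxClient()
with sandbox_session(client) as sid:
    pass  # run commands, fetch artifacts, etc.
```

Wrapping the session lifecycle this way is what makes "automated cleanup after runs" a property of the platform rather than a habit each agent must remember.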
Open-source AI agent runtime guide: best practices
One task, one workspace: keep sessions focused and short-lived unless you need state.
Prefer MCP tools over ad hoc scripts: they are easier to audit and reuse.
Store everything important in the shared filesystem: inputs, logs, screenshots, and outputs.
Fail fast and visibly: write checkpoints so you can restart mid-flow.
Mock external services in tests: do not burn bandwidth or risk bans during dev.
Pin package versions: stable runs beat surprise upgrades.
Use visual verification sparingly: enable VNC only when debugging.
Plan handoffs: define where raw, processed, and final data live.
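Checkpointing can be as simple as a JSON file in the shared workspace recording completed steps; the schema and step names below are illustrative:

```python
import json
from pathlib import Path

CHECKPOINT = Path("workspace_state.json")
STEPS = ["download", "clean", "chart", "package"]

def load_done() -> list[str]:
    # Return the list of completed steps, or an empty list on first run.
    return json.loads(CHECKPOINT.read_text())["done"] if CHECKPOINT.exists() else []

def mark_done(step: str) -> None:
    done = load_done()
    done.append(step)
    CHECKPOINT.write_text(json.dumps({"done": done}))

# On restart, skip anything already completed and resume mid-flow.
for step in STEPS:
    if step in load_done():
        continue
    # ... perform the step ...
    mark_done(step)
```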
This open-source AI agent runtime guide checklist helps teams move from demos to durable production runs.
Common pitfalls and how to avoid them
Silent browser failures: capture network logs and screenshots on each step.
Mismatched paths: standardize environment variables for workspace roots.
Dependency drift: bake base images weekly, not on every run.
Oversized containers: remove unused tools and data after each job.
Unbounded sessions: enforce timeouts and idle shutdowns.
Weak secrets hygiene: inject credentials at runtime; never bake them into images.
The shift from model-only thinking to execution-first design is here. Agent-Infra’s AIO Sandbox makes agents faster and more reliable by unifying the browser, shell, runtimes, and files under one roof, and by speaking MCP for tool access. If you want fewer moving parts, quicker debugging, and safer autonomy, this open-source AI agent runtime guide points you to a practical path: start small, wire tools once, observe everything, and scale with containers you can trust.
(Source: https://www.marktechpost.com/2026/03/29/agent-infra-releases-aio-sandbox-an-all-in-one-runtime-for-ai-agents-with-browser-shell-shared-filesystem-and-mcp/)
FAQ
Q: What is Agent-Infra’s AIO Sandbox and what problem does it solve?
A: The AIO Sandbox is an all-in-one containerized runtime that unifies a Chromium browser (controllable via CDP/Playwright), a bash shell, Python and Node.js runtimes, and a shared filesystem to reduce tool fragmentation and synchronization overhead. It is open-source (Apache-2.0) and designed for fast, isolated, and observable execution to help agents move from plan to execution reliably.
Q: How does the unified file system in the Sandbox work and why does it matter?
A: The Sandbox provides a shared storage layer so files downloaded by the Chromium browser are immediately visible to the Python interpreter and the bash shell without copying or external transfers. This reduces latency, eliminates fragile file-transfer glue code, and keeps state consistent across an agent’s multi-step workflow.
Q: What tools and runtimes are included inside a single Sandbox container?
A: The container includes a Chromium browser controllable via the Chrome DevTools Protocol with Playwright support, pre-configured Python and Node.js runtimes, a bash terminal, and developer interfaces like VSCode Server and Jupyter Notebook, with an integrated VNC view for visual verification. These components are packaged to allow agents to browse, execute code, and edit or monitor artifacts in one workspace.
Q: How does the Sandbox integrate with the Model Context Protocol (MCP)?
A: The Sandbox ships pre-configured MCP servers that expose browser, file, shell, and Markitdown capabilities so LLMs can discover and call tools via a standardized protocol. This lets models use a consistent toolset for navigation, file operations, command execution, and document conversion without bespoke glue code.
Q: How do I set up and run the AIO Sandbox for the first time?
A: To start, install Docker, pull the Sandbox container image from the project repo, run the container with a named volume for persistence, and expose ports for the API, VSCode Server, and VNC as needed, then use the SDK or API client to create a session and send actions. As you verify your first flow, keep this open-source AI agent runtime guide close to confirm downloads land in the shared folder and that the browser, shell, and Python runtimes see the same paths.
Q: What observability and debugging features are available in the Sandbox?
A: The Sandbox provides integrated observability via VSCode Server and Jupyter for live editing and testing, CDP traces for replaying browser actions, and a shared folder for logs, screenshots, and artifacts so runs can be audited. These tools let developers inspect variables, capture screenshots on failure, and trace agent steps without cross-service detective work.
Q: How does the Sandbox handle security and isolation for running untrusted agent code?
A: Each Sandbox runs in its own container, enabling separation between agent-generated code and the host while letting teams set CPU and memory limits, restrict network access, and avoid mounting sensitive host paths. The article also recommends standard practices such as non-root users, read-only mounts where possible, whitelisting shell commands, scoping credentials, and resetting or limiting sessions to reduce blast radius.
Q: When should teams choose the AIO Sandbox instead of a traditional multi-container architecture?
A: Choose the Sandbox when your agent workflows need a browser, code execution, and file I/O in one workspace, when you want fast handoffs with minimal glue code, and when native MCP servers and built-in VSCode/Jupyter visibility help reduce integration work. Stick with a multi-container stack if you require different OS images or GPU stacks per tool, need stricter service-level isolation, or already have a stable pipeline and observability solved elsewhere.