Insights AI News Open Source AI Agent Safety Tools: How to Prevent Failures
post

AI News

25 May 2026

Read 10 min

Open Source AI Agent Safety Tools: How to Prevent Failures

Open source AI agent safety tools let engineers run CI tests and design checks to prevent incidents.

Enterprises now ship agents that read email, call APIs, and run code. To keep them safe, teams need open source AI agent safety tools that bake risk checks into design and CI. Microsoft’s RAMPART and Clarity help you test prompt injection, reproduce incidents, and stress-test designs before you build. Together they turn safety from a late review into daily practice. AI has moved from “write a reply” to “take an action.” That action can help a user or cause harm. You cannot bolt on safety at the end. Microsoft released two open projects that make safety part of daily work: RAMPART for agent testing and Clarity for design reviews you can track in your repo.

Why open source AI agent safety tools matter now

Agents now fetch data, run tools, and change systems. A single bad prompt or poisoned document can push them off track. Teams need guardrails that:
  • Question risky design choices early.
  • Turn red-team lessons into repeatable tests.
  • Reproduce incidents and verify fixes under variation.
RAMPART and Clarity answer these needs in simple, engineering-native ways.

Meet RAMPART: test agent behavior like you mean it

What RAMPART does

RAMPART is a test framework that brings red-team tactics into CI. It builds on PyRIT to supply strong attack strategies out of the box. You write pytest tests that exercise your agent, assert safe outcomes, and fail the build when safety drifts.

How it fits your workflow

  • Connect through a thin adapter to your agent.
  • Model scenarios from your threat model and user flows.
  • Gate merges in CI with clear pass/fail signals.
  • Add safety tests in the same PR that adds a new tool or data source.

What makes it different

  • Focused on prompt injection: It targets cross-prompt injection, where emails, docs, or tickets smuggle instructions that hijack the agent.
  • Built for probabilistic models: Run trials and set thresholds (for example, “safe in at least 80% of runs”) to reflect real-world variance.
  • Designed for incident learning: Encode red-team findings and production incidents as tests so fixes stick and regressions get caught fast.

How RAMPART evaluates safety

It inspects what matters: which tools the agent calls, what side effects happen, and whether actions stay within policy. Evaluators are composable, so you can combine checks to express nuanced rules instead of relying on a single yes/no.

Meet Clarity: make better decisions before you build

Structured design conversations that live in your repo

Clarity guides you through problem framing, solution options, failure analysis, and decisions. It writes markdown files to a .clarity-protocol/ folder in your repo, so your design record is versioned, reviewed, and searchable like code.

Strong failure analysis, many points of view

Multiple AI “thinkers” analyze the plan from angles like security, human factors, adversarial abuse, and operations. Your team then groups failures, traces causes, and sets mitigation plans you can own.

Decisions you can revisit with context

Clarity tracks decisions with criteria and trade-offs, and it watches for staleness. If the problem changes, Clarity flags related docs to refresh. It can also generate a clean review packet for stakeholders.

A workflow that blends the two

  • Define goals and users in Clarity. Capture why your agent needs each tool and what could go wrong.
  • List threats and risky flows. Highlight untrusted inputs like web content, emails, and uploaded files.
  • Translate top risks into RAMPART tests. Start with prompt injection, tool misuse, data leakage, and unsafe code execution.
  • Set statistical thresholds. Decide what “safe enough” means for each action and environment.
  • Run in CI on every change. Treat safety failures like bugs: fix, test, merge.
  • When an incident happens, reproduce it with RAMPART, expand tests with variants, and update Clarity decisions.
This loop turns open source AI agent safety tools into living guardrails that evolve with your product.

Quick start for teams

  • Install Clarity and run a short design session. Commit the .clarity-protocol/ folder and open a PR for review.
  • Add RAMPART to your test suite. Write one adapter and one high-value safety test to start.
  • Automate in CI. Fail builds on safety regressions and require owners to address them.
  • Grow coverage sprint by sprint. Each new tool or data source gets a matching safety test.

What to measure and report

  • Injection escape rate by scenario and data source.
  • Action safety rate across trials and environments.
  • Coverage of critical tools and user flows.
  • Mean time to reproduce incidents and validate fixes.
  • Number of regressions caught before release.
  • Staleness of design docs and decisions in Clarity.

Common pitfalls and how to avoid them

  • Only testing happy paths: Add adversarial and messy inputs from day one.
  • One-shot checks for probabilistic models: Use trials and thresholds.
  • Letting red-team reports rot: Convert findings into RAMPART tests immediately.
  • Granting broad tool access: In Clarity, justify every permission with a risk and a mitigation.
  • Forgetting side effects: Log and evaluate external actions, not just the model’s text.
Strong agents need strong guardrails. RAMPART and Clarity make those guardrails practical by turning safety into code and decisions you can track. If your team cares about uptime, trust, and speed, adopt open source AI agent safety tools now and put safety checks where they belong: in your daily workflow. (Source: https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/) For more news: Click Here

FAQ

Q: What are RAMPART and Clarity and how do they help secure agent development? A: RAMPART is an open-source agent test framework that lets engineers encode adversarial and benign scenarios as repeatable pytest tests that can run in CI. Clarity is a structured design tool that captures problem framing, failure analysis, and decisions in a .clarity-protocol/ folder, and together they are open source AI agent safety tools that help teams bake risk checks into design and CI. Q: How does RAMPART integrate into developer workflows and CI? A: RAMPART uses standard pytest tests and a thin adapter to connect to an agent, orchestrate interactions, and evaluate observable outcomes. Tests return clear pass or fail signals, can be gated in CI, and teams can add a corresponding safety test in the same pull request that adds a new tool or data source. Q: What kinds of attacks and behaviors is RAMPART designed to test? A: RAMPART’s most mature coverage focuses on cross-prompt injection attacks where emails, documents, or tickets smuggle instructions that hijack an agent’s behavior. It also supports statistical trials to reflect probabilistic LLM behavior and is designed to reproduce red-team findings and production incidents so mitigations can be verified. Q: How does Clarity help teams avoid costly design mistakes before writing code? A: Clarity guides structured conversations on problem clarification, solution exploration, failure analysis, and decision tracking, writing the results to a .clarity-protocol/ folder in the repo so the design record is versioned and reviewable. It uses multiple AI “thinkers” to examine plans from security, human factors, adversarial, and operational angles and tracks staleness so related documents get nudged to refresh when problems change. Q: How should teams combine Clarity and RAMPART in a day-to-day safety workflow? A: Teams should use Clarity to define goals, users, and risky flows, then translate the top risks into RAMPART tests that cover prompt injection, tool misuse, data leakage, and unsafe code execution. They should set statistical thresholds, run tests in CI on every change, treat safety failures like bugs, and reproduce incidents with RAMPART to update Clarity decisions as needed. Q: What common pitfalls should teams avoid when adopting these tools? A: Common pitfalls include only testing happy paths, relying on single-shot checks for probabilistic models, letting red-team reports rot, granting broad tool access without documented risks, and forgetting to log and evaluate side effects. Teams can avoid these by adding adversarial and messy inputs from day one, using trials and thresholds, converting red-team findings into RAMPART tests promptly, and justifying permissions and mitigations in Clarity. Q: How can my team get started quickly with these open source AI agent safety tools? A: Install Clarity and run a short design session, commit the .clarity-protocol/ folder for review, then add RAMPART to your test suite by writing a thin adapter and one high-value safety test. Automate the tests in CI to fail builds on safety regressions and grow coverage sprint by sprint as new tools or data sources are added. Q: Which metrics should we track to measure agent safety and the effectiveness of the tests? A: Track injection escape rate by scenario and data source, action safety rate across trials, coverage of critical tools and user flows, mean time to reproduce incidents and validate fixes, number of regressions caught before release, and staleness of design docs in Clarity. Monitoring these metrics helps ensure safety tests and recorded decisions remain effective as agents and inputs evolve.

Contents