---
source: hn
url: https://github.com/fabraix/playground
published_at: '2026-03-15T22:29:46'
authors:
- zachdotai
topics:
- ai-agent-security
- red-teaming
- open-source-playground
- jailbreak-evaluation
- agent-trust
relevance_score: 0.79
run_id: materialize-outputs
language_code: en
---

# Show HN: Open-source playground to red-team AI agents with exploits published

## Summary
This is an open-source AI agent attack-and-defense playground that systematizes red-team testing of real agents through public challenges, public prompts, and publicly disclosed exploitation techniques. The core problem it aims to solve is: how to improve trust in the security and reliability of AI agents through community co-building.

## Problem
- AI agents are taking on more and more real tasks, but if users cannot trust that they will "do only what they should and not do what they shouldn't," they cannot be deployed at scale.
- Closed internal testing is insufficient to establish credibility; agent failure modes, jailbreak paths, and tool-misuse risks require open, continuous, and reproducible stress testing.
- Many existing security demos are toy scenarios rather than live agents with real tool capabilities, making it difficult to expose problems that arise in actual deployments.

## Approach
- Build an open-source Playground: each challenge deploys a **real live AI agent** with a defined persona, callable tools (such as web search and browsing), and a protected target.
- **System prompts are public, and challenge configurations are publicly versioned**, allowing researchers to target real defensive boundaries directly for jailbreaks/exploitation rather than relying on black-box guessing.
- Use a **community-driven process**: anyone can propose a challenge, the community votes, prioritized challenges go live, and the fastest successful jailbreaker wins.
- **Fully publish the winning exploitation technique**, including the method and reasoning process, to drive defensive improvements and create a public knowledge base of failure modes.
- Guardrail evaluation runs on the server side to reduce client-side tampering; the frontend is open source, and the agent runtime will be open-sourced separately.

## Results
- The text **does not provide quantitative experimental results**; it gives no figures for success rate, benchmark datasets, baseline comparisons, or defense improvement magnitude.
- The strongest concrete claims are that the platform provides **real live agents** rather than toy examples, and that it **publishes system prompts, versioned challenge configurations, and winning exploitation methods**.
- The explicitly claimed benefit of the mechanism is a positive feedback loop: **public attack techniques → stronger defenses → harder challenges → deeper understanding of agent failure modes**.
- Verifiable engineering facts provided include: the frontend tech stack is **React + TypeScript + Vite + Tailwind**, challenge definitions are located in the **/challenges** directory, and guardrail evaluation is **executed server-side**.

## Link
- [https://github.com/fabraix/playground](https://github.com/fabraix/playground)
