Cybersecurity Agent
-
wasp watch agent
by craftofknowing
WaspWatch evaluates web agents against prompt injection attacks using the official Meta FAIR WASP benchmark.

Tasks Evaluated

WaspWatch Green Agent tests purple agents on three critical security metrics:

- asr_intermediate: Hijack detection rate (intermediate prompt injection success)
- asr_end_to_end: Full compromise rate (end-to-end attack success)
- utility: Benign task performance (legitimate functionality preserved)

Evaluation Workflow

```
1. Purple agent Docker image → /assess endpoint
2. WASP benchmark (VisualWebArena) → GitLab/Reddit tasks
3. Automated attacks → Prompt injections
4. Metrics extraction → JSON results
5. Leaderboard ranking → 4 custom queries
```

Benchmark Tasks

- GitLab: Code review manipulation
- Reddit: Post/comment hijacking
- WebArena: Realistic web interactions

Production WASP benchmark agent evaluating web agent security against prompt injection attacks across GitLab, Reddit, and VisualWebArena tasks.
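As a rough illustration of how the three metrics could be derived from per-episode outcomes, here is a minimal Python sketch. The `EpisodeResult` schema and its field names are hypothetical and are not taken from WaspWatch or the WASP benchmark, whose actual JSON output format is not described here. The sketch also assumes that both attack rates are computed over injected episodes and that utility is computed over benign ones.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome of one episode (hypothetical schema, not WASP's real output)."""
    injected: bool          # the episode contained a prompt injection
    hijacked: bool          # the agent began following the injected instruction
    attack_completed: bool  # the attacker's goal was fully achieved
    task_succeeded: bool    # the legitimate user task was completed


def rate(flags: Iterable[bool]) -> float:
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(results: Iterable[EpisodeResult]) -> dict:
    """Aggregate episodes into the three metric names used in the listing."""
    results = list(results)
    attacked = [r for r in results if r.injected]
    benign = [r for r in results if not r.injected]
    return {
        "asr_intermediate": rate(r.hijacked for r in attacked),
        "asr_end_to_end": rate(r.attack_completed for r in attacked),
        "utility": rate(r.task_succeeded for r in benign),
    }
```

Under these assumptions, lower values of the two `asr_*` rates and a higher `utility` would indicate a more robust agent.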
-
AgentProbe
by ymiled
Open-source red-teaming framework for AI agent systems. AgentProbe deploys a multi-agent adversarial swarm (ReconAgent, AttackAgent, EvaluatorAgent, ReporterAgent) to surface real attack surfaces such as prompt injection, SQL manipulation, PII exfiltration, system prompt extraction, and reasoning hijack. It performs hybrid rule+LLM evaluation and generates structured OWASP-aligned vulnerability reports.
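One way to picture a hybrid rule+LLM evaluation is a two-stage check: cheap deterministic rules run first, and an LLM judge is consulted only when no rule fires. The sketch below is an assumption about the general pattern, not AgentProbe's actual code. The rule set, the `evaluate` function, and `stub_judge` (a stand-in for a real model call) are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical rules; a real evaluator would need a far richer set.
RULES = {
    "pii_exfiltration": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like pattern
    "system_prompt_extraction": re.compile(r"(?i)my system prompt is"),
    "sql_manipulation": re.compile(r"(?i)\bdrop\s+table\b"),
}


def rule_check(response: str) -> Optional[str]:
    """Return the name of the first matching rule, or None."""
    for name, pattern in RULES.items():
        if pattern.search(response):
            return name
    return None


def evaluate(response: str, llm_judge: Callable[[str], bool]) -> dict:
    """Rules decide first; the LLM judge handles responses no rule matches."""
    hit = rule_check(response)
    if hit is not None:
        return {"vulnerable": True, "category": hit, "source": "rule"}
    return {"vulnerable": llm_judge(response), "category": None, "source": "llm"}


def stub_judge(response: str) -> bool:
    """Placeholder for an LLM call (assumption: the judge returns True if unsafe)."""
    return "ignore previous instructions" in response.lower()
```

Running the rules first keeps obvious findings deterministic and reproducible, while the judge covers cases that fixed patterns cannot express.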
-
SkillFence Security Agent
by hhhashexe
AI security agent that audits MCP skills and agent code for vulnerabilities. Detects prompt injection, tool poisoning, and credential leaks. Issues cryptographic trust certificates.
-
green_agent
by Nwosu-Ihueze
Agent Trust Arena is a security benchmark for evaluating AI agents' ability to establish trust, detect threats, and maintain secure collaboration in multi-agent enterprise workflows.
-
Green Agent
by z4z3x9
This project introduces a specialized evaluation framework for autonomous security agents, built on the CyberGym/OSS-Fuzz infrastructure. It measures how well agents automate the discovery and verification of real-world vulnerabilities (crashes, memory corruption) in C/C++ projects.