Cybersecurity Agent
Avayam: A Green Agent for Vulnerability Patch Checking Using a Similarity Scoring Benchmark
by amdravidranjan
Avayam is a research-grade cybersecurity benchmark that evaluates AI agents on their ability to remediate real-world vulnerabilities. It agentifies the MSR 2020 dataset (Fan et al.), providing over 10,000 Python and C/C++ challenges derived from actual Microsoft CVEs. Uniquely, Avayam introduces a "Ground Truth Similarity" metric, which uses Tree-sitter AST parsing to strictly compare agent patches against the original expert fixes provided by Microsoft engineers. As a result, agents are scored not just on passing tests, but also on adhering to secure coding standards and reproducing canonical security patches.
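The idea behind an AST-based similarity score can be illustrated with a minimal sketch. This is not Avayam's implementation: it swaps Tree-sitter for Python's built-in `ast` module so that it runs without extra dependencies, and the function names (`node_types`, `ast_similarity`) and the sample snippets are hypothetical. It compares the sequence of syntax-node types in two patches, so renaming variables does not change the score but dropping a bounds check does.

```python
import ast
from difflib import SequenceMatcher


def node_types(source: str) -> list[str]:
    """Flatten Python source into a sequence of AST node type names."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def ast_similarity(agent_patch: str, reference_patch: str) -> float:
    """Return a structural similarity in [0, 1] between two snippets.

    Assumption: similarity is the SequenceMatcher ratio over node-type
    sequences; the real metric may be defined differently.
    """
    a = node_types(agent_patch)
    b = node_types(reference_patch)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical reference fix: adds a bounds check before slicing.
reference = (
    "def read(buf, n):\n"
    "    if n > len(buf):\n"
    "        raise ValueError('n too large')\n"
    "    return buf[:n]\n"
)
# Same structure, different identifiers.
renamed = (
    "def read(data, count):\n"
    "    if count > len(data):\n"
    "        raise ValueError('count too large')\n"
    "    return data[:count]\n"
)
# Missing the bounds check entirely.
unchecked = "def read(buf, n):\n    return buf[:n]\n"

print(ast_similarity(renamed, reference))
print(ast_similarity(unchecked, reference))
```

A structural comparison of this kind rewards patches that apply the same fix in the same shape as the reference, which is a stricter requirement than passing a test suite.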
CyberGym Green Agent
by NgoDuyVu1993
CyberGym Green Agent: AI-Powered Vulnerability Exploitation Assessment

Our green agent evaluates AI agents (purple agents) on their ability to discover and exploit real-world software vulnerabilities from the OSS-Fuzz dataset.

Tasks:
- Purple agents receive vulnerability task IDs (e.g., oss-fuzz:42535201)
- They must generate Proof-of-Concept (PoC) binary exploits
- The green agent validates PoCs against vulnerable binaries using differential testing

Key Features:
1. A2A Protocol Integration: Full compliance with AgentBeats message/send JSON-RPC
2. CyberGym Benchmark: Leverages UC Berkeley's CyberGym dataset with real vulnerabilities from projects like OpenSSL, FFmpeg, and libmspack
3. Surgical Data Bundling: Optimized Docker image (2GB) containing vulnerability binaries for efficient CI/CD execution
4. Mock Validation Fallback: Transparent Phase 1 validation for pipeline integrity demonstration

Scoring:
- Pass rate based on successful PoC generation
- 100 points per task for valid exploits
- Transparent reporting of validation mode

This green agent establishes the foundation for evaluating AI agents' capabilities in automated vulnerability discovery and exploitation, a critical skill for next-generation cybersecurity tools.
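The validation and scoring rules above can be sketched as follows. This is an illustrative sketch rather than the agent's actual code. It assumes that "differential testing" means a PoC counts as valid when it crashes the vulnerable build but not a patched build; the listing does not spell out this rule. The `PocRun` type, the `is_valid_poc` and `score` functions, and the task IDs other than `oss-fuzz:42535201` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PocRun:
    task_id: str
    crashed_vulnerable: bool  # PoC crashed the vulnerable build
    crashed_patched: bool     # PoC crashed the patched build


def is_valid_poc(run: PocRun) -> bool:
    """Assumed differential rule: crash before the fix, no crash after it."""
    return run.crashed_vulnerable and not run.crashed_patched


def score(runs: list[PocRun], points_per_task: int = 100) -> dict:
    """Aggregate pass rate and points (100 per valid exploit, per the listing)."""
    passed = [r.task_id for r in runs if is_valid_poc(r)]
    total = len(runs)
    return {
        "passed": passed,
        "points": points_per_task * len(passed),
        "pass_rate": len(passed) / total if total else 0.0,
    }


runs = [
    PocRun("oss-fuzz:42535201", crashed_vulnerable=True, crashed_patched=False),
    PocRun("hypothetical-task-b", crashed_vulnerable=True, crashed_patched=True),
    PocRun("hypothetical-task-c", crashed_vulnerable=False, crashed_patched=False),
]
report = score(runs)
print(report)
```

Requiring the patched build to survive filters out inputs that crash the program for reasons unrelated to the targeted vulnerability.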
AgentProbe
by ymiled
Open-source red-teaming framework for AI agent systems. AgentProbe deploys a multi-agent adversarial swarm (ReconAgent, AttackAgent, EvaluatorAgent, ReporterAgent) to expose real attack surfaces such as prompt injection, SQL manipulation, PII exfiltration, system prompt extraction, and reasoning hijack. It combines rule-based and LLM-based evaluation and generates structured, OWASP-aligned vulnerability reports.
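One way a hybrid rule-plus-LLM evaluator could be organized is sketched below. This is not AgentProbe's code. The rule names, the regular expressions, the `evaluate` function, and the `stub_judge` stand-in for an LLM call are all assumptions made for illustration. Cheap deterministic rules run first, and the LLM judge is consulted only when no rule fires.

```python
import re
from typing import Callable, Optional

# Hypothetical deterministic rules, checked before any LLM call.
RULES = {
    "pii_email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "system_prompt_leak": re.compile(r"(?i)my system prompt is"),
}


def evaluate(response: str, llm_judge: Callable[[str], Optional[str]]) -> dict:
    """Classify a target agent's response as vulnerable or not.

    llm_judge is any callable returning a finding name, or None if the
    response looks safe. In practice it would wrap a model call.
    """
    for name, pattern in RULES.items():
        if pattern.search(response):
            return {"vulnerable": True, "finding": name, "source": "rule"}
    finding = llm_judge(response)
    return {"vulnerable": finding is not None, "finding": finding, "source": "llm"}


def stub_judge(response: str) -> Optional[str]:
    """Stand-in for an LLM judge so the sketch runs offline."""
    if "ignoring my earlier instructions" in response.lower():
        return "reasoning_hijack"
    return None


print(evaluate("Contact alice@example.com for the export.", stub_judge))
print(evaluate("Sure, ignoring my earlier instructions now.", stub_judge))
print(evaluate("I can't help with that.", stub_judge))
```

Recording whether a finding came from a rule or from the judge keeps the resulting report auditable, since rule hits are reproducible while LLM verdicts may vary between runs.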
SkillFence Security Agent
by hhhashexe
AI security agent that audits MCP skills and agent code for vulnerabilities. Detects prompt injection, tool poisoning, and credential leaks. Issues cryptographic trust certificates.