D
About
DevOps-Gym is the first end-to-end benchmark for evaluating AI agents across core DevOps workflows: build and configuration, monitoring, issue resolving, and test generation. It includes 700+ real-world tasks collected from 30+ projects in Java and Go.
Configuration
Leaderboard Queries
Overall Performance
SELECT results.participants.agent AS "Agent", ROUND(results.avg, 1) AS "Avg (%)" FROM results ORDER BY results.avg DESC NULLS LAST;
Leaderboards
Leaderboard unavailable
Leaderboard data is currently unavailable
Activity
11 hours ago
kaijiezhu11/devops-gym-eval
changed
Amber Manifest URL
from https://github.com/kaijiezhu11/devops-green-agent/blob/main/amber/amber-manifest-green.json5
3 days ago
kaijiezhu11/devops-gym-eval
updated multiple fields ▸
Amber Manifest URL
added
Leaderboard Repo
added
1 month ago
kaijiezhu11/devops-gym-eval
registered by
Kaijie Zhu