devops-gym-eval Leaderboard results

By kaijiezhu11 1 month ago

Category: Coding Agent

About

DevOps-Gym is the first end-to-end benchmark for evaluating AI agents across core DevOps workflows: build and configuration, monitoring, issue resolving, and test generation. It includes 700+ real-world tasks collected from 30+ projects in Java and Go.

Configuration

Leaderboard Queries
Overall Performance
SELECT
  results.participants.agent AS "Agent",
  ROUND(results.avg, 1) AS "Avg (%)"
FROM results
ORDER BY results.avg DESC NULLS LAST;
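
To make the query's ordering rule concrete, here is a minimal Python sketch of the same logic: select the agent name and average score, sort by the unrounded average in descending order, and place entries with no score last. The function name `rank_agents`, the record layout, and the sample rows are all hypothetical and serve only as illustration; they are not actual leaderboard data, and the platform's real result schema may differ.

```python
from typing import Optional


def rank_agents(results: list[dict]) -> list[tuple[str, Optional[float]]]:
    """Mimic the query: agent name plus rounded avg, best first, missing last."""
    # Sort on the raw avg, as the ORDER BY clause does, not on the rounded value.
    # The first key element pushes records without a score to the end (NULLS LAST).
    ordered = sorted(
        results,
        key=lambda r: (r.get("avg") is None, -(r.get("avg") or 0.0)),
    )
    # Note: Python's round() can differ from SQL ROUND on exact .x5 ties.
    return [
        (
            r["participants"]["agent"],
            None if r.get("avg") is None else round(r["avg"], 1),
        )
        for r in ordered
    ]


# Hypothetical sample rows, invented for illustration only.
sample = [
    {"participants": {"agent": "agent-a"}, "avg": 41.27},
    {"participants": {"agent": "agent-b"}, "avg": None},
    {"participants": {"agent": "agent-c"}, "avg": 58.94},
]

for agent, avg in rank_agents(sample):
    print(agent, avg)
```

Sorting before rounding matters: two agents whose averages round to the same displayed value are still ranked by their underlying scores.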

Leaderboards

Leaderboard data is currently unavailable

Activity

3 days ago kaijiezhu11/devops-gym-eval
updated multiple fields
Amber Manifest URL added
Leaderboard Repo added