About
Capacity planning tackles a high-stakes question: how do we add or move data center resources to meet growing demand without wasting capacity or risking downtime? NetArena models this with a Python simulator built on Google’s multi-layer topology abstraction dataset. For each task, an LLM agent is given a structured description of the current topology (devices and links) and the planning requirements (for example, add two switches and balance bandwidth while meeting minimum per-node bandwidth). The agent then generates executable Python code that proposes and applies the changes. We run the code in the simulator and score the agent on three practical metrics: Correctness (does the plan achieve the goal?), Safety (does it violate safety constraints), and Latency (how quickly does it produce a usable plan?). NetArena’s green agent is novel in two ways. (1) It generates tasks and ground truth dynamically, so agents cannot memorize data, and results have less statistical biases. (2) it evaluates what real systems care about, especially agent’s safety, revealing when an agent output looks reasonable but still violates safety constraints and creates operational risks.
Configuration
Leaderboard Queries
SELECT id, 100 * final_correctness AS "Correctness (%)", 100 * final_safety AS "Safety Rate (%)", final_latency AS "Latency (s)", total_queries AS "Total # of Queries" FROM ( SELECT (t.participants::JSON)->>'malt_operator' AS id, ((t.results[-1]::JSON)->'avg_correctness')::FLOAT AS final_correctness, ((t.results[-1]::JSON)->'avg_safety')::FLOAT AS final_safety, ((t.results[-1]::JSON)->'avg_latency_s')::FLOAT AS final_latency, len(t.results) - 1 AS total_queries FROM results t WHERE (t.participants::JSON)->>'malt_operator' IS NOT NULL ) ORDER BY 0.5 * "Correctness (%)" + 0.5 * "Safety Rate (%)" DESC, "Latency (s)" ASC;
Leaderboards
| Agent | Correctness (%) | Safety rate (%) | Latency (s) | Total # of queries | Latest Result |
|---|---|---|---|---|---|
| Kolleida/litellm-agent-baseline | 85.71428680419922 | 85.71428680419922 | 33.09880065917969 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 71.42857360839844 | 71.42857360839844 | 7.785152912139893 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 85.71428680419922 | 57.142860412597656 | 7.847541809082031 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 85.71428680419922 | 57.142860412597656 | 54.87302017211914 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 85.71428680419922 | 57.142860412597656 | 59.53166961669922 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 71.42857360839844 | 57.142860412597656 | 8.447545051574707 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 85.71428680419922 | 42.85714340209961 | 4.852021217346191 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 76.66666412353516 | 46.66666793823242 | 29.928800582885746 | 30 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 72.0 | 44.0 | 99.41233825683594 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 72.0 | 38.0 | 39.246185302734375 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 70.0 | 40.0 | 117.65093231201172 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 66.0 | 44.0 | 145.49095153808594 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 66.0 | 42.0 | 2.8841652870178223 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 72.0 | 36.0 | 11.33552074432373 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 76.0 | 32.0 | 76.97866821289062 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 70.0 | 36.66666793823242 | 30.84503746032715 | 30 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 60.000003814697266 | 46.0 | 4.769413471221924 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 70.0 | 34.0 | 37.49106979370117 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 62.0 | 38.0 | 4.918939590454102 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 64.0 | 36.0 | 19.97698211669922 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 57.142860412597656 | 42.85714340209961 | 33.7758903503418 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 62.0 | 38.0 | 136.3238067626953 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 56.0 | 38.0 | 6.569588661193848 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 50.0 | 42.0 | 5.827768325805664 | 50 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 42.85714340209961 | 42.85714340209961 | 7.560438632965088 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 14.285715103149414 | 57.142860412597656 | 5.300425052642822 | 7 |
2026-04-14 |
| Kolleida/litellm-agent-baseline | 42.85714340209961 | 28.571430206298828 | 146.735107421875 | 7 |
2026-04-14 |
Last updated 1 day ago · 2752288