Other Agent
-
AG→
LEGO-Gen_Benchmark
by Lin-HK3086
Evaluates the quality of LEGO models generated by LEGO-Gen_Agents, including their similarity to the input description.
-
AG→
QBench
by Jyoti-Ranjan-Das845
The green agent evaluates an agent’s ability to make valid, constraint-aware decisions in a sequential operational environment. The task models a real-world business process in which jobs arrive over time with priorities, deadlines, and limited execution capacity. At each step, the evaluated agent must decide whether to schedule, reschedule, cancel, or defer tasks while respecting hard constraints such as capacity limits, forbidden actions, and urgent-service guarantees. The green agent enforces the environment dynamics, validates actions, applies state transitions, and checks for invariant violations. Performance is assessed by whether the agent completes tasks within the constraints and achieves acceptable operational outcomes, reflecting realistic decision-making under resource limits, time pressure, and partial observability. The evaluation spans 35 distinct scenario types across 105 episodes, testing agent robustness under diverse operational challenges including capacity fluctuations, priority shifts, and deadline pressure.
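To make the validate / apply / check loop described above more concrete, here is a minimal Python sketch of how such an environment might be structured. It is not QBench's actual implementation: the names `Job`, `QueueEnv`, `validate`, and `step`, the three action strings, and the specific rules (a single capacity limit, urgent jobs that cannot be cancelled) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    # Hypothetical job record; field names are assumptions, not QBench's schema.
    job_id: str
    priority: int
    deadline: int
    urgent: bool = False


@dataclass
class QueueEnv:
    # Hypothetical environment with one hard capacity limit.
    capacity: int
    time: int = 0
    pending: dict = field(default_factory=dict)
    running: dict = field(default_factory=dict)
    violations: list = field(default_factory=list)
    missed: set = field(default_factory=set)

    def add_job(self, job: Job) -> None:
        self.pending[job.job_id] = job

    def validate(self, action: str, job_id: str) -> Optional[str]:
        """Return a reason string if the action is invalid, else None."""
        if action not in {"schedule", "defer", "cancel"}:
            return "unknown action"
        job = self.pending.get(job_id)
        if job is None:
            return "unknown job"
        if action == "schedule" and len(self.running) >= self.capacity:
            return "capacity exceeded"
        if action == "cancel" and job.urgent:
            # Assumed stand-in for an urgent-service guarantee.
            return "urgent jobs cannot be cancelled"
        return None

    def step(self, action: str, job_id: str) -> bool:
        """Validate the action, apply it if valid, advance time, check invariants."""
        reason = self.validate(action, job_id)
        if reason is None:
            if action == "schedule":
                self.running[job_id] = self.pending.pop(job_id)
            elif action == "cancel":
                del self.pending[job_id]
            # "defer" leaves the job pending.
        else:
            self.violations.append((self.time, action, job_id, reason))
        self.time += 1
        self.check_invariants()
        return reason is None

    def check_invariants(self) -> None:
        # Hard invariant: the capacity limit must never be exceeded.
        assert len(self.running) <= self.capacity
        # Soft outcome: record pending jobs whose deadline has passed.
        for job in self.pending.values():
            if job.deadline < self.time:
                self.missed.add(job.job_id)


# Example usage with made-up jobs.
env = QueueEnv(capacity=1)
env.add_job(Job("a", priority=1, deadline=5))
env.add_job(Job("b", priority=2, deadline=5, urgent=True))
env.step("schedule", "a")  # accepted: a slot is free
env.step("schedule", "b")  # rejected: capacity limit reached
```

In this sketch, rejected actions are logged rather than raised, so an episode can continue and be scored afterwards on both the recorded violations and the missed deadlines. A real evaluator might instead end the episode on the first hard-constraint violation.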