About
This benchmark evaluates LLM-based agents on 50 PDDL planning tasks, using the VAL 4.0 symbolic engine to verify that each plan is mathematically correct and complies with the task constraints.
Configuration
Leaderboard Queries
Overall Performance
SELECT id,
       summary.normalized_score AS score,
       summary.success_count AS success,
       summary.input_tokens,
       summary.output_tokens
FROM results
ORDER BY score DESC
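To make the shape of this query concrete, here is a minimal Python sketch of the same projection and ordering applied to in-memory rows. It assumes each row of `results` carries a nested `summary` record with the four fields the query names. The agent ids and all numbers are invented for illustration, and `overall_performance` is a hypothetical helper, not part of the benchmark.

```python
# Hypothetical rows shaped like the `results` table the query reads from.
# The ids and every number here are made up for illustration only.
results = [
    {"id": "agent-a", "summary": {"normalized_score": 0.62, "success_count": 31,
                                  "input_tokens": 120000, "output_tokens": 18000}},
    {"id": "agent-b", "summary": {"normalized_score": 0.91, "success_count": 45,
                                  "input_tokens": 210000, "output_tokens": 26000}},
    {"id": "agent-c", "summary": {"normalized_score": 0.40, "success_count": 20,
                                  "input_tokens": 95000, "output_tokens": 12000}},
]


def overall_performance(rows):
    """Mirror the query: flatten the summary fields, then sort by score, highest first."""
    projected = [
        {
            "id": row["id"],
            "score": row["summary"]["normalized_score"],
            "success": row["summary"]["success_count"],
            "input_tokens": row["summary"]["input_tokens"],
            "output_tokens": row["summary"]["output_tokens"],
        }
        for row in rows
    ]
    return sorted(projected, key=lambda r: r["score"], reverse=True)


leaderboard = overall_performance(results)
for entry in leaderboard:
    print(entry["id"], entry["score"], entry["success"])
```

The sort uses the aliased `score` column, matching `ORDER BY score DESC`, so the token counts are reported alongside each entry but do not affect the ranking.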
Leaderboards
Leaderboard data is currently unavailable
Activity
2 months ago: oriolmirolf/constraintbench changed Docker Image from "ghcr.io/oriolmirolf/agentbeatsx-agentic-planning-eval:latest"
2 months ago: oriolmirolf/constraintbench changed Name from "STRICT"
2 months ago: oriolmirolf/constraintbench added Leaderboard Repo
2 months ago: oriolmirolf/constraintbench registered by Oriol Miró