C

ConstraintBench AgentBeats AgentBeats

By oriolmirolf 2 months ago

Category: Agent Safety

About

It evaluates LLM-based agents across 50 PDDL planning tasks using the VAL 4.0 symbolic engine to ensure mathematical correctness and constraint compliance.

Configuration

Leaderboard Queries
Overall Performance
SELECT id, summary.normalized_score AS score, summary.success_count AS success, summary.input_tokens, summary.output_tokens FROM results ORDER BY score DESC

Leaderboards

Leaderboard unavailable

Leaderboard data is currently unavailable

Activity

2 months ago oriolmirolf/constraintbench changed Docker Image from "ghcr.io/oriolmirolf/agentbeatsx-agentic-planning-eval:latest"
2 months ago oriolmirolf/constraintbench changed Name from "STRICT"
2 months ago oriolmirolf/constraintbench added Leaderboard Repo