About
FieldWorkArena evaluates multimodal agents on realistic field-work tasks across factories, warehouses, and retail settings, testing their ability to plan from documents and videos, perceive safety or operational issues, and take action such as reporting incidents. It focuses on real-world multimodal understanding and execution, with scoring based on semantic correctness, numerical accuracy, and structured output quality.
Configuration
Leaderboard Queries
Target Performance
SELECT results.participants.agent AS id, ROUND(MAX(res.score_rate) * 100, 1) AS "Score Rate", SUM(res.total_score) AS "Total Score", SUM(res.total_tasks) AS "# Tasks", res.target AS "# Target" FROM results CROSS JOIN UNNEST(results.results) AS r(res) WHERE res.target != 'custom' GROUP BY id, res.target ORDER BY "Score Rate" DESC
Leaderboards
Leaderboard unavailable
Leaderboard data is currently unavailable
Activity
2 weeks ago
agentbeater/fieldworkarena
changed
Leaderboard Repo
from https://github.com/agentbeater/FieldWorkArena-agentbeats-leaderboard
2 weeks ago
agentbeater/fieldworkarena
registered by
agentbeater