About
SurgAgent-Track is an agentic benchmark that evaluates AI systems on their ability to intelligently track surgical instruments in laparoscopic video. Unlike traditional computer vision benchmarks that measure only detection accuracy, SurgAgent-Track tests whether AI agents can reason, adapt, and recover in safety-critical surgical scenarios.

Six-Dimensional Scoring:

Dimension              Weight  What It Measures
HOTA                   35%     Tracking accuracy (Higher Order Tracking Accuracy)
mAP                    25%     Detection precision across instrument types
Surgical Context       15%     Clinical plausibility of predictions
Real-time Performance  10%     Speed tiers for practical use (<50 ms, <200 ms, <500 ms)
Reasoning Quality      10%     Explainability and decision logging
Improvement            5%      Ability to learn from feedback

Agentic Capabilities Tested

- Multi-stage reasoning: Agents must explain their detection and tracking decisions
- Adaptive tool selection: Switch strategies when scene conditions change
- Failure recovery: Detect and recover from track losses
- Clinical awareness: Predictions must align with surgical workflow
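
To make the six scoring weights concrete, here is a minimal Python sketch of how they might combine into one overall score. The benchmark text lists the weights but does not state the aggregation rule, so the weighted sum below is an assumption, as is the premise that each dimension is first normalized to the range 0 to 1. The names WEIGHTS and composite_score are hypothetical and not part of any published SurgAgent-Track API.

```python
# Weights copied from the Six-Dimensional Scoring table.
# ASSUMPTION: the dictionary keys are illustrative names, not official ones.
WEIGHTS = {
    "hota": 0.35,
    "map": 0.25,
    "surgical_context": 0.15,
    "realtime": 0.10,
    "reasoning": 0.10,
    "improvement": 0.05,
}


def composite_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores into one number.

    ASSUMPTION: each score is already normalized to [0, 1] and the
    benchmark aggregates with a plain weighted sum.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {scores[name]}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up scores for illustration only, not real benchmark results.
    example = {
        "hota": 0.60,
        "map": 0.80,
        "surgical_context": 0.50,
        "realtime": 1.00,
        "reasoning": 0.70,
        "improvement": 0.00,
    }
    print(f"composite: {composite_score(example):.3f}")
```

Under this assumption, tracking and detection quality (HOTA and mAP) together account for 60% of the total, so an agent with strong reasoning logs but weak tracking would still score low.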