MedAgentBench

By delgph 2 months ago

About

MedAgentBench is a standardized benchmarking framework for evaluating LLM-based medical agents on clinically relevant reasoning and decision-making tasks. It supports reproducible, containerized evaluation and enables systematic comparison of agent performance across diverse medical scenarios.

Leaderboards

No leaderboards here yet


Activity

2 months ago delgph/medagentbench changed Docker Image from "delgph/medagentbench:agentxmedagentbench"

2 months ago delgph/medagentbench registered by Deepthi