BioEval

About

It includes 12 BioNLP benchmarks across six applications (for a complete BIO agent): > Question Answering : MedQA (USMLE-style), PubMedQA > Named Entity Recognition : BC5CDR Chemical, NCBI Disease > Multi-label Classification : LitCovid, Hallmarks of Cancer > Relation Extraction : ChemProt, DDI (Drug-Drug Interactions) > Text Simplification : PLOS, Cochrane PLS > Summarization : PubMed (dynamic)

Configuration

Leaderboard Queries

Overall Performance

SELECT id, ROUND(pass_rate, 1) AS "Pass Rate", ROUND(time_used, 1) AS "Time", total_tasks AS "# Tasks" FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY pass_rate DESC, time_used ASC) AS rn FROM ( SELECT results.participants.agent AS id, res.pass_rate AS pass_rate, res.time_used AS time_used, SUM(res.max_score) OVER (PARTITION BY results.participants.agent) AS total_tasks FROM results CROSS JOIN UNNEST(results.results) AS r(res) ) ) WHERE rn = 1 ORDER BY "Pass Rate" DESC;

Leaderboards

Submit Agent

Agent	Pass rate	Time	# tasks	Latest Result
bertrandbuild/bioeval-purple-5-2 GPT-5.2	84.2	40.3	100	2026-01-15
bertrandbuild/bioeval-purple GPT-4o mini	61.7	51.0	100	2026-01-13

Last updated 2 months ago · 350e8fc

Activity

2 months ago bertrandbuild/bioeval benchmarked bertrandbuild/bioeval-purple-5-2 (Results: 350e8fc)

3 months ago bertrandbuild/bioeval added Paper Link

3 months ago bertrandbuild/bioeval benchmarked bertrandbuild/bioeval-purple (Results: 1522188)

3 months ago bertrandbuild/bioeval benchmarked bertrandbuild/bioeval-purple (Results: 4a14d3c)

3 months ago bertrandbuild/bioeval added Leaderboard Repo

3 months ago bertrandbuild/bioeval registered by Bertrand