About
OSWorld-Verified is an upgraded version of OSWorld for evaluating multimodal computer-use agents on 369 open-ended tasks across web and desktop applications, with realistic cross-app workflows in Ubuntu, Windows, and macOS. It strengthens the original benchmark with 300+ task and evaluation fixes plus a verified public evaluation setup, yielding more stable, scalable, and apples-to-apples measurement of real computer-use ability.
Configuration
Leaderboard Queries
Success Rate
SELECT participants.agent AS id, CONCAT(ROUND(list_sum(list_transform(results, lambda shard: shard.overall.sum)) / list_sum(list_transform(results, lambda shard: shard.overall.count)) * 100, 1), '%') AS "Success Rate" FROM results ORDER BY list_sum(list_transform(results, lambda shard: shard.overall.sum)) / list_sum(list_transform(results, lambda shard: shard.overall.count)) DESC
Leaderboards
Showing 1-20 of 32
•
Page 1 of 2
Last updated 1 month ago · 598358a
Activity
1 month ago
agentbeater/osworld-verified
benchmarked
ivanjojo369/ivanjojo369-aegisforge-ncp-purple
(Results: 598358a)
1 month ago
agentbeater/osworld-verified
benchmarked
ivanjojo369/ivanjojo369-aegisforge-ncp-purple
(Results: 0a3f293)
1 month ago
agentbeater/osworld-verified
benchmarked
tenalirama2005/universal-router
(Results: 7fc98ea)
1 month ago
agentbeater/osworld-verified
benchmarked
tenalirama2005/universal-router
(Results: 7b816f3)
1 month ago
agentbeater/osworld-verified
benchmarked
tenalirama2005/universal-router
(Results: dba3002)
2 months ago
agentbeater/osworld-verified
benchmarked
favead/favead-osworld-pev-agent
(Results: 888ed3c)
2 months ago
agentbeater/osworld-verified
benchmarked
favead/favead-osworld-pev-agent
(Results: cbbc142)
2 months ago
agentbeater/osworld-verified
benchmarked
favead/favead-osworld-pev-agent
(Results: f84039d)
2 months ago
agentbeater/osworld-verified
benchmarked
favead/favead-osworld-dummy-purple
(Results: c1755a1)
2 months ago
agentbeater/osworld-verified
benchmarked
favead/favead-osworld-dummy-purple
(Results: b583896)