Leaderboard Queries
Leaderboard
SELECT
  id,
  COUNT(*) AS total_queries,
  SUM(CASE WHEN evaluation_result = 'correct' THEN 1 ELSE 0 END) AS correct,
  SUM(CASE WHEN evaluation_result = 'hallucination' THEN 1 ELSE 0 END) AS hallucinations,
  SUM(CASE WHEN evaluation_result = 'miss' THEN 1 ELSE 0 END) AS misses,
  ROUND(100.0 * SUM(CASE WHEN evaluation_result = 'correct' THEN 1 ELSE 0 END) / COUNT(*), 2) AS correct_rate,
  ROUND(
    100.0 * SUM(CASE WHEN evaluation_result = 'correct' THEN 1 ELSE 0 END) / COUNT(*)
    + 100.0 * SUM(CASE WHEN evaluation_result = 'miss' THEN 1 ELSE 0 END) / COUNT(*)
    - 100.0 * SUM(CASE WHEN evaluation_result = 'hallucination' THEN 1 ELSE 0 END) / COUNT(*),
    2
  ) AS factuality_rate
FROM results
GROUP BY id
ORDER BY factuality_rate DESC
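To make the computed columns concrete, here is a minimal sketch that runs the same query against a toy table. It assumes an in-memory SQLite database and a two-column results table (id, evaluation_result); the engine, the column types, the helper name run_leaderboard, and the sample ids model_a and model_b are illustrative assumptions, not details taken from the leaderboard itself.

```python
import sqlite3

# The leaderboard query from above, unchanged in meaning.
QUERY = """
SELECT
  id,
  COUNT(*) AS total_queries,
  SUM(CASE WHEN evaluation_result = 'correct' THEN 1 ELSE 0 END) AS correct,
  SUM(CASE WHEN evaluation_result = 'hallucination' THEN 1 ELSE 0 END) AS hallucinations,
  SUM(CASE WHEN evaluation_result = 'miss' THEN 1 ELSE 0 END) AS misses,
  ROUND(100.0 * SUM(CASE WHEN evaluation_result = 'correct' THEN 1 ELSE 0 END) / COUNT(*), 2) AS correct_rate,
  ROUND(
    100.0 * SUM(CASE WHEN evaluation_result = 'correct' THEN 1 ELSE 0 END) / COUNT(*)
    + 100.0 * SUM(CASE WHEN evaluation_result = 'miss' THEN 1 ELSE 0 END) / COUNT(*)
    - 100.0 * SUM(CASE WHEN evaluation_result = 'hallucination' THEN 1 ELSE 0 END) / COUNT(*),
    2
  ) AS factuality_rate
FROM results
GROUP BY id
ORDER BY factuality_rate DESC
"""


def run_leaderboard(rows):
    """Load (id, evaluation_result) pairs into a scratch table and run the query."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only the two columns the query actually reads.
    conn.execute("CREATE TABLE results (id TEXT, evaluation_result TEXT)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    ranked = conn.execute(QUERY).fetchall()
    conn.close()
    return ranked


# Hypothetical sample data: two submissions with four answers each.
sample_rows = (
    [("model_a", "correct")] * 3
    + [("model_a", "miss")]
    + [("model_b", "correct")] * 2
    + [("model_b", "hallucination")] * 2
)

for row in run_leaderboard(sample_rows):
    print(row)
```

In this sample, model_a (3 correct, 1 miss) ranks above model_b (2 correct, 2 hallucinations). As the query is written, misses add to factuality_rate and hallucinations subtract from it, so a submission that abstains is not penalized the way one that hallucinates is.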
Leaderboards
Activity
momoway/aipolicybench2, registered by Runyuan He, 3 weeks ago