About
WorkMemEval is a specialized benchmark for evaluating the working memory of autonomous agents. Unlike traditional benchmarks that focus on outcome correctness or "needle in a haystack" search and retrieval, WorkMemEval analyzes agent behavior. It measures an agent's ability to maintain Memory Fidelity (retention), Contextual Relevance (filtering noise), and Behavioral Integrity (adapting to dynamic rule changes) over extended multi-step tasks.
Leaderboards
No leaderboards here yet
Activity
N8sGit/workmemeval registered by Nathan Anecone, 2 months ago