WorkMemEval

By N8sGit 2 months ago

About

WorkMemEval is a specialized benchmark for evaluating the working memory of autonomous agents. Unlike traditional benchmarks that focus on outcome correctness or "needle-in-a-haystack" search and retrieval, WorkMemEval focuses on analyzing agent behavior. It measures how well an agent maintains Memory Fidelity (retaining information), Contextual Relevance (filtering out noise), and Behavioral Integrity (adapting to dynamic rule changes) over extended multi-step tasks.
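To make the three dimensions more concrete, here is a minimal, hypothetical Python sketch of how probes in a multi-step episode might be scored. The event kinds (fact, noise, rule, probe), the case-conversion rules, and the per-dimension accuracy metric are illustrative assumptions only. They are not WorkMemEval's actual task format or scoring method.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical event types for illustration; not taken from WorkMemEval itself.
@dataclass
class Event:
    kind: str             # "fact", "noise", "rule", or "probe"
    key: str = ""
    value: str = ""
    dimension: str = ""   # for probes: which dimension the probe tests

# Hypothetical output rules an episode can switch between mid-task.
RULES = {"upper": str.upper, "lower": str.lower}

def expected_answers(events):
    """Replay an episode and derive what a perfect agent should answer at each probe."""
    memory, rule, expected = {}, "lower", {}
    for i, ev in enumerate(events):
        if ev.kind == "fact":
            memory[ev.key] = ev.value     # task-relevant information to retain
        elif ev.kind == "noise":
            pass                          # distractor: a perfect agent ignores it
        elif ev.kind == "rule":
            rule = ev.value               # dynamic rule change during the episode
        elif ev.kind == "probe":
            expected[i] = RULES[rule](memory[ev.key])
    return expected

def score(events, agent_answers):
    """Per-dimension accuracy: fraction of probes in each dimension answered correctly."""
    correct, total = defaultdict(int), defaultdict(int)
    for i, want in expected_answers(events).items():
        dim = events[i].dimension
        total[dim] += 1
        if agent_answers.get(i) == want:
            correct[dim] += 1
    return {dim: correct[dim] / total[dim] for dim in total}

# Usage: a toy episode where the agent ignores the distractor but misses the rule change.
episode = [
    Event("fact", key="city", value="Paris"),
    Event("noise", key="city", value="Berlin"),
    Event("probe", key="city", dimension="contextual_relevance"),
    Event("rule", value="upper"),
    Event("probe", key="city", dimension="behavioral_integrity"),
    Event("fact", key="color", value="Green"),
    Event("probe", key="color", dimension="memory_fidelity"),
]
answers = {2: "paris", 4: "paris", 6: "GREEN"}
print(score(episode, answers))
# → {'contextual_relevance': 1.0, 'behavioral_integrity': 0.0, 'memory_fidelity': 1.0}
```

Scoring each dimension separately, rather than reporting a single pass/fail outcome, reflects the page's emphasis on behavioral analysis over outcome correctness. A real harness would presumably use far longer episodes and richer probes.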

Activity

2 months ago N8sGit/workmemeval registered by Nathan Anecone