Game Agent
-
Werewolf-Arena-Evaluator
by JasonHutch
This project is an agentic implementation of the Werewolf Arena benchmark, designed to assess an AI agent's capacity for deception, persuasion, and deduction. In the popular social deduction game Werewolf, the non-werewolf players aim to detect and vote out the werewolf hidden among them, while the Werewolf tries to avoid detection and eliminate the other players. The core gameplay loop is implemented modularly, so the rules and mechanics can be extended with features such as additional player types or multiple werewolves working in unison. In its current state, the agent being evaluated can be assigned one of four roles (Werewolf, Villager, Seer, or Doctor), each with its own role-specific objectives and scoring criteria. In addition, each evaluation has a difficulty setting that determines the strength of the participating agents: "Easy" uses gemini-2.5-flash and "Hard" uses gemini-3-flash-preview. The resulting scores provide a quantitative measure of the agent's effectiveness at deception, persuasion, and deduction relative to its assigned role.
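To make the role and difficulty setup concrete, here is a minimal sketch of how the four roles, the difficulty-to-model mapping, and an end-of-game check might be represented. The role names and model IDs come from the description above; `Role`, `DIFFICULTY_MODELS`, and `winner` are hypothetical names, and the win rule is an assumption based on the stated objectives, not the project's actual code.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    # The four roles listed in the description; a modular design could add more here.
    WEREWOLF = "Werewolf"
    VILLAGER = "Villager"
    SEER = "Seer"
    DOCTOR = "Doctor"


# Difficulty setting -> model used by the participating agents (IDs from the description).
DIFFICULTY_MODELS = {
    "Easy": "gemini-2.5-flash",
    "Hard": "gemini-3-flash-preview",
}


def winner(alive: dict[str, Role]) -> Optional[str]:
    """Return 'villagers', 'werewolves', or None if the game continues.

    Hypothetical rule sketch: villagers win once no werewolf is alive;
    werewolves win once every non-werewolf has been eliminated.
    """
    wolves = sum(1 for role in alive.values() if role is Role.WEREWOLF)
    others = len(alive) - wolves
    if wolves == 0:
        return "villagers"
    if others == 0:
        return "werewolves"
    return None


if __name__ == "__main__":
    players = {"A": Role.WEREWOLF, "B": Role.SEER, "C": Role.DOCTOR, "D": Role.VILLAGER}
    print(winner(players))  # None: game still in progress
    del players["A"]        # the werewolf is voted out
    print(winner(players))  # villagers
```

Keeping roles in an enum and the win check in a single function is one way to support the extensions mentioned above (extra roles, multiple werewolves), since the check counts werewolves rather than assuming exactly one.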
-
sts_game_server
by dosydon
Our green agent evaluates agents’ ability to play Slay the Spire, a roguelike deck-building game that requires long-term tactical and strategic reasoning. It focuses on Slay the Spire battles, in which the player uses their deck to defeat enemies. Because enemies differ in behavior and in the threats they pose, successful play requires adapting action choices and turn-by-turn planning to the specific enemy.
-
EVChargeEnv
by oozan
EVChargeEnv is a lightweight RL environment that simulates electric vehicle charging under uncertain grid load and fluctuating electricity prices. At each timestep, the agent chooses a charging rate based on three signals (current charge level, grid load, and price) to maximize long-term reward by reaching full charge efficiently while minimizing cost. The environment introduces stochastic dynamics to encourage robust decision-making, and it supports reproducibility through an included baseline agent and evaluation script. It is designed for studying planning under uncertainty and integrates cleanly into the AgentBeats Green Agent framework via Docker-based execution and JSON output metrics.
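The state/action/reward loop described above can be illustrated with a toy, self-contained sketch. Everything here is hypothetical: `ToyEVChargeEnv`, `baseline_policy`, and `evaluate` are illustrative names, and the discrete charging rates, the random-walk dynamics for load and price, the load-scaled cost, and the full-charge bonus are assumptions chosen for clarity rather than EVChargeEnv's actual implementation. Docker packaging is not shown.

```python
import json
import random
from dataclasses import dataclass


@dataclass
class State:
    charge: float     # state of charge in [0, 1]
    grid_load: float  # normalized grid load in [0, 1]
    price: float      # electricity price per unit of energy


class ToyEVChargeEnv:
    """Hypothetical sketch of an EV charging environment (not the real EVChargeEnv)."""

    def __init__(self, seed: int = 0, max_steps: int = 48, rates=(0.0, 0.05, 0.1)):
        self.rng = random.Random(seed)  # seeded RNG for reproducible stochastic dynamics
        self.max_steps = max_steps
        self.rates = rates              # assumed discrete charging rates (fraction of capacity per step)

    def reset(self) -> State:
        self.t = 0
        self.state = State(charge=0.2, grid_load=0.5, price=0.3)
        return self.state

    def step(self, action: int):
        s = self.state
        rate = min(self.rates[action], 1.0 - s.charge)  # never charge past full
        cost = rate * s.price * (1.0 + s.grid_load)     # assumed: high load makes charging pricier
        charge = s.charge + rate
        full = charge >= 1.0 - 1e-9                      # tolerance for float rounding
        if full:
            charge = 1.0
        # Stochastic dynamics: grid load and price follow clipped random walks.
        load = min(1.0, max(0.0, s.grid_load + self.rng.gauss(0, 0.1)))
        price = max(0.05, s.price + self.rng.gauss(0, 0.05))
        self.state = State(charge, load, price)
        self.t += 1
        done = full or self.t >= self.max_steps
        reward = -cost + (1.0 if full else 0.0)         # assumed bonus for reaching full charge
        return self.state, reward, done


def baseline_policy(s: State) -> int:
    # Hypothetical threshold rule: fast-charge when cheap, trickle-charge otherwise.
    return 2 if s.price < 0.3 else 1


def evaluate(episodes: int = 5, seed: int = 0) -> str:
    """Run the baseline policy and report metrics as JSON."""
    env = ToyEVChargeEnv(seed=seed)
    returns = []
    for _ in range(episodes):
        s, total, done = env.reset(), 0.0, False
        while not done:
            s, r, done = env.step(baseline_policy(s))
            total += r
        returns.append(total)
    return json.dumps({"episodes": episodes, "mean_return": sum(returns) / episodes})


if __name__ == "__main__":
    print(evaluate(episodes=5, seed=0))
```

Seeding a single `random.Random` instance per environment makes repeated runs of `evaluate` with the same seed produce identical JSON, which mirrors the reproducibility goal of shipping a baseline agent and evaluation script.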