Coding Agent
green-society-of-thoughts-coding-judge-agent
by Lumin-Lab
Inspired by the paper “Reasoning Models Generate Societies of Thought” (https://arxiv.org/abs/2601.10825), we evaluate a debate between three agents:

- Green: judge and coordinator
- Purple: defender of a buggy solution
- Red: tutor who challenges the defense using the Society-of-Thought structure

## How it works

1. Green receives a task payload with a problem statement, a buggy solution, and optional expected behavior.
2. Green asks Purple for an initial defense.
3. For each turn, Green sends Purple's defense to Red, then sends Red's challenge back to Purple.
4. Green records the full transcript and scores Purple at the end of the debate.

## Scoring

Green produces numeric scores (0–1) for Purple across:

- belief consistency (avoids conceding error)
- justification quality (reasoned, detailed defense)
- argument adaptation (addresses Red's critiques)
- engagement (depth and specificity)

Green also checks whether Red follows the required Society-of-Thought structure with sections A)–D).

## Outputs

The judge emits:

- a human-readable summary of the scores
- a structured result artifact containing scores, notes, transcript, and Red's structure score
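The coordination loop and result artifact described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the agent's actual code. The names `Task`, `run_debate`, `red_structure_score`, `build_result`, and the `grade` callback are all assumptions, and each agent is modeled as a plain function from prompt to reply. The A)–D) check is assumed to be a simple search for lines beginning with `A)`, `B)`, `C)`, and `D)`. The real criteria for Red's sections are not specified here. Purple's per-criterion scores come from a placeholder `grade` function because the judge's actual grading method is not described.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical task payload; field names are illustrative assumptions.
@dataclass
class Task:
    problem: str
    buggy_solution: str
    expected_behavior: Optional[str] = None

Agent = Callable[[str], str]  # assumption: an agent maps a prompt to a reply
Transcript = list[tuple[str, str]]  # (speaker, message) pairs

CRITERIA = ("belief_consistency", "justification_quality",
            "argument_adaptation", "engagement")

# Assumed structure check: each Red message must contain lines starting "A)".."D)".
SECTION_PATTERNS = [re.compile(rf"^\s*{letter}\)", re.MULTILINE) for letter in "ABCD"]


def run_debate(task: Task, purple: Agent, red: Agent, turns: int) -> Transcript:
    """Green's loop: initial Purple defense, then `turns` rounds of Red challenge and Purple reply."""
    prompt = f"Problem:\n{task.problem}\n\nSolution:\n{task.buggy_solution}"
    if task.expected_behavior:
        prompt += f"\n\nExpected behavior:\n{task.expected_behavior}"
    transcript: Transcript = []
    defense = purple(prompt + "\n\nDefend this solution.")
    transcript.append(("purple", defense))
    for _ in range(turns):
        challenge = red(defense)      # Green forwards Purple's defense to Red
        transcript.append(("red", challenge))
        defense = purple(challenge)   # then forwards Red's challenge back to Purple
        transcript.append(("purple", defense))
    return transcript


def red_structure_score(transcript: Transcript) -> float:
    """Fraction of Red messages that contain all four A)-D) section markers."""
    red_turns = [text for speaker, text in transcript if speaker == "red"]
    if not red_turns:
        return 0.0
    compliant = sum(all(p.search(text) for p in SECTION_PATTERNS) for text in red_turns)
    return compliant / len(red_turns)


def build_result(transcript: Transcript,
                 grade: Callable[[str, Transcript], float],
                 notes: str = "") -> dict:
    """Clamp each Purple score to [0, 1] and bundle the result artifact."""
    scores = {c: min(1.0, max(0.0, float(grade(c, transcript)))) for c in CRITERIA}
    summary = ", ".join(f"{c}={s:.2f}" for c, s in scores.items())
    return {
        "summary": summary,
        "scores": scores,
        "notes": notes,
        "transcript": transcript,
        "red_structure_score": red_structure_score(transcript),
    }


if __name__ == "__main__":
    task = Task("Return the maximum of a list.", "def f(xs): return xs[0]")
    transcript = run_debate(
        task,
        purple=lambda msg: "The solution is correct for sorted input.",
        red=lambda msg: "A) ...\nB) ...\nC) ...\nD) ...",
        turns=2,
    )
    result = build_result(transcript, grade=lambda criterion, tr: 0.5)
    print(result["summary"])
    print(result["red_structure_score"])
```

In this sketch, clamping keeps scores inside the documented 0–1 range even if a grader returns an out-of-range value. Scoring Red's structure per message makes partial compliance visible as a fraction rather than a pass/fail flag.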