KAGE-Bench is a benchmark for pixel-based reinforcement learning agents, designed to systematically analyze visual generalization failures by isolating individual visual distribution shifts. It uses the KAGE-Env 2D platformer to control visual axes independently while keeping the underlying control problem fixed.
KAGE-Bench is a new tool for testing how well AI agents learn to see in changing environments. It helps researchers understand why pixel-based agents fail when visuals change, even if the core task doesn't, by carefully isolating different visual changes.
Was this definition helpful?