KAGE-Bench

Definition

KAGE-Bench is a benchmark for pixel-based reinforcement learning agents, designed to systematically analyze visual generalization failures by isolating individual visual distribution shifts. It uses the KAGE-Env 2D platformer to control visual axes independently while keeping the underlying control problem fixed.

At a glance

Executive summary

KAGE-Bench tests how well pixel-based reinforcement learning agents cope when the visual appearance of their environment changes. It helps researchers understand why such agents fail when the visuals change even though the underlying task does not, by varying one visual factor at a time.

TL;DR

KAGE-Bench is a benchmark that changes one visual factor at a time, so researchers can see exactly which visual changes cause pixel-based RL agents to fail.

Key points

Factorizes the observation into independently controllable visual axes, so that each distribution shift can be isolated.
Enables systematic analysis of visual generalization failures in pixel-based reinforcement learning.
Used by reinforcement learning researchers and ML engineers studying agent robustness and domain generalization.
Unlike benchmarks that change several visual factors at once, it keeps the sources of visual shift from becoming entangled.
Supports research on building robust, generalizable AI agents for real-world deployment.
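
The idea of isolating one visual axis at a time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VisualConfig` class, the axis names (`background`, `agent_sprite`, `lighting`) and their values are hypothetical and are not taken from KAGE-Env, whose actual configuration interface may look quite different. The sketch builds an evaluation suite in which every configuration differs from the training configuration on exactly one axis.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class VisualConfig:
    # Hypothetical visual axes; the real KAGE-Env axes and values may differ.
    background: str = "plain"
    agent_sprite: str = "default"
    lighting: str = "neutral"


def single_axis_shifts(base, alternatives):
    """Yield (axis, value, config) where config differs from base on one axis only."""
    for axis, values in alternatives.items():
        for value in values:
            if value != getattr(base, axis):
                # Copy the base config and override just this one axis.
                yield axis, value, replace(base, **{axis: value})


def changed_axes(base, other):
    """List the names of the axes on which two configs differ."""
    return [f.name for f in fields(base)
            if getattr(base, f.name) != getattr(other, f.name)]


# Training configuration and the (hypothetical) values each axis may take.
TRAIN = VisualConfig()
ALTERNATIVES = {
    "background": ["plain", "noise", "photo"],
    "agent_sprite": ["default", "robot"],
    "lighting": ["neutral", "dim"],
}

EVAL_SUITE = list(single_axis_shifts(TRAIN, ALTERNATIVES))

for axis, value, config in EVAL_SUITE:
    print(f"{axis} -> {value}: {config}")
```

Because each evaluation configuration changes a single axis, a drop in performance on that configuration can be attributed to that axis alone rather than to a mixture of shifts.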

Use cases

Evaluating new RL algorithms for their robustness against specific visual distribution shifts.

Diagnosing failure modes by pinpointing which types of visual changes cause generalization failures in agents.

Benchmarking domain adaptation techniques to compare their effectiveness in RL environments.

Informing the design of more resilient perception modules for autonomous systems in dynamic visual settings.
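
As a rough illustration of the diagnosis use case, the snippet below ranks visual axes by generalization gap, defined here as the mean return under the training visuals minus the mean return under a shift along one axis. The function name, the axis names, and the return values are all assumptions made for the example: the numbers are placeholders, not benchmark results, and KAGE-Bench may report different metrics.

```python
from statistics import mean


def generalization_gaps(train_returns, shifted_returns):
    """Return {axis: mean train return - mean shifted return}, largest gap first."""
    baseline = mean(train_returns)
    gaps = {axis: baseline - mean(returns)
            for axis, returns in shifted_returns.items()}
    # Sort so the axis that hurts the agent most comes first.
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


# Placeholder episode returns (illustrative only, not measured data).
train_returns = [10.0, 10.0]
shifted_returns = {
    "lighting": [9.0, 9.0],
    "background": [4.0, 6.0],
    "agent_sprite": [8.0, 6.0],
}

gaps = generalization_gaps(train_returns, shifted_returns)
for axis, gap in gaps.items():
    print(f"{axis}: gap = {gap:.1f}")
```

With these placeholder inputs, `background` would be ranked first, which would suggest looking at how the agent's encoder handles background changes before examining the other axes.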