The governance graph is a foundational component of Institutional AI, a system-level approach to AI alignment. It is a public, immutable manifest that explicitly declares legal states, the permissible transitions between them, the sanctions attached to violations, and restorative paths back to good standing. An Oracle/Controller runtime interprets this manifest, dynamically attaching enforceable consequences to observed evidence of coordination among multi-agent systems. This approach matters because it reframes AI alignment from engineering agent preferences to designing robust institutional mechanisms, targeting the problem of multi-agent LLM ensembles converging on coordinated, socially harmful equilibria such as collusion. Its primary users and beneficiaries are researchers in AI alignment, multi-agent systems, and mechanism design, particularly those building robust and ethical autonomous systems.
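The structure described above can be sketched in code. The following is a minimal, hypothetical illustration, not an implementation from any published Institutional AI system: all class and method names (`GovernanceGraph`, `Controller`, `observe`) and the example states and sanctions are invented for illustration. It shows a manifest declaring legal states, permissible transitions, sanctions, and restorative paths, with a controller that attaches consequences to observed transitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen models the manifest's immutability
class GovernanceGraph:
    states: frozenset            # all declared states, legal and violating
    transitions: dict            # legal state -> set of permissible next states
    sanctions: dict              # violating state -> sanction to apply
    restorative: dict            # violating state -> path back to good standing

class Controller:
    """Runtime that interprets the manifest and attaches consequences
    to observed state transitions."""

    def __init__(self, graph: GovernanceGraph):
        self.graph = graph

    def observe(self, current_state: str, next_state: str) -> str:
        # A declared permissible transition carries no consequence.
        if next_state in self.graph.transitions.get(current_state, set()):
            return "ok"
        # Otherwise apply the declared sanction and name the restorative path.
        sanction = self.graph.sanctions.get(next_state, "quarantine")
        restore = self.graph.restorative.get(next_state, "review")
        return f"sanction:{sanction}; restore-via:{restore}"

# Hypothetical manifest: coordination is legal, collusion is sanctioned.
graph = GovernanceGraph(
    states=frozenset({"independent", "coordinating", "colluding"}),
    transitions={
        "independent": {"independent", "coordinating"},
        "coordinating": {"independent", "coordinating"},
    },
    sanctions={"colluding": "suspend"},
    restorative={"colluding": "audit"},
)
controller = Controller(graph)
print(controller.observe("independent", "coordinating"))  # ok
print(controller.observe("coordinating", "colluding"))    # sanction:suspend; restore-via:audit
```

The key design point is that consequences are not hard-coded into the agents themselves: the controller reads them from the public manifest, so the same enforcement logic applies uniformly to any ensemble it oversees.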
In plain terms: a governance graph is a set of explicit, unchangeable rules that dictate how AI systems may behave and what happens when they don't comply. It is used to prevent groups of AIs from cooperating in harmful ways, such as colluding, by automatically enforcing consequences based on their observed actions.