The Chain-of-Goals Hierarchical Policy (CoGHP) is a framework designed to tackle long-horizon tasks in offline goal-conditioned reinforcement learning. Unlike traditional hierarchical methods, which often rely on separate high- and low-level networks and generate only a single intermediate subgoal, CoGHP unifies subgoal generation and action selection in one model. Inspired by the chain-of-thought paradigm, it autoregressively generates a sequence of latent subgoals, each serving as a reasoning step that conditions subsequent predictions, and ultimately outputs a primitive action. This lets multiple intermediate decisions be coordinated within a single, coherent architecture, using an MLP-Mixer backbone for efficient cross-token communication. By decomposing complex tasks in this way, CoGHP supports more robust performance in domains such as robotics and autonomous navigation, where intricate, multi-step planning is essential.
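The autoregressive loop described above can be sketched in a few lines of Python. This is an illustrative toy, not CoGHP's actual implementation: the function names, token dimensions, and random placeholder weights (standing in for trained parameters) are all assumptions, and the Mixer layer is heavily simplified (no residual connections or layer normalization).

```python
import numpy as np

rng = np.random.default_rng(0)

def mixer_layer(tokens, w_tok, w_ch):
    """Simplified MLP-Mixer layer: a token-mixing MLP across the sequence
    axis, then a channel-mixing MLP within each token (residuals omitted)."""
    mixed = np.tanh(w_tok @ tokens)   # mix information across tokens
    return np.tanh(mixed @ w_ch)      # mix within each token's channels

def chain_of_goals_step(state, goal, num_subgoals=3, dim=8, act_dim=2):
    """Hypothetical sketch: autoregressively append latent subgoal tokens,
    each conditioned on the state, goal, and all previously generated
    subgoals, then decode a primitive action from the final token."""
    tokens = np.stack([state, goal])  # initial token sequence
    for _ in range(num_subgoals):
        n = tokens.shape[0]
        # Random placeholders for learned token- and channel-mixing weights.
        w_tok = 0.1 * rng.standard_normal((n, n))
        w_ch = 0.1 * rng.standard_normal((dim, dim))
        hidden = mixer_layer(tokens, w_tok, w_ch)
        subgoal = hidden.mean(axis=0)         # next latent subgoal
        tokens = np.vstack([tokens, subgoal]) # future steps condition on it
    # Decode a primitive action from the last token (placeholder action head).
    action = tokens[-1] @ (0.1 * rng.standard_normal((dim, act_dim)))
    return tokens, action

state, goal = rng.standard_normal(8), rng.standard_normal(8)
tokens, action = chain_of_goals_step(state, goal)
print(tokens.shape)  # (5, 8): state, goal, and 3 generated subgoals
print(action.shape)  # (2,): primitive action
```

The key design point this sketch captures is that each generated subgoal is appended to the token sequence, so every later prediction, including the final action, is conditioned on the full chain of earlier subgoals.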
Chain-of-Goals Hierarchical Policy (CoGHP) is an AI method for teaching robots or agents to complete complex, multi-step tasks, especially when learning from pre-recorded data. It works by having the AI break a big goal down into a series of smaller, logical steps, much as a human thinks through a problem, and then executing actions based on those steps.
CoGHP, Chain-of-Goals Policy, Autoregressive Hierarchical Policy