EAPO

Gold definition · Updated Apr 2, 2026

Definition

EAPO (Evidence-Augmented Policy Optimization) is a reinforcement learning (RL) algorithm designed to improve LLM reasoning in long-context scenarios, where outcome rewards are sparse. It introduces dense process supervision through a Group-Relative Evidence Reward and an Adaptive Reward-Policy Co-Evolution mechanism to improve the quality of the evidence the model relies on.
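To make "group-relative" concrete, here is a minimal sketch. It assumes the term means what it does in other group-based RL methods: several responses are sampled for the same prompt, and each response's evidence score is normalized against the group's mean and standard deviation. The function name `group_relative_rewards`, the `eps` parameter, and the 0-to-1 evidence scores are hypothetical, chosen for illustration; the definition above does not specify EAPO's actual formula.

```python
from statistics import mean, pstdev


def group_relative_rewards(evidence_scores, eps=1e-6):
    """Normalize evidence scores within one group of sampled responses.

    ASSUMPTION: z-score normalization within the group. This is an
    illustrative reading of "group-relative", not EAPO's published formula.
    """
    mu = mean(evidence_scores)
    sigma = pstdev(evidence_scores)  # population std-dev of the group
    # eps keeps the division defined when every response scores the same.
    return [(score - mu) / (sigma + eps) for score in evidence_scores]


# Hypothetical evidence-quality scores for four responses to one prompt.
scores = [0.1, 0.4, 0.4, 0.9]
for score, reward in zip(scores, group_relative_rewards(scores)):
    print(f"evidence score {score:.1f} -> relative reward {reward:+.2f}")
```

Under this assumption, a response is rewarded for citing better evidence than its siblings rather than for clearing a fixed threshold, so the signal stays informative even when every response in the group is weak or every response is strong.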

At a glance

Executive summary

EAPO is a training method that helps large language models (LLMs) reason more reliably over very long texts. Rather than rewarding only a correct final answer, it also gives the model feedback on *how* it finds supporting information. This discourages lucky guesses and improves the model's use of evidence.

TL;DR

EAPO is a training method that teaches AI models to find and use evidence when reading very long documents, which makes their answers more reliable.

Key points

Introduces dense process supervision via a Group-Relative Evidence Reward and Adaptive Reward-Policy Co-Evolution to improve evidence quality.
Addresses the problem of sparse outcome rewards in RL for LLMs, which hinder long-context reasoning and fail to penalize ungrounded guesses.
Intended for researchers and engineers developing LLMs for complex tasks that require long-context reasoning and precise evidence extraction.
Unlike traditional RL that relies on sparse outcome rewards, EAPO provides dense *process* supervision, directly targeting evidence quality.
Focuses on improving LLM reliability and reasoning capabilities in long-context scenarios, a critical area for advanced AI applications.
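The contrast between sparse outcome rewards and dense process supervision in the key points above can be sketched as follows. The `Rollout` type, the 0-to-1 `evidence_score`, the additive combination, and the `evidence_weight` value are all assumptions made for illustration, not EAPO's actual reward design. The sketch shows only why an outcome-only reward cannot tell a lucky guess from a grounded answer, while an added evidence term can.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response (hypothetical structure, for illustration)."""
    answer_correct: bool
    evidence_score: float  # assumed 0..1 grade of the evidence it cites


def outcome_reward(rollout: Rollout) -> float:
    # Sparse signal: depends only on whether the final answer is right.
    return 1.0 if rollout.answer_correct else 0.0


def combined_reward(rollout: Rollout, evidence_weight: float = 0.5) -> float:
    # ASSUMPTION: a simple additive mix of outcome and evidence quality.
    return outcome_reward(rollout) + evidence_weight * rollout.evidence_score


lucky_guess = Rollout(answer_correct=True, evidence_score=0.0)
grounded = Rollout(answer_correct=True, evidence_score=1.0)

# The outcome-only reward treats both rollouts identically.
print(outcome_reward(lucky_guess) == outcome_reward(grounded))
# The evidence term separates the grounded answer from the lucky guess.
print(combined_reward(grounded) > combined_reward(lucky_guess))
```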

Use cases

Enhancing LLMs for complex question answering requiring synthesis of information from multiple, lengthy documents.
Assisting researchers in extracting precise evidence and synthesizing findings from extensive scientific papers.
Improving LLMs' ability to identify and reason with specific clauses and precedents within large legal texts.
Enabling LLMs to process patient records and research articles to provide evidence-based diagnostic assistance.

Also known as

Evidence-Augmented Policy Optimization