Intervention Training (InT) is an LLM training paradigm that addresses the credit assignment problem in outcome-reward RL by enabling models to perform fine-grained self-correction. It proposes targeted interventions to steer reasoning trajectories toward higher rewards, leveraging reference solutions to identify and correct errors.
Intervention Training (InT) helps large AI models learn to reason better by teaching them to find and fix their own mistakes in a step-by-step process. Instead of just getting a reward for the final answer, InT allows the model to correct specific errors along its thinking path, making its reasoning more accurate and reliable.
InT
Was this definition helpful?