In-Context Reinforcement Learning

Gold definition · Updated Apr 2, 2026

Definition

In-Context Reinforcement Learning (ICRL) is an approach to sequential decision-making in which a system learns and adapts from trajectory information supplied in its context, without explicit model retraining. Action sequences and the performance feedback they produce serve as the primary learning signal, so past experience can be reused efficiently in complex optimization tasks.

At a glance

Executive summary

In-Context Reinforcement Learning (ICRL) allows AI systems, especially those powered by large language models, to learn and improve from past experiences without needing to be retrained. It works by feeding sequences of actions and feedback directly into the model's input, enabling it to adapt and make better decisions in complex, iterative tasks like optimizing computer code.
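The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `stub_policy` is a hypothetical stand-in for an LLM call (it only parses the prompt and probes next to the best action seen so far), `toy_env` is a made-up reward function with its peak at action 3, and the prompt format is invented for the example. The point it shows is that the only thing that changes between steps is the text of the context.

```python
import re
from typing import Callable, List, Tuple

Trajectory = List[Tuple[int, float]]  # (action, reward) pairs, oldest first


def format_context(history: Trajectory, max_items: int = 8) -> str:
    """Serialize the most recent (action, reward) pairs into prompt text."""
    recent = history[-max_items:]
    lines = [f"attempt {i}: action={a} reward={r:.3f}"
             for i, (a, r) in enumerate(recent, start=1)]
    return "\n".join(lines + ["next action:"])


def stub_policy(prompt: str) -> int:
    """Hypothetical stand-in for an LLM: sees only the prompt, returns an action."""
    pairs = [(int(a), float(r))
             for a, r in re.findall(r"action=(-?\d+) reward=(-?\d+\.\d+)", prompt)]
    if not pairs:
        return 0  # no experience yet: start from an arbitrary default
    tried = {a for a, _ in pairs}
    best_action = max(pairs, key=lambda p: p[1])[0]
    # Probe an untried neighbour of the best action seen in the context.
    for candidate in (best_action + 1, best_action - 1):
        if candidate not in tried:
            return candidate
    return best_action


def toy_env(action: int) -> float:
    """Made-up feedback signal: highest reward (zero) at action == 3."""
    return -float((action - 3) ** 2)


def icrl_loop(policy: Callable[[str], int],
              env: Callable[[int], float],
              steps: int) -> Trajectory:
    """Act, observe feedback, append to the context; the policy itself never changes."""
    history: Trajectory = []
    for _ in range(steps):
        prompt = format_context(history)
        action = policy(prompt)
        history.append((action, env(action)))
    return history


history = icrl_loop(stub_policy, toy_env, steps=6)
```

In this toy run the proposed actions climb from 0 to the peak at 3, probe 4, and then settle back on 3, all without any change to `stub_policy` itself. In a real system the stub would be replaced by a call to an actual model, and the reward would come from a measurement such as kernel runtime.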

TL;DR

It's a smart way for AI to learn from its own past attempts and results to get better at tasks, without needing to be completely re-taught every time.

Key points

Leverages the in-context learning ability of LLMs to adapt based on observed trajectories (actions + feedback) without model retraining
Addresses the labor-intensive and iterative nature of complex optimization tasks (e.g., GPU algorithm tuning) by enabling efficient experience reuse
Used by researchers in LLM-agent systems, automated code optimization, scientific computing, and adaptive control
Unlike traditional RL, which often requires explicit model retraining for policy updates, ICRL adapts purely through what is placed in the prompt: the model stays fixed and only its context changes
A growing area combining large language models with reinforcement learning principles for more adaptive and efficient agent systems
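The contrast with traditional RL in the points above can be made concrete with a small sketch. Both functions are illustrative assumptions: `value_update` uses a simple incremental value estimate for a bandit as a stand-in for "traditional RL updates stored parameters", and `context_update` stands in for ICRL, where nothing is learned except a longer context.

```python
from typing import Dict, List, Tuple


def value_update(q: Dict[int, float], action: int, reward: float,
                 lr: float = 0.5) -> Dict[int, float]:
    """Traditional RL (tabular bandit form): the stored estimates themselves change."""
    updated = dict(q)
    updated[action] = q[action] + lr * (reward - q[action])
    return updated


def context_update(context: List[Tuple[int, float]], action: int,
                   reward: float) -> List[Tuple[int, float]]:
    """ICRL: parameters are untouched; the context gains one more (action, reward) entry."""
    return context + [(action, reward)]


q_before = {0: 0.0, 1: 0.0}
q_after = value_update(q_before, action=1, reward=1.0)
context_after = context_update([], action=1, reward=1.0)
```

After one step the value table has moved (`q_after[1]` is 0.5), while the ICRL side has only recorded the experience for the model to condition on at the next call.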

Use cases

Automated Code Optimization: Optimizing GPU kernels or other scientific computing algorithms by iteratively suggesting modifications and learning from performance feedback.
LLM-Agent System Design: Enabling LLM-based agents to learn complex multi-step tasks or interact with environments by processing interaction histories in their context.
Robotics and Adaptive Control: Adapting robot behaviors or control policies in real time based on observed interaction trajectories without needing to retrain the underlying model.
Scientific Experimentation: Guiding iterative experimental design in fields like materials science or drug discovery, learning from sequences of experimental parameters and outcomes.
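For iterative use cases such as code optimization, the history of attempts eventually outgrows the model's context, so some selection of past experience is needed. The sketch below shows one possible heuristic, assumed here for illustration and not taken from any particular system: keep the best-performing attempts plus the most recent ones. The `Experience` fields and the speedup values are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Experience:
    step: int        # order in which the attempt was made
    change: str      # description of the modification that was tried
    speedup: float   # measured speedup over the baseline (>1.0 is faster)


def select_for_context(buffer: List[Experience], top_k: int = 2,
                       recent_k: int = 2) -> List[Experience]:
    """Keep the top_k best and recent_k latest attempts, in chronological order."""
    best = sorted(buffer, key=lambda e: e.speedup, reverse=True)[:top_k]
    recent = sorted(buffer, key=lambda e: e.step)[-recent_k:]
    chosen = {e.step: e for e in best + recent}  # de-duplicate by step
    return [chosen[step] for step in sorted(chosen)]


# Illustrative, made-up attempts at tuning a kernel.
buffer = [
    Experience(1, "unroll inner loop", 1.10),
    Experience(2, "use shared-memory tiling", 1.60),
    Experience(3, "increase block size", 0.90),
    Experience(4, "reorder memory accesses", 1.05),
    Experience(5, "fuse two kernels", 1.20),
]
selected = select_for_context(buffer)
```

Here the selection keeps attempts 2, 4, and 5: the two largest speedups (2 and 5) and the two most recent attempts (4 and 5). Other policies, such as keeping failures as negative examples, are equally plausible; the right choice depends on the task and the context budget.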

Also known as

Prompt-based RL, Trajectory-conditioned learning, In-Context Learning for RL, LLM-driven RL