Q-learning

Gold definition. Updated Apr 2, 2026

Definition

Q-learning is a model-free, off-policy reinforcement learning algorithm in which an agent learns an optimal action-value function (Q-function) for sequential decision-making. It iteratively updates Q-values from observed rewards and its current estimate of the next state's value; the agent then acts by choosing the action with the highest Q-value, which maximizes cumulative (typically discounted) future reward.
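The iterative update mentioned in the definition is usually written as the one-step rule below. The notation is an assumption, since the definition itself introduces no symbols: s_t and a_t are the current state and action, r_{t+1} is the observed reward, \alpha is a learning rate in (0, 1], and \gamma is a discount factor in [0, 1).

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \Bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Bigr]
```

The bracketed term is the temporal-difference error: the gap between the bootstrapped target and the current estimate.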

At a glance

Executive summary

Q-learning is a core reinforcement learning algorithm that helps AI agents learn which action to take in each situation by estimating the long-term value of every action in every state. It is used to solve sequential decision-making problems even when the environment's rules are unknown, and it has been extended to handle challenges such as continuous action spaces and changing environments.

TL;DR

Q-learning lets an AI agent learn the best action for each situation by trying things out and remembering which choices led to the most reward.

Key points

Learns an optimal action-value function (Q-function) through temporal difference updates, estimating the expected cumulative future reward of each state-action pair.
Enables agents to learn optimal policies in unknown or complex environments for sequential decision-making, maximizing cumulative rewards.
Used by researchers and engineers in robotics, game AI, control systems, logistics (e.g., railcar shunting), and adaptive systems (e.g., traffic control).
Unlike model-based RL, Q-learning is model-free: it needs no explicit model of the environment's transition dynamics or reward function and learns directly from experience.
Research focuses on adapting Q-learning to harder settings such as continuous action spaces (QAM), non-stationary environments (MORPHIN), safety constraints (SafeQIL), and multi-agent coordination.
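The key points above can be illustrated with a minimal tabular sketch. Everything in it is an assumption made for illustration and does not come from any system named on this page: the five-state corridor environment, the epsilon-greedy exploration scheme, the hyperparameter values, and the function names `train_q_table` and `greedy_policy`.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left (state 0 is a wall),
    action 1 moves right. Entering the rightmost state yields reward 1
    and ends the episode; every other step yields reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so an unlucky episode still ends
            # Epsilon-greedy: explore with probability epsilon or on ties.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # The agent only observes the outcome; it never reads a model.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Temporal-difference update toward the bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(train_q_table())` gives the learned action for each state. In this toy setting the agent should come to prefer moving right in every non-terminal state, although it is never told how the corridor works.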

Use cases

Railcar Shunting Optimization: Optimizing the complex task of disassembling and reassembling railcars in freight railyards using HHRL with Q-learning.
Traffic Signal Control: Dynamically adjusting traffic signals in response to changing traffic patterns using self-adaptive Q-learning frameworks like MORPHIN.
Robotics Navigation: A robot learning optimal paths and actions to navigate an unknown environment, avoiding obstacles and reaching targets.
Game AI: Developing intelligent agents for video games that learn to play optimally against human players or other AI, adapting to game dynamics.
Safe Autonomous Systems: Implementing safe Q-learning (e.g., SafeQIL) in autonomous vehicles or industrial robots to ensure learned policies adhere to safety constraints.

Also known as

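Q-learning has no widely used synonym. The names below are variants, extensions, or closely related algorithms rather than aliases: Deep Q-Network (DQN), Double Q-learning, Prioritized Experience Replay DQN, Dueling DQN, Q-learning with Adjoint Matching (QAM), Safe Q-learning, Self-adaptive Q-learning (MORPHIN). SARSA and Expected SARSA are related temporal-difference methods, not forms of Q-learning.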
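Several of the names listed above differ from plain Q-learning mainly in how the bootstrap target is computed. The sketch below contrasts four textbook targets. The function names and the epsilon-greedy policy used for Expected SARSA are assumptions chosen for illustration, and no deep-network, replay-buffer, QAM, SafeQIL, or MORPHIN details are modelled.

```python
def q_learning_target(reward, next_q, gamma):
    """Off-policy target: bootstrap from the best next action."""
    return reward + gamma * max(next_q)


def sarsa_target(reward, next_q, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q[next_action]


def expected_sarsa_target(reward, next_q, gamma, epsilon):
    """Bootstrap from the expected next value under an epsilon-greedy policy."""
    n = len(next_q)
    best = max(range(n), key=next_q.__getitem__)
    probs = [epsilon / n] * n      # exploration mass spread uniformly
    probs[best] += 1.0 - epsilon   # remaining mass on the greedy action
    return reward + gamma * sum(p * v for p, v in zip(probs, next_q))


def double_q_target(reward, next_q_a, next_q_b, gamma):
    """Double Q-learning: table A selects the action, table B evaluates it."""
    best = max(range(len(next_q_a)), key=next_q_a.__getitem__)
    return reward + gamma * next_q_b[best]
```

Here `next_q` is the list of current Q-value estimates for the next state, one entry per action. Only the first function is the Q-learning target. The others show why the remaining names are better read as related methods than as synonyms.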