o1 represents a paradigm in LLM development that improves performance by increasing computational effort during inference, known as test-time compute. It enables models to reach capabilities on par with GPT-4 by leveraging more compute at runtime rather than relying solely on larger model sizes.
o1 is a strategy for large language models that boosts performance by using more computing power while the model is generating responses, rather than just by making the model bigger. This lets a model reach top-tier, GPT-4-level performance by investing in smarter inference rather than sheer scale.
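One simple way to picture test-time compute is best-of-N sampling: draw several candidate answers from the same model and keep the highest-scoring one, so quality scales with inference budget rather than model size. The sketch below is illustrative only; `generate_candidate` is a hypothetical stand-in for an LLM call plus a scoring function, not any real o1 API.

```python
import random

def generate_candidate(prompt, rng):
    # Hypothetical stand-in for one LLM sample; returns (answer, quality score).
    return rng.choice([("answer-a", 0.3), ("answer-b", 0.9), ("answer-c", 0.6)])

def best_of_n(prompt, n, seed=0):
    # Spend more inference compute: draw n samples, keep the best-scoring one.
    rng = random.Random(seed)
    candidates = [generate_candidate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: c[1])

# A larger n costs more compute at runtime but makes a strong answer more likely,
# without touching the model's parameters at all.
answer, score = best_of_n("example prompt", n=16)
```

Real systems replace the toy scorer with a verifier or reward model, but the trade-off is the same: more runtime samples, better expected output.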
test-time compute, inference-time optimization