PCL-Reasoner-V1.5

Executive summary

PCL-Reasoner-V1.5 is a 32-billion-parameter language model built on Qwen2.5-32B and specialized in mathematical reasoning, with top scores on challenging math benchmarks. It is trained with supervised fine-tuning followed by offline reinforcement learning, which is presented as a more stable way to improve reasoning than online methods.

TL;DR

PCL-Reasoner-V1.5 is a 32B-parameter math-reasoning model that reaches top benchmark performance using offline reinforcement learning, a more stable alternative to online RL.

Key points

Built on a Qwen2.5-32B base, using supervised fine-tuning followed by an offline reinforcement learning stage.
Targets state-of-the-art mathematical reasoning in LLMs while improving training stability and efficiency.
Aimed at researchers and ML engineers developing LLMs for mathematical and scientific applications.
Uses offline RL, which is presented as more stable and efficient than online RL methods such as GRPO.
Reflects a broader research trend toward stable, efficient RL paradigms for improving LLM reasoning.
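
To make the offline/online distinction in the key points concrete: in an offline setup, responses are sampled and scored once, and the policy is then updated on that fixed set, whereas online methods such as GRPO sample fresh responses during training. The sketch below is a generic illustration of that idea and not the training recipe of PCL-Reasoner-V1.5. The `Sample` type, the binary correctness reward, the mean-reward baseline, and the numeric values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One pre-collected response (hypothetical structure, for illustration only)."""
    logprob: float  # summed token log-probability of the response under the current policy
    reward: float   # assumed binary reward: 1.0 if the final answer was verified, else 0.0


def offline_policy_loss(batch):
    """Reward-weighted negative log-likelihood over a fixed batch.

    A mean-reward baseline is subtracted so that better-than-average
    responses are pushed up and worse-than-average ones are pushed down.
    No new responses are sampled here: the batch is fixed in advance.
    """
    baseline = sum(s.reward for s in batch) / len(batch)
    return -sum((s.reward - baseline) * s.logprob for s in batch) / len(batch)


# Illustrative values only; in practice log-probs would come from the model.
batch = [
    Sample(logprob=-12.0, reward=1.0),
    Sample(logprob=-9.0, reward=0.0),
    Sample(logprob=-15.0, reward=1.0),
    Sample(logprob=-11.0, reward=0.0),
]
loss = offline_policy_loss(batch)
```

In a real training loop the loss would be differentiated with respect to the model parameters that produce `logprob`; the point here is only that the data and rewards are fixed before optimization begins, which is one common reason given for the stability of offline approaches.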
