PCL-Reasoner-V1.5 is a 32-billion-parameter large language model built on Qwen2.5-32B and specialized for mathematical reasoning. Its developers report state-of-the-art results on math reasoning benchmarks, which they attribute to supervised fine-tuning combined with a new offline reinforcement learning method.
Put simply, it is a 32-billion-parameter model tuned to solve complex math problems, and it scores near the top of challenging benchmarks. Its reasoning skills are strengthened with offline reinforcement learning, a training approach described as more stable than the usual alternatives.