ARXIV:2604.01302 · REASONING ENHANCEMENT · SUBMITTED 03 APR · 20:19 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming

Qianfan Zhang · Tianyu Guo · Xuandi Ren · Jiale Chen · Ming Ding · Ran Xin · +1 at arXiv

A system that scales reasoning token budgets for competitive programming using RL and parallel thinking to significantly improve performance on hard problems.

Blocked on Code › Score 4.0 · Evidence unverified

Opportunity summary

Pain A system that scales reasoning token budgets for competitive programming using RL and parallel thinking to significantly improve performance on hard problems.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

A system that scales reasoning token budgets for competitive programming using RL and parallel thinking to significantly improve performance on hard problems. During RL training, we observe an approximately log-linear relationship between validation accuracy…
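The approximately log-linear relationship mentioned above can be made concrete with a small least-squares fit of accuracy against the logarithm of the average reasoning-token count. This is only an illustrative sketch: the function name `fit_log_linear`, the use of the natural logarithm, and the checkpoint numbers are assumptions for the example and are not taken from the paper.

```python
import math


def fit_log_linear(tokens, accuracy):
    """Least-squares fit of accuracy ~ a + b * ln(tokens); returns (a, b)."""
    if len(tokens) != len(accuracy) or len(tokens) < 2:
        raise ValueError("need at least two (tokens, accuracy) pairs")
    xs = [math.log(t) for t in tokens]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("token counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx            # slope: accuracy gained per e-fold increase in tokens
    a = mean_y - b * mean_x  # intercept: where the trend starts
    return a, b


# Synthetic checkpoints generated from a known trend (NOT data from the paper).
tokens = [8_000, 16_000, 32_000, 64_000]
accuracy = [0.1 + 0.05 * math.log(t) for t in tokens]
a, b = fit_log_linear(tokens, accuracy)
print(f"intercept={a:.3f}, slope={b:.3f}")
```

Under this reading, an intervention that raises the starting point of the trajectory would show up as a larger intercept `a`, and one that produces a steeper trend would show up as a larger slope `b`.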

METHOD

Full abstract

We study how to scale reasoning token budgets for competitive programming through two complementary approaches: training-time reinforcement learning (RL) and test-time parallel thinking. During RL training, we observe an approximately log-linear relationship between validation accuracy and the average number of generated reasoning tokens over successive checkpoints, and show two ways to shift this training trajectory: verification RL warmup raises the starting point, while randomized clipping produces a steeper trend in the observed regime. As scaling single-generation reasoning during RL quickly becomes expensive under full attention, we introduce a multi-round parallel thinking pipeline that distributes the token budget across threads and rounds of generation, verification, and refinement. We train the model end-to-end on this pipeline to match the training objective to the test-time structure. Starting from Seed-OSS-36B, the full system with 16 threads and 16 rounds per thread matches the underlying RL model's oracle pass@16 at pass@1 using 7.6 million tokens per problem on average, and surpasses GPT-5-high on 456 hard competitive programming problems from AetherCode.
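The multi-round parallel thinking pipeline in the abstract can be sketched as a loop over threads, each running rounds of generation, verification, and refinement. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: the names `Attempt`, `run_thread`, `run_pipeline`, and the `toy_*` stubs are hypothetical, threads are run sequentially for simplicity, and both the early stop on a passing verification and the "first passing attempt wins" selection rule are assumptions the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical callable shapes; each returns its token usage as the last item.
Generate = Callable[[str, int], Tuple[str, int]]       # (problem, thread_id)
Verify = Callable[[str, str], Tuple[bool, str, int]]   # (problem, solution)
Refine = Callable[[str, str, str], Tuple[str, int]]    # (problem, solution, feedback)


@dataclass
class Attempt:
    solution: str
    passed: bool
    tokens_used: int


def run_thread(problem: str, thread_id: int, rounds: int,
               generate: Generate, verify: Verify, refine: Refine) -> Attempt:
    """One thread: generate once, then verify and refine for up to `rounds` rounds."""
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    solution, feedback, total = "", "", 0
    for r in range(rounds):
        if r == 0:
            solution, used = generate(problem, thread_id)
        else:
            solution, used = refine(problem, solution, feedback)
        total += used
        passed, feedback, used = verify(problem, solution)
        total += used
        if passed:  # assumption: stop a thread once verification passes
            return Attempt(solution, True, total)
    return Attempt(solution, False, total)


def run_pipeline(problem: str, threads: int, rounds: int,
                 generate: Generate, verify: Verify, refine: Refine):
    """Run all threads and return (chosen attempt, total tokens spent)."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    attempts = [run_thread(problem, t, rounds, generate, verify, refine)
                for t in range(threads)]
    total = sum(a.tokens_used for a in attempts)
    # Assumption: pick the first attempt that passed, else fall back to thread 0.
    best = next((a for a in attempts if a.passed), attempts[0])
    return best, total


def avg_tokens_per_thread_round(total_tokens: float, threads: int, rounds: int) -> float:
    """Plain arithmetic: average budget per (thread, round) slot."""
    return total_tokens / (threads * rounds)


# Deterministic toy stubs standing in for model calls.
def toy_generate(problem, thread_id):
    return f"t{thread_id}-draft-0", 100

def toy_verify(problem, solution):
    passed = solution.endswith("-2")
    return passed, "" if passed else "fails a test", 10

def toy_refine(problem, solution, feedback):
    prefix, n = solution.rsplit("-", 1)
    return f"{prefix}-{int(n) + 1}", 50


best, total = run_pipeline("toy problem", threads=2, rounds=4,
                           generate=toy_generate, verify=toy_verify, refine=toy_refine)
per_slot = avg_tokens_per_thread_round(7_600_000, threads=16, rounds=16)
print(best, total, per_slot)
```

The last helper only divides the reported average of 7.6 million tokens per problem by the 16 × 16 thread-round slots; how the budget is actually distributed across threads and rounds is not stated in the abstract.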

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. During RL training, we observe an approximately log-linear relationship between validation accuracy and the average number of generated reasoning tokens over successive checkpoints, and…

WHY NOW

Reasoning Enhancement moved forward this cycle; last verified April 2026. Public score 4.0/10.

Competitive landscape

Segment: Reasoning Enhancement

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

arXiv: 2604.01302 (https://arxiv.org/abs/2604.01302) · Published: 2026-04-01T18:05:49.000Z · Free to access

Authors: Qianfan Zhang · Tianyu Guo · Xuandi Ren · Jiale Chen · Ming Ding · Ran Xin · Xia Xiao

Research domain: Reasoning Enhancement · Viability score: 4

Page: https://sciencetostartup.com/paper/scaling-reasoning-tokens-via-rl-and-parallel-thinking-evidence-from-competitive-programming
