ARXIV:2604.01702 · LLM REASONING · SUBMITTED 03 APR · 20:50 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Verified · PaperPack: citation fields available
Partial · Proof: unverified proof status

On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

Zhaoyi Li · Xiangyu Xi · Zhengyu Chen · Wei Wang · Gangwei Jiang · Ranran Shen · +3 at arXiv

This research identifies a critical flaw in how large language models learn reasoning from diverse data sources and proposes a filtering method to significantly improve generalization performance on complex reasoning…

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: Long CoT SFT data from different source models can generalize very differently even when training loss is lower. The paper traces the gap to branch-heavy reasoning patterns and proposes filtering out frequently branching trajectories, which improves reasoning performance by 3.6% on average across five benchmarks.

Evidence: 0 refs | 0 sources | 33% coverage

Blocker: Evidence unverified

Full abstract

Supervised Fine-Tuning (SFT) on long Chain-of-Thought (CoT) trajectories has become a pivotal phase in building large reasoning models. However, how CoT trajectories from different sources influence the generalization performance of models remains an open question. In this paper, we conduct a comparative study using two sources of verified CoT trajectories generated by two competing models, DeepSeek-R1-0528 and gpt-oss-120b, with their problem sets controlled to be identical. Despite their comparable performance, we uncover a striking paradox: lower training loss does not translate to better generalization. SFT on DeepSeek-R1-0528 data achieves remarkably lower training loss, yet exhibits significantly worse generalization performance on reasoning benchmarks compared to those trained on gpt-oss-120b. To understand this paradox, we perform a multi-faceted analysis probing token-level SFT loss and step-level reasoning behaviors. Our analysis reveals a difference in reasoning patterns. gpt-oss-120b exhibits highly convergent and deductive trajectories, whereas DeepSeek-R1-0528 favors a divergent and branch-heavy exploration pattern. Consequently, models trained with DeepSeek-R1 data inherit inefficient exploration behaviors, often getting trapped in redundant exploratory branches that hinder them from reaching correct solutions. Building upon this insight, we propose a simple yet effective remedy of filtering out frequently branching trajectories to improve the generalization of SFT. Experiments show that training on selected DeepSeek-R1-0528 subsets surprisingly improves reasoning performance by up to 5.1% on AIME25, 5.5% on BeyondAIME, and on average 3.6% on five benchmarks.
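The filtering remedy described in the abstract can be illustrated with a rough sketch. The abstract does not say how branching is detected or what threshold is used, so everything below is an assumption made for illustration: the cue phrases in BRANCH_CUES, the blank-line step boundary, the Trajectory type, and the max_rate cutoff are all hypothetical and are not the authors' method.

```python
import re
from dataclasses import dataclass

# Hypothetical cue phrases that often open a new exploratory branch in a
# long CoT trace. The paper's actual branching criterion is not stated in
# the abstract, so this list is an assumption for illustration only.
BRANCH_CUES = ("wait", "alternatively", "on second thought", "let me try another")

_CUE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in BRANCH_CUES) + r")\b",
    flags=re.IGNORECASE,
)


@dataclass
class Trajectory:
    problem_id: str
    text: str


def split_steps(text: str) -> list[str]:
    """Split a trajectory into steps on blank lines (an assumed step boundary)."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]


def branching_rate(traj: Trajectory) -> float:
    """Fraction of steps that contain at least one branch cue."""
    steps = split_steps(traj.text)
    if not steps:
        return 0.0
    branching = sum(1 for s in steps if _CUE_RE.search(s))
    return branching / len(steps)


def filter_trajectories(trajs, max_rate=0.3):
    """Keep trajectories whose branching rate is at or below max_rate."""
    return [t for t in trajs if branching_rate(t) <= max_rate]


# Toy data, invented for this example.
demo = [
    Trajectory("p1", "Compute 2+3.\n\nThe sum is 5.\n\nAnswer: 5"),
    Trajectory(
        "p2",
        "Try factoring.\n\nWait, that fails.\n\nAlternatively, expand.\n\nAnswer: 7",
    ),
]
kept = filter_trajectories(demo, max_rate=0.3)
print([t.problem_id for t in kept])  # prints ['p1']
```

In this toy run the second trajectory has two branching steps out of four (rate 0.5) and is dropped. A real pipeline would need a step segmenter and branching detector matched to the paper's step-level analysis, and a threshold tuned on held-out benchmarks.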

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. SFT on DeepSeek-R1-0528 data achieves remarkably lower training loss, yet exhibits significantly worse generalization performance on reasoning benchmarks compared to those trained on gpt-oss-120b.…

WHY NOW

LLM Reasoning moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Competitive landscape

Segment: LLM Reasoning
Adoption evidence: No public code link in the paper record yet
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified

Paper record

arXiv: 2604.01702 (https://arxiv.org/abs/2604.01702) · Published 2026-04-02 · Free to access
Authors: Zhaoyi Li · Xiangyu Xi · Zhengyu Chen · Wei Wang · Gangwei Jiang · Ranran Shen · Linqi Song · Ying Wei · Defu Lian
Research domain: LLM Reasoning · Viability score: 7 · Commercial readiness: code
Page: https://sciencetostartup.com/paper/on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning
