On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

Signal Canvas | ScienceToStartup

Page Freshness

Signal Canvas proof surface

Canonical route: /signal-canvas/on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning

Proof freshness: stale
Proof status: unverified
Display score: 7/10
Last proof check: 2026-04-03
Score updated: 2026-04-03
Score fresh until: 2026-05-03
References: 0
Source count: 0
Coverage: 33%

This page shows the most recent evidence receipt and score bundle on record, because the latest proof data falls outside the freshness window.

Agent Handoff

Canonical ID: on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning
Route: /signal-canvas/on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning

MCP example

{
  "tool": "search_signal_canvas",
  "arguments": {
    "mode": "paper",
    "paper_ref": "on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning",
    "query_text": "Summarize On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning"
  }
}

source_context

{
  "surface": "signal_canvas",
  "mode": "paper",
  "query": "On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning",
  "normalized_query": "2604.01702",
  "route": "/signal-canvas/on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning",
  "paper_ref": "on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}
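For agents calling this surface from Python, the REST and MCP requests shown above can be assembled from the canonical ID. The following is a minimal standard-library sketch: the helper names `rest_handoff_url` and `mcp_search_request` are hypothetical, and treating the REST path as the base URL followed by any canonical ID is an assumption generalized from the single example on this page. The sketch only builds the requests; it does not send them or assume anything about the response format.

```python
import json
from urllib.parse import quote

# Base path copied from the REST example on this page. Appending other
# canonical IDs to it is an assumption, not documented behavior.
HANDOFF_BASE = "https://sciencetostartup.com/api/v1/agent-handoff/signal-canvas/"


def rest_handoff_url(canonical_id: str) -> str:
    """Build the agent-handoff URL for a canonical ID (hypothetical helper)."""
    # Slugs like the one on this page contain only [a-z0-9-], which quote()
    # leaves unchanged; safe="" also percent-encodes any stray "/".
    return HANDOFF_BASE + quote(canonical_id, safe="")


def mcp_search_request(canonical_id: str, title: str) -> dict:
    """Build a tool call with the same shape as the MCP example (hypothetical helper)."""
    return {
        "tool": "search_signal_canvas",
        "arguments": {
            "mode": "paper",
            "paper_ref": canonical_id,
            "query_text": f"Summarize {title}",
        },
    }


if __name__ == "__main__":
    cid = ("on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-"
           "of-long-chain-of-thought-supervised-fine-tuning")
    title = ("On the Role of Reasoning Patterns in the Generalization Discrepancy "
             "of Long Chain-of-Thought Supervised Fine-Tuning")
    print(rest_handoff_url(cid))
    print(json.dumps(mcp_search_request(cid, title), indent=2))
```

Run with the canonical ID from this page, the first line printed matches the URL in the curl example, and the JSON matches the MCP example.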

Evidence Receipt

Route status: building

Claims: 8

References: Pending verification

Proof: Verification pending

Freshness state: computing

Source paper: On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

PDF: https://arxiv.org/pdf/2604.01702v1

Source count: Pending verification

Coverage: 33%

Last proof check: 2026-04-03T20:50:40.820Z

Signal Canvas receipt window

Watch and verify: On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

/buildability/on-the-role-of-reasoning-patterns-in-the-generalization-discrepancy-of-long-chain-of-thought-supervised-fine-tuning

Subject: On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

Verdict

Watch

GitHub Code Pulse

No public code linked for this paper yet.

Claim map

Strong: 8 | Mixed: 0 | Weak: 0

Evidence (partial): Despite their comparable performance, we uncover a striking paradox: lower training loss does not translate to better generalization.
Implication (partial): Directly stated as a discovered paradox with supporting evidence.
Verification: partial

Evidence (partial): SFT on DeepSeek-R1-0528 data achieves remarkably lower training loss, yet exhibits significantly worse generalization performance on reasoning benchmarks compared to those trained on gpt-oss-120b.
Implication (partial): Explicitly stated in the abstract with clear comparative results.
Verification: partial

Evidence (partial): Our analysis reveals a difference in reasoning patterns. gpt-oss-120b exhibits highly convergent and deductive trajectories, whereas DeepSeek-R1-0528 favors a divergent and branch-heavy exploration pattern.
Implication (partial): Directly stated in the abstract as the key finding of the multi-faceted analysis.
Verification: partial

Evidence (partial): Consequently, models trained with DeepSeek-R1 data inherit inefficient exploration behaviors, often getting trapped in redundant exploratory branches that hinder them from reaching correct solutions.
Implication (partial): Directly stated in the abstract as a consequence of the reasoning pattern difference.
Verification: partial

Evidence (partial): Building upon this insight, we propose a simple yet effective remedy of filtering out frequently branching trajectories to improve the generalization of SFT.
Implication (partial): Explicitly stated as a proposed remedy with specific performance improvements.
Verification: partial

Evidence (partial): Experiments show that training on selected DeepSeek-R1-0528 subsets surprisingly improves reasoning performance by up to 5.1% on AIME25, 5.5% on BeyondAIME, and on average 3.6% on five benchmarks.
Implication (partial): Explicitly stated in the abstract with specific numeric results.
Verification: partial

Evidence (partial): However, how CoT trajectories from different sources influence the generalization performance of models remains an open question.
Implication (partial): Directly stated in the abstract as the research motivation.
Verification: partial

Evidence (partial): …with their problem sets controlled to be identical.
Implication (partial): Explicitly stated in the abstract as part of the experimental design.
Verification: partial
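The remedy in the claims above, filtering out frequently branching trajectories before SFT, can be sketched as a heuristic filter. This page does not say how the paper detects a branch or what cutoff it uses, so everything here is an assumption for illustration: the marker phrases in the hypothetical `BRANCH_MARKERS`, the raw-count measure, and the `max_branches` threshold. It is not the authors' implementation.

```python
import re
from typing import List

# Hypothetical phrases treated as the start of a new exploratory branch.
# The paper's actual branch detector is not described on this page.
BRANCH_MARKERS = (
    r"\balternatively\b",
    r"\bwait\b",
    r"\banother approach\b",
)
_BRANCH_RE = re.compile("|".join(BRANCH_MARKERS), flags=re.IGNORECASE)


def count_branches(trajectory: str) -> int:
    """Count branch-marker occurrences in one chain-of-thought trajectory."""
    return len(_BRANCH_RE.findall(trajectory))


def filter_branchy(trajectories: List[str], max_branches: int) -> List[str]:
    """Keep trajectories with at most `max_branches` markers (assumed cutoff)."""
    return [t for t in trajectories if count_branches(t) <= max_branches]


if __name__ == "__main__":
    data = [
        "Compute 2+3. The answer is 5.",
        "Try x=1. Wait, that fails. Alternatively, try x=2. Wait, check again.",
    ]
    print([count_branches(t) for t in data])  # [0, 3]
    print(filter_branchy(data, max_branches=1))  # keeps only the first trajectory
```

A raw count favors short trajectories; normalizing by trajectory length (for example, markers per 1,000 words) or dropping the top fraction of trajectories by branch count are equally plausible variants under the same assumptions.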

Author intelligence and commercialization panels stay hidden until the proof receipt is verified, cites at least 3 references, includes at least 2 sources, and clears 50% coverage. The paper narrative and citation surfaces remain public while verification is pending.
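The unlock rule above combines four conditions, all of which must hold. The sketch below encodes them as a predicate and checks it against the values reported on this page. The `ProofReceipt` type and `panels_unlocked` function are hypothetical, and reading "clears 50% coverage" as "at least 50%" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProofReceipt:
    """Hypothetical container for the receipt fields the rule refers to."""
    verified: bool
    references: int
    sources: int
    coverage: float  # fraction in [0, 1], e.g. 0.33 for 33%


def panels_unlocked(r: ProofReceipt) -> bool:
    """Return True only if every stated unlock condition holds."""
    return (
        r.verified
        and r.references >= 3
        and r.sources >= 2
        and r.coverage >= 0.50  # "clears 50%" read as >= 50% (assumption)
    )


# Values reported on this page: unverified, 0 references, 0 sources, 33% coverage.
current = ProofReceipt(verified=False, references=0, sources=0, coverage=0.33)
print(panels_unlocked(current))  # False
```

With this page's values, every condition fails, so the panels stay hidden.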
