Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Page Freshness

Paper proof surface

Canonical route: /paper/lightning-opd-efficient-post-training-for-large-reasoning-models-with-offline-on-policy-distillation

Proof freshness: stale
Proof status: unverified
Display score: 7/10
Last proof check: 2026-04-15
Score updated: 2026-04-15
Score fresh until: 2026-05-15
References: 0
Source count: 4
Coverage: 67%

This page shows the most recently recorded evidence receipt and score bundle because the latest proof data falls outside the freshness window.
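The freshness fields listed above lend themselves to a simple date comparison. The sketch below is illustrative only: it assumes a window is "fresh" up to and including its end date, and `is_fresh` is a hypothetical helper name. The page does not state how the proof freshness window itself is computed, so only the "Score fresh until" date is used here.

```python
from datetime import date


def is_fresh(fresh_until: str, today: date) -> bool:
    """Return True while `today` is on or before the ISO date `fresh_until`.

    Assumption: the window is inclusive of its end date.
    """
    return today <= date.fromisoformat(fresh_until)


# "Score fresh until" value copied from the page metadata.
SCORE_FRESH_UNTIL = "2026-05-15"

score_is_fresh_on_last_check = is_fresh(SCORE_FRESH_UNTIL, date(2026, 4, 15))
```

A caller would normally pass `date.today()` as the second argument; a fixed date is used here so the result does not depend on when the snippet runs.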

Agent Handoff

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Canonical ID: lightning-opd-efficient-post-training-for-large-reasoning-models-with-offline-on-policy-distillation
Route: /paper/lightning-opd-efficient-post-training-for-large-reasoning-models-with-offline-on-policy-distillation

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/lightning-opd-efficient-post-training-for-large-reasoning-models-with-offline-on-policy-distillation
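The same request can be issued from Python using only the standard library. This is a minimal sketch: `handoff_url` and `fetch_handoff` are hypothetical helper names, and the assumption that the endpoint returns a JSON body is not confirmed by the page, which shows no response.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE = "https://sciencetostartup.com/api/v1/agent-handoff/paper/"


def handoff_url(canonical_id: str) -> str:
    """Build the agent-handoff URL for a canonical paper ID."""
    # quote() leaves letters, digits, and hyphens untouched,
    # so slug-style IDs pass through unchanged.
    return BASE + quote(canonical_id, safe="")


def fetch_handoff(canonical_id: str, timeout: float = 10.0) -> dict:
    """Fetch the handoff record.

    Assumption: the response body is JSON (not shown on the page).
    """
    with urlopen(handoff_url(canonical_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


PAPER_ID = (
    "lightning-opd-efficient-post-training-for-large-reasoning-models"
    "-with-offline-on-policy-distillation"
)
url = handoff_url(PAPER_ID)
```

Only the URL is built at import time; `fetch_handoff(PAPER_ID)` performs the network call when invoked.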

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2604.13010"
  }
}
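The tool-call payload above can also be built programmatically. The sketch below reproduces only the shape shown on the page; `get_paper_call` is a hypothetical helper, and the identifier check assumes the new-style arXiv format (four digits, a dot, four or five digits, optional version suffix). How the payload is delivered to an MCP server is outside what the page documents.

```python
import json
import re

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optional "vN" suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def get_paper_call(arxiv_id: str) -> dict:
    """Return a `get_paper` tool call in the shape shown on the page."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv id: {arxiv_id!r}")
    return {"tool": "get_paper", "arguments": {"arxiv_id": arxiv_id}}


# Serialize the call for this paper's arXiv ID.
payload = json.dumps(get_paper_call("2604.13010"), indent=2)
```

Validating the identifier before building the call keeps malformed IDs from reaching the tool.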

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation",
  "normalized_query": "2604.13010",
  "route": "/paper/lightning-opd-efficient-post-training-for-large-reasoning-models-with-offline-on-policy-distillation",
  "paper_ref": "lightning-opd-efficient-post-training-for-large-reasoning-models-with-offline-on-policy-distillation",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Use the canonical paper page as a proof artifact

Paper proof surface

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Ready for execution: Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Compute envelope

Evidence ids

Freshness

Hash state

Signature state

Blockers

Research neighborhood

Claim map

Source proof

Competitive landscape
