ARXIV:2604.21016 · LLM TRAINING OPTIMIZATION · SUBMITTED 24 APR · 20:33 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

SGD at the Edge of Stability: The Stochastic Sharpness Gap

Fangshuo Liao · Afroditi Kolomvaki · Anastasios Kyrillidis · arXiv

Theoretical framework explaining and predicting the sharpness gap in SGD for neural network training, offering insights into optimization dynamics.

Ship in 2-4 weeks · Score 3.0 · Evidence unverified

Opportunity summary

Pain Theoretical framework explaining and predicting the sharpness gap in SGD for neural network training, offering insights into optimization dynamics.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

Theoretical framework explaining and predicting the sharpness gap in SGD for neural network training, offering insights into optimization dynamics. \citet{damian2023selfstab} showed that this behavior is explained by a self-stabilization mechanism driven by third-order structure…

METHOD

Full abstract

When training neural networks with full-batch gradient descent (GD) and step size $\eta$, the largest eigenvalue of the Hessian -- the sharpness $S(\boldsymbol{\theta})$ -- rises to $2/\eta$ and hovers there, a phenomenon termed the Edge of Stability (EoS). \citet{damian2023selfstab} showed that this behavior is explained by a self-stabilization mechanism driven by third-order structure of the loss, and that GD implicitly follows projected gradient descent (PGD) on the constraint $S(\boldsymbol{\theta}) \leq 2/\eta$. For mini-batch stochastic gradient descent (SGD), the sharpness stabilizes below $2/\eta$, with the gap widening as the batch size decreases; yet no theoretical explanation exists for this suppression. We introduce stochastic self-stabilization, extending the self-stabilization framework to SGD. Our key insight is that gradient noise injects variance into the oscillatory dynamics along the top Hessian eigenvector, strengthening the cubic sharpness-reducing force and shifting the equilibrium below $2/\eta$. Following the approach of \citet{damian2023selfstab}, we define stochastic predicted dynamics relative to a moving projected gradient descent trajectory and prove a stochastic coupling theorem that bounds the deviation of SGD from these predictions. We derive a closed-form equilibrium sharpness gap: $\Delta S = \eta \beta \sigma_{\boldsymbol{u}}^{2} / (4\alpha)$, where $\alpha$ is the progressive sharpening rate, $\beta$ is the self-stabilization strength, and $\sigma_{\boldsymbol{u}}^{2}$ is the gradient noise variance projected onto the top eigenvector. This formula predicts that smaller batch sizes yield flatter solutions and recovers GD when the batch equals the full dataset.
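The closed-form gap in the abstract can be evaluated directly. The sketch below is a minimal illustration rather than the authors' code: the function names are hypothetical, the numeric inputs are made up for demonstration, and reading the equilibrium sharpness as $2/\eta - \Delta S$ is an interpretation of "shifting the equilibrium below $2/\eta$".

```python
def sharpness_gap(eta, alpha, beta, sigma_u_sq):
    """Gap from the abstract's formula: eta * beta * sigma_u^2 / (4 * alpha)."""
    if alpha <= 0:
        raise ValueError("alpha (progressive sharpening rate) must be positive")
    return eta * beta * sigma_u_sq / (4.0 * alpha)


def predicted_sharpness(eta, alpha, beta, sigma_u_sq):
    """Assumed reading: equilibrium sharpness = GD threshold 2/eta minus the gap."""
    return 2.0 / eta - sharpness_gap(eta, alpha, beta, sigma_u_sq)


# Illustrative values only; none of these numbers come from the paper.
eta, alpha, beta = 0.01, 1.0, 2.0
for sigma_u_sq in (0.0, 100.0, 400.0):
    gap = sharpness_gap(eta, alpha, beta, sigma_u_sq)
    s_eq = predicted_sharpness(eta, alpha, beta, sigma_u_sq)
    print(f"sigma_u^2={sigma_u_sq:6.1f}  gap={gap:.3f}  S_eq={s_eq:.3f}")
```

With zero projected noise the gap vanishes and the prediction sits at the full-batch threshold $2/\eta$; the gap then grows linearly in $\sigma_{\boldsymbol{u}}^{2}$.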

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. This formula predicts that smaller batch sizes yield flatter solutions and recovers GD when the batch equals the full dataset. Code availability is flagged…
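To see how the batch-size claim could follow from the formula, the sketch below plugs in one possible noise model. The model is an assumption, not something stated in this record: it treats the mini-batch gradient as a mean over examples drawn without replacement, so the projected variance carries the standard finite-population correction. The paper may use a different model. All names and numbers are hypothetical.

```python
def minibatch_noise_variance(per_example_var, batch_size, dataset_size):
    """Variance of a mean over `batch_size` examples sampled without replacement.

    Assumed noise model (not taken from the paper):
    (per_example_var / B) * (N - B) / (N - 1).
    """
    if not 1 <= batch_size <= dataset_size:
        raise ValueError("batch_size must lie in [1, dataset_size]")
    if dataset_size == 1:
        return 0.0
    correction = (dataset_size - batch_size) / (dataset_size - 1)
    return per_example_var / batch_size * correction


def sharpness_gap(eta, alpha, beta, sigma_u_sq):
    """Gap from the abstract's formula: eta * beta * sigma_u^2 / (4 * alpha)."""
    return eta * beta * sigma_u_sq / (4.0 * alpha)


# Illustrative values only; none of these numbers come from the paper.
eta, alpha, beta = 0.01, 1.0, 2.0
per_example_var, dataset_size = 400.0, 1024

gaps = {}
for batch_size in (1, 32, 256, 1024):
    sigma_u_sq = minibatch_noise_variance(per_example_var, batch_size, dataset_size)
    gaps[batch_size] = sharpness_gap(eta, alpha, beta, sigma_u_sq)
    print(f"B={batch_size:5d}  sigma_u^2={sigma_u_sq:9.4f}  gap={gaps[batch_size]:.5f}")
```

Under this assumed model the gap shrinks monotonically as the batch grows and is exactly zero at the full dataset, which matches the qualitative statement that the formula recovers GD in that limit.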

WHY NOW

LLM Training Optimization moved forward this cycle; last verified April 2026. Public score 3.0/10. Production flags indicate code availability.


Opportunity summary

Score 3.0

Pain Theoretical framework explaining and predicting the sharpness gap in SGD for neural network training, offering insights into optimization dynamics.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

Theoretical framework explaining and predicting the sharpness gap in SGD for neural network training, offering insights into optimization dynamics.


Competitive landscape

Theoretical framework explaining and predicting the sharpness gap in SGD for neural network training, offering insights into optimization dynamics.

Segment

LLM Training Optimization

Adoption evidence

No public code link in the paper record yet

Commercial read

3.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified



