ARXIV:2603.26554 · LLM TRAINING · SUBMITTED 30 MAR · 23:58 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

Juno Kim · Eshaan Nichani · Denny Wu · Alberto Bietti · Jason D. Lee · arXiv

This paper theoretically analyzes spectral optimizers for associative memory recall in language models, showing potential capacity advantages over SGD.

Blocked on Code · Score 3.0 · Evidence unverified

Opportunity summary

Pain: This paper theoretically analyzes spectral optimizers for associative memory recall in language models, showing potential capacity advantages over SGD.

Evidence: 65 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

This paper theoretically analyzes spectral optimizers for associative memory recall in language models, showing potential capacity advantages over SGD. We study this question through the linear associative memory problem, a tractable model for factual…

METHOD

Full abstract

Spectral optimizers such as Muon have recently shown strong empirical performance in large-scale language model training, but the source and extent of their advantage remain poorly understood. We study this question through the linear associative memory problem, a tractable model for factual recall in transformer-based models. In particular, we go beyond orthogonal embeddings and consider Gaussian inputs and outputs, which allows the number of stored associations to greatly exceed the embedding dimension. Our main result sharply characterizes the recovery rates of one step of Muon and SGD on the logistic regression loss under a power law frequency distribution. We show that the storage capacity of Muon significantly exceeds that of SGD, and moreover Muon saturates at a larger critical batch size. We further analyze the multi-step dynamics under a thresholded gradient approximation and show that Muon achieves a substantially faster initial recovery rate than SGD, while both methods eventually converge to the information-theoretic limit at comparable speeds. Experiments on synthetic tasks validate the predicted scaling laws. Our analysis provides a quantitative understanding of the signal amplification of Muon and lays the groundwork for establishing scaling laws across more practical language modeling tasks and optimizers.
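The one-step comparison described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code: the softmax cross-entropy form of the loss, the zero initialization, the exact-SVD orthogonalization (standing in for Muon's update, without momentum), and the sizes `n_items=512`, `dim=32`, `alpha=1.5` are all assumptions chosen for illustration. An item counts as recovered when its own output embedding gets the highest score, so the step size does not affect the count after a single step from zero.

```python
import numpy as np


def make_problem(n_items, dim, alpha, rng):
    """Gaussian input/output embeddings and power-law frequencies p_i ~ i^(-alpha)."""
    E = rng.standard_normal((n_items, dim)) / np.sqrt(dim)  # rows: input embeddings
    U = rng.standard_normal((n_items, dim)) / np.sqrt(dim)  # rows: output embeddings
    p = np.arange(1, n_items + 1, dtype=float) ** (-alpha)
    return E, U, p / p.sum()


def scores(W, E, U):
    """scores[i, j] = u_j^T W e_i, the logit of output j given input i."""
    return E @ W.T @ U.T


def gradient(W, E, U, p):
    """Gradient in W of the frequency-weighted softmax cross-entropy (assumed loss)."""
    s = scores(W, E, U)
    s -= s.max(axis=1, keepdims=True)  # stabilize the softmax
    q = np.exp(s)
    q /= q.sum(axis=1, keepdims=True)
    resid = (q - np.eye(len(p))) * p[:, None]  # p_i * (q_ij - 1[i == j])
    return (resid @ U).T @ E


def orthogonalize(G):
    """Polar factor of G via exact SVD: singular values are replaced by 1."""
    left, _, right_t = np.linalg.svd(G, full_matrices=False)
    return left @ right_t


def recovered(W, E, U):
    """Boolean mask: item i is recovered when its own output has the top score."""
    return scores(W, E, U).argmax(axis=1) == np.arange(E.shape[0])


# Hypothetical sizes: many more stored associations than embedding dimensions.
rng = np.random.default_rng(0)
E, U, p = make_problem(n_items=512, dim=32, alpha=1.5, rng=rng)
G0 = gradient(np.zeros((32, 32)), E, U, p)  # gradient at zero initialization

W_sgd = -G0                  # one plain gradient step (step size 1)
W_muon = -orthogonalize(G0)  # one orthogonalized step, no momentum

print("recovered after one SGD-style step: ", int(recovered(W_sgd, E, U).sum()))
print("recovered after one Muon-style step:", int(recovered(W_muon, E, U).sum()))
```

Because the orthogonalized step sets every singular value of the gradient to one, directions driven by rare items are weighted as heavily as those driven by frequent ones, which is one way to picture the signal amplification the abstract refers to. The sketch makes no claim about reproducing the paper's rates; it only sets up the two updates side by side.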

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. Our main result sharply characterizes the recovery rates of one step of Muon and SGD on the logistic regression loss under a power law…

WHY NOW

LLM Training moved forward this cycle; last verified April 2026. Public score 3.0/10.

Competitive landscape

Segment

LLM Training

Adoption evidence

No public code link in the paper record yet

Commercial read

3.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Published: 2026-03-27T16:13:18.000Z · arXiv: https://arxiv.org/abs/2603.26554 · Canonical page: https://sciencetostartup.com/paper/sharp-capacity-scaling-of-spectral-optimizers-in-learning-associative-memory
