ARXIV:2606.02765 · LLM THEORY · SUBMITTED 03 JUN · 20:33 UTC · FRESHNESS FRESH

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models

Alexander Guha · arXiv

Developing a geometric framework to understand the representational capacity limits of transformer language models based on embedding matrix analysis.

Ship in 2-4 weeks · Score 0.0 · Evidence unverified

Opportunity summary

Pain Developing a geometric framework to understand the representational capacity limits of transformer language models based on embedding matrix analysis.

Evidence 0 refs | 4 sources | 83% coverage

Blocker Evidence unverified

PROBLEM

Developing a geometric framework to understand the representational capacity limits of transformer language models based on embedding matrix analysis. Grounded in the Linear Representation and Superposition Hypotheses - which propose that models encode features…

METHOD

Full abstract

Model dimension ($d_{model}$) is a fundamental hyperparameter in transformer language models, yet its role in setting the geometric limits of feature representation remains under-explored. Grounded in the Linear Representation and Superposition Hypotheses - which propose that models encode features as near-orthogonal directions in latent space - we develop a framework for estimating how many such directions a model can support. We first establish the embedding matrix as a measurable proxy for near-orthogonality constraints across the latent space: the boundary between meaningful token relationships and incidental similarity in the pairwise cosine similarity distribution gives a concrete estimate of the model's accepted deviation $\varepsilon$ from perfect orthogonality. Applying this metric across dozens of open-source models reveals two classes: models with high $\varepsilon$ whose embeddings lack near-orthogonal structure, and models with low $\varepsilon$ that maintain it. We then show that the standard Johnson-Lindenstrauss lemma greatly underestimates the packing efficiency of trained representations, and derive an adjusted capacity formula in which the number of near-orthogonal directions depends on the ratio of vectors to dimensions ($k/d$) rather than the raw count - a single modification that cuts prediction error by two orders of magnitude with no extra parameters. Combining these results, we define representational capacity as an upper bound on the number of distinguishable directions available for features and embeddings in a model's latent space. Capacity is exponentially sensitive to $\varepsilon$, and larger models favor tighter orthogonality constraints over maximizing raw capacity - a pattern compatible with several explanations (a stability-capacity trade-off, a ceiling on usable concepts, or confounds with model scale) that we leave to future work.
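The two quantities the abstract relies on can be illustrated with a minimal sketch: the distribution of pairwise cosine similarities between embedding rows, and a Johnson-Lindenstrauss-style count of near-orthogonal directions of the form $\exp(d\,\varepsilon^2 / C)$. Everything specific here is an assumption made for illustration. The toy Gaussian matrix stands in for a real embedding matrix, the 99th-percentile rule is a hypothetical proxy for $\varepsilon$ (the abstract does not state how the boundary is located), and the constant `c = 8.0` is one common textbook choice, since statements of the lemma differ in their constants. The paper's adjusted $k/d$ formula is not given in the abstract and is not reproduced.

```python
import numpy as np


def pairwise_cosines(embeddings: np.ndarray) -> np.ndarray:
    """Return the cosine similarity of every distinct pair of rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero rows
    gram = unit @ unit.T
    upper = np.triu_indices(len(unit), k=1)  # each unordered pair once
    return gram[upper]


def jl_style_capacity(d_model: int, eps: float, c: float = 8.0) -> float:
    """JL-style count exp(d * eps^2 / c) of near-orthogonal directions.

    The constant c is an assumption; it is not taken from the paper.
    """
    return float(np.exp(d_model * eps**2 / c))


rng = np.random.default_rng(0)
# Toy stand-in for an embedding matrix: 500 "tokens" in 64 dimensions.
toy_embeddings = rng.standard_normal((500, 64))

cosines = pairwise_cosines(toy_embeddings)
# Hypothetical proxy for epsilon: a high quantile of |cosine|.
eps_proxy = float(np.quantile(np.abs(cosines), 0.99))

print(f"pairs: {cosines.size}, eps proxy: {eps_proxy:.3f}")
for eps in (0.1, 0.2, 0.3):
    print(f"eps={eps}: JL-style capacity {jl_style_capacity(64, eps):.3f}")
```

Because $\varepsilon$ enters the exponent squared, small changes in the tolerated deviation move the estimate sharply, which is the exponential sensitivity the abstract describes.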

RESULT

ScienceToStartup currently rates this 0.0/10 on the public viability pass. Grounded in the Linear Representation and Superposition Hypotheses - which propose that models encode features as near-orthogonal directions in latent space - we develop…

WHY NOW

LLM Theory moved forward this cycle; last verified June 2026. Public score 0.0/10. Implementation evidence is present through a linked repository.

Competitive landscape

Segment

LLM Theory

Adoption evidence

Public code linked for build inspection

Commercial read

0.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Code repository: https://github.com/Alex-Guha/representational-capacity
arXiv: https://arxiv.org/abs/2606.02765
Date published: 2026-06-01T18:28:56.000Z
