ARXIV:2604.02685 · LLM INTERPRETABILITY · SUBMITTED 06 APR · 20:16 UTC · FRESHNESS UNKNOWN

Verified · Source: PDF linked
Verified · PaperPack: citation fields available
Partial · Proof: unverified proof status

Finding Belief Geometries with Sparse Autoencoders

Matthew Levinson · arXiv

A pipeline to discover and validate belief-like geometric structures within large language model representations.

Ship in 2-4 weeks · Score 4.0 · Evidence unverified

Opportunity summary

Pain: A pipeline to discover and validate belief-like geometric structures within large language model representations.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: Evidence unverified


PROBLEM

A pipeline to discover and validate belief-like geometric structures within large language model representations. Prior work has shown that transformers trained on sequences generated by hidden Markov models encode probabilistic belief states as simplex-shaped…

METHOD

Full abstract

Understanding the geometric structure of internal representations is a central goal of mechanistic interpretability. Prior work has shown that transformers trained on sequences generated by hidden Markov models encode probabilistic belief states as simplex-shaped geometries in their residual stream, with vertices corresponding to latent generative states. Whether large language models trained on naturalistic text develop analogous geometric representations remains an open question. We introduce a pipeline for discovering candidate simplex-structured subspaces in transformer representations, combining sparse autoencoders (SAEs), $k$-subspace clustering of SAE features, and simplex fitting using AANet. We validate the pipeline on a transformer trained on a multipartite hidden Markov model with known belief-state geometry. Applied to Gemma-2-9B, we identify 13 priority clusters exhibiting candidate simplex geometry ($K \geq 3$). A key challenge is distinguishing genuine belief-state encoding from tiling artifacts: latents can span a simplex-shaped subspace without the mixture coordinates carrying predictive signal beyond any individual feature. We therefore adopt barycentric prediction as our primary discriminating test. Among the 13 priority clusters, 3 exhibit a highly significant advantage on near-vertex samples (Wilcoxon $p < 10^{-14}$) and 4 on simplex-interior samples. Together 5 distinct real clusters pass at least one split, while no null cluster passes either. One cluster, 768_596, additionally achieves the highest causal steering score in the dataset. This is the only case where passive prediction and active intervention converge. We present these findings as preliminary evidence that genuine belief-like geometry exists in Gemma-2-9B's representation space, and identify the structured evaluation that would be required to confirm this interpretation.
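The belief states the abstract refers to can be made concrete with a small example. The sketch below is not the paper's pipeline (no SAE, clustering, or AANet step is shown); it is only the standard Bayesian filtering update for a hidden Markov model, included to illustrate why belief states live on a simplex whose vertices correspond to latent states. The 3-state transition and emission tables, and the names `update_belief` and `belief_trajectory`, are hypothetical and chosen for illustration; they are not the multipartite HMM used in the paper.

```python
def update_belief(belief, transition, emission, token):
    """One Bayesian filtering step for a hidden Markov model.

    belief[i]        : P(state = i | tokens observed so far)
    transition[i][j] : P(next state = j | current state = i)
    emission[j][t]   : P(token = t | state = j)
    """
    n = len(belief)
    # Predict: push the current belief through the transition matrix.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correct: reweight each state by how likely it is to emit the token.
    unnormalized = [predicted[j] * emission[j][token] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed token has zero probability under the model")
    # Normalizing keeps the belief on the probability simplex.
    return [u / total for u in unnormalized]


def belief_trajectory(prior, transition, emission, tokens):
    """Return the belief after each prefix of `tokens`, starting from `prior`."""
    beliefs = [list(prior)]
    for token in tokens:
        beliefs.append(update_belief(beliefs[-1], transition, emission, token))
    return beliefs


# Hypothetical 3-state, 3-token HMM (illustrative values only).
TRANSITION = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
EMISSION = [
    [0.7, 0.2, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.1, 0.7],
]
PRIOR = [1 / 3, 1 / 3, 1 / 3]

trajectory = belief_trajectory(PRIOR, TRANSITION, EMISSION, [0, 0, 1, 2, 2])
for step, belief in enumerate(trajectory):
    print(step, [round(p, 3) for p in belief])
```

Each belief vector is non-negative and sums to one, so it is a set of barycentric coordinates on a triangle: a belief concentrated on one state sits near a vertex, while an uncertain belief sits in the interior. This is the sense in which the abstract distinguishes near-vertex samples from simplex-interior samples.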

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. One cluster, 768_596, additionally achieves the highest causal steering score in the dataset. Code availability is flagged in the production record; the public repository…

WHY NOW

LLM Interpretability moved forward this cycle; last verified April 2026. Public score 4.0/10. Production flags indicate code availability.




Competitive landscape

A pipeline to discover and validate belief-like geometric structures within large language model representations.

Segment: LLM Interpretability

Adoption evidence: No public code link in the paper record yet

Commercial read: 4.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


arXiv: https://arxiv.org/abs/2604.02685 · Published 2026-04-03T03:29:48.000Z · Free to access

