ARXIV:2602.17881 · AI SYSTEM DIAGNOSTICS · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

[Verified] Source: PDF linked · [Partial] PaperPack: 3 of 4 citation fields filled · [Missing] Missing fields: authors · [Partial] Proof: unverified proof status

Understanding Unreliability of Steering Vectors in Language Models: Geometric Predictors and the Limits of Linear Approximations

arXiv

Develop a tool to diagnose and improve the reliability of steering vectors in language models, addressing non-linear behavior representations.

Blocked on Code · Score 4.0 · Evidence unverified

Opportunity summary

Pain Develop a tool to diagnose and improve the reliability of steering vectors in language models, addressing non-linear behavior representations.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified

PROBLEM

Develop a tool to diagnose and improve the reliability of steering vectors in language models, addressing non-linear behavior representations. Although effective on average, steering effect sizes vary across samples and are unreliable for many…

METHOD

Full abstract

Steering vectors are a lightweight method for controlling language model behavior by adding a learned bias to the activations at inference time. Although effective on average, steering effect sizes vary across samples and are unreliable for many target behaviors. In my thesis, I investigate why steering reliability differs across behaviors and how it is impacted by steering vector training data. First, I find that higher cosine similarity between training activation differences predicts more reliable steering. Second, I observe that behavior datasets where positive and negative activations are better separated along the steering direction are more reliably steerable. Finally, steering vectors trained on different prompt variations are directionally distinct, yet perform similarly well and exhibit correlated efficacy across datasets. My findings suggest that steering vectors are unreliable when the latent target behavior representation is not effectively approximated by the linear steering direction. Taken together, these insights offer a practical diagnostic for steering unreliability and motivate the development of more robust steering methods that explicitly account for non-linear latent behavior representations.
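The two geometric predictors named in the abstract can be illustrated with a small NumPy sketch. The abstract gives no exact formulas, so everything below is an assumption made for illustration: the steering vector is taken as the mean difference of paired activations, the first predictor as the mean pairwise cosine similarity of the per-sample activation differences, and the second as a d-prime style separation score for projections onto the steering direction. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def steering_vector(pos, neg):
    """Mean-difference steering vector (assumed form) from paired activations of shape (n, d)."""
    return (pos - neg).mean(axis=0)


def mean_pairwise_cosine(diffs):
    """Average cosine similarity over all distinct pairs of difference vectors."""
    unit = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(diffs)
    # Drop the diagonal (self-similarity is always 1) before averaging.
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


def separation_along_direction(pos, neg, direction):
    """Gap between projected class means, in units of the pooled standard deviation."""
    u = direction / np.linalg.norm(direction)
    p, q = pos @ u, neg @ u
    pooled_std = np.sqrt(0.5 * (p.var() + q.var()))
    return (p.mean() - q.mean()) / pooled_std


# Synthetic stand-ins for layer activations (not real model data).
rng = np.random.default_rng(0)
n, d = 200, 16
neg = rng.normal(size=(n, d))

# "Coherent" behavior: every pair differs by roughly the same shift.
shift = np.zeros(d)
shift[0] = 2.0
pos_coherent = neg + shift + 0.1 * rng.normal(size=(n, d))

# "Scattered" behavior: each pair differs in an unrelated random direction.
pos_scattered = neg + rng.normal(size=(n, d))

for name, pos in [("coherent", pos_coherent), ("scattered", pos_scattered)]:
    v = steering_vector(pos, neg)
    cos = mean_pairwise_cosine(pos - neg)
    sep = separation_along_direction(pos, neg, v)
    print(f"{name}: mean pairwise cosine = {cos:.3f}, separation = {sep:.3f}")
```

On this toy data the coherent set scores high on both measures and the scattered set scores low on both, which matches the qualitative pattern the abstract reports: a single linear direction only summarizes the behavior well when the per-sample differences agree with one another.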

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. Taken together, these insights offer a practical diagnostic for steering unreliability and motivate the development of more robust steering methods that explicitly account for…

WHY NOW

AI System Diagnostics moved forward this cycle; last verified April 2026. Public score 4.0/10.

Opportunity summary

Score 4.0

Pain Develop a tool to diagnose and improve the reliability of steering vectors in language models, addressing non-linear behavior representations.

Evidence 0 refs | 0 sources | 17% coverage

Blocker missing authors

Analysis summary

Develop a tool to diagnose and improve the reliability of steering vectors in language models, addressing non-linear behavior representations.

Competitive landscape

Develop a tool to diagnose and improve the reliability of steering vectors in language models, addressing non-linear behavior representations.

Segment

AI System Diagnostics

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Published: 2026-02-19 · arXiv: https://arxiv.org/abs/2602.17881 · Free to access
