ARXIV:2604.18179 · LLM SECURITY · SUBMITTED 21 APR · 04:18 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs

Ziyang Liu · arXiv

A protocol for detecting substitute LLMs in hosted services by committing to and auditing sparse autoencoder features of served outputs.

Blocked on Code | Score 4.0 | Evidence unverified

Opportunity summary

Pain A protocol for detecting substitute LLMs in hosted services by committing to and auditing sparse autoencoder features of served outputs.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

A protocol for detecting substitute LLMs in hosted services by committing to and auditing sparse autoencoder features of served outputs. Probe-after-return schemes such as SVIP leave a parallel-serve side-channel, since a dishonest provider can…

METHOD

Full abstract

Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such as SVIP leave a parallel-serve side-channel, since a dishonest provider can route the verifier's probe to the advertised model while serving ordinary users from a substitute. We propose a commit-open protocol that closes this gap. Before any opening request, the provider commits via a Merkle tree to a per-position sparse-autoencoder (SAE) feature-trace sketch of its served output at a published probe layer. A verifier opens random positions, scores them against a public named-circuit probe library calibrated with cross-backend noise, and decides with a fixed-threshold joint-consistency z-score rule. We instantiate the protocol on three backbones: Qwen3-1.7B, Gemma-2-2B, and a 4.5x scale-up to Gemma-2-9B with a 131k-feature SAE. Of 17 attackers spanning same-family lifts, cross-family substitutes, and rank-<=128 adaptive LoRA, all are rejected at a shared, scale-stable threshold; the same attackers all evade a matched SVIP-style parallel-serve baseline. A white-box end-to-end attack that backpropagates through the frozen SAE encoder does not close the margin, and a feature-forgery attacker that never runs M_hon is bounded in closed form by an intrinsic-dimension argument. Commitment adds <=2.1% to forward-only wall-clock at batch 32.
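The commit-then-open step described in the abstract can be illustrated with a small Merkle-tree sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the leaf encoding (position index plus a short list of floats standing in for a feature-trace sketch), the SHA-256 hash, the 0x00/0x01 domain-separation prefixes, and every function name below are hypothetical choices made for this example.

```python
import hashlib
import struct


def leaf_hash(position: int, sketch: list[float]) -> bytes:
    # Assumed leaf encoding: big-endian position followed by float32 values.
    payload = struct.pack(">I", position) + struct.pack(f">{len(sketch)}f", *sketch)
    # 0x00 prefix separates leaves from internal nodes.
    return hashlib.sha256(b"\x00" + payload).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # 0x01 prefix marks an internal node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all levels, leaves first; the root is levels[-1][0]."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]  # simplification: pair an odd last node with itself
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def open_position(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Provider side: sibling hashes from the leaf at `index` up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd level: the node was paired with itself
        path.append(level[sibling])
        index //= 2
    return path


def verify_opening(root: bytes, position: int, sketch: list[float],
                   path: list[bytes]) -> bool:
    """Verifier side: recompute the root from one opened leaf and its path."""
    h = leaf_hash(position, sketch)
    index = position
    for sibling in path:
        h = node_hash(h, sibling) if index % 2 == 0 else node_hash(sibling, h)
        index //= 2
    return h == root


# Toy data: five positions, each with a three-value sketch.
sketches = [[0.1 * p, 0.2 * p, 0.3 * p] for p in range(5)]
levels = build_tree([leaf_hash(p, s) for p, s in enumerate(sketches)])
root = levels[-1][0]  # published before any opening request
```

In a real deployment the verifier would choose the opened positions at random only after receiving `root` (for example with `secrets.randbelow`), so the provider cannot tailor the committed trace to the audit.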

RESULT

ScienceToStartup currently rates this paper 4.0/10 on its public viability pass. The abstract reports that commitment adds <=2.1% to forward-only wall-clock time at batch size 32.
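The abstract names a fixed-threshold joint-consistency z-score rule but this page does not give its formula. The sketch below shows one plausible form, purely as an assumption: per-probe residuals are standardized by a calibrated noise level, summed into a chi-square statistic, and normalized to a z-score. The function names, the chi-square construction, and the threshold value of 3.0 are all hypothetical and are not taken from the paper.

```python
import math


def joint_z(observed: list[float], expected: list[float],
            noise_std: list[float]) -> float:
    """Assumed joint statistic: normalized sum of squared standardized residuals.

    If residuals are independent Gaussians with the given standard deviations,
    the sum of squares is chi-square with n degrees of freedom (mean n,
    variance 2n), so (chi2 - n) / sqrt(2n) is roughly standard normal for large n.
    """
    n = len(observed)
    if n == 0 or len(expected) != n or len(noise_std) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    chi2 = sum(((o - e) / s) ** 2 for o, e, s in zip(observed, expected, noise_std))
    return (chi2 - n) / math.sqrt(2 * n)


def accept(observed: list[float], expected: list[float],
           noise_std: list[float], threshold: float = 3.0) -> bool:
    # Hypothetical fixed threshold; the paper's actual value is not stated here.
    return joint_z(observed, expected, noise_std) <= threshold


# Toy probe scores at eight opened positions with unit noise.
expected = [0.0] * 8
noise = [1.0] * 8
honest = [0.0] * 8        # matches the probe library exactly
substitute = [3.0] * 8    # every probe is off by three noise units
```

With these toy inputs, `accept(honest, expected, noise)` holds while `accept(substitute, expected, noise)` does not, because the substitute's joint statistic is far above the assumed threshold.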

WHY NOW

LLM Security moved forward this cycle; last verified April 2026. Public score 4.0/10.

Opportunity summary

Blocker no shell-level blocker reported

Competitive landscape

Segment: LLM Security
Adoption evidence: No public code link in the paper record yet
Commercial read: 4.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified
