ARXIV:2604.21131 · AI AGENT SECURITY · SUBMITTED 24 APR · 20:29 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

Ari Azarafrooz · arXiv

Detect cross-session threats in AI agents with a novel dataset, measurement framework, and bounded-memory reader algorithm.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain Detect cross-session threats in AI agents with a novel dataset, measurement framework, and bounded-memory reader algorithm.

Evidence 0 refs | 4 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

Detect cross-session threats in AI agents with a novel dataset, measurement framework, and bounded-memory reader algorithm. We make three contributions to cross-session threat detection.

METHOD

Full abstract

AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips past every session-bound detector because only the aggregate carries the payload. We make three contributions to cross-session threat detection.

(1) Dataset. CSTM-Bench is 26 executable attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation" as a policy predicate, plus matched Benign-pristine and Benign-hard confounders. Released on Hugging Face as intrinsec-ai/cstm-bench with two 54-scenario splits: dilution (compositional) and cross_session (12 isolation-invisible scenarios produced by a closed-loop rewriter that softens surface phrasing while preserving cross-session artefacts).

(2) Measurement. Framing cross-session detection as an information bottleneck to a downstream correlator LLM, we find that a session-bound judge and a Full-Log Correlator concatenating every prompt into one long-context call both lose roughly half their attack recall moving from dilution to cross_session, well inside any frontier context window. Scope: 54 scenarios per shard, one correlator family (Anthropic Claude), no prompt optimisation; we release it to motivate larger, multi-provider datasets.

(3) Algorithm and metric. A bounded-memory Coreset Memory Reader retaining highest-signal fragments at $K=50$ is the only reader whose recall survives both shards. Because ranker reshuffles break KV-cache prefix reuse, we promote $\mathrm{CSR\_prefix}$ (ordered prefix stability, LLM-free) to a first-class metric and fuse it with detection into $\mathrm{CSTM} = 0.7 F_1(\mathrm{CSDA@action}, \mathrm{precision}) + 0.3 \mathrm{CSR\_prefix}$, benchmarking rankers on a single Pareto of recall versus serving stability.
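The bounded-memory idea in contribution (3) can be illustrated with a minimal sketch. Only the budget K=50 comes from the abstract. The `Fragment` type, its `score` field, and the `CoresetMemory` class are hypothetical names introduced here, and the abstract does not say how the paper's ranker scores fragments, so the scores are treated as given inputs.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fragment:
    """One retained piece of a session (hypothetical structure)."""
    session_id: str
    text: str
    score: float  # assumed precomputed signal score; the scoring rule is not from the paper


class CoresetMemory:
    """Keep at most k highest-scoring fragments seen across all sessions."""

    def __init__(self, k: int = 50) -> None:
        self.k = k
        # Min-heap of (score, arrival_index, fragment); the root is the weakest kept item.
        self._heap: List[Tuple[float, int, Fragment]] = []
        self._arrivals = 0

    def observe(self, fragment: Fragment) -> None:
        """Offer a fragment; it is kept only if it beats the weakest retained one."""
        entry = (fragment.score, self._arrivals, fragment)
        self._arrivals += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif fragment.score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)

    def context(self) -> List[Fragment]:
        """Return retained fragments in arrival order, not score order."""
        return [frag for _, _, frag in sorted(self._heap, key=lambda e: e[1])]


if __name__ == "__main__":
    memory = CoresetMemory(k=2)
    for i, score in enumerate([0.1, 0.9, 0.5, 0.05]):
        memory.observe(Fragment(f"s{i}", f"fragment {i}", score))
    print([frag.session_id for frag in memory.context()])  # prints ['s1', 's2']
```

Returning fragments in arrival order rather than score order is a design choice of this sketch: a new fragment that does not evict anything leaves the existing ordering untouched, which is the kind of serving stability the abstract ties to KV-cache prefix reuse. An eviction still changes the sequence from the evicted position onward.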

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Because ranker reshuffles break KV-cache prefix reuse, we promote $\mathrm{CSR\_prefix}$ (ordered prefix stability, LLM-free) to a first-class metric and fuse it with detection into…
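The fused score quoted in the full abstract, $\mathrm{CSTM} = 0.7 F_1(\mathrm{CSDA@action}, \mathrm{precision}) + 0.3 \mathrm{CSR\_prefix}$, can be sketched as follows. Two points are assumptions rather than the paper's definitions: $F_1$ is read as the harmonic mean of its two arguments, and `prefix_stability` is a simple stand-in for CSR_prefix (the share of the previous ordered context that survives as an unchanged leading prefix), since the abstract describes that metric only as "ordered prefix stability, LLM-free". All function names are hypothetical.

```python
from typing import Sequence


def f1(a: float, b: float) -> float:
    """Harmonic mean of two rates in [0, 1]; returns 0.0 when both are zero."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def prefix_stability(prev: Sequence[str], curr: Sequence[str]) -> float:
    """Assumed proxy for CSR_prefix: shared leading items divided by len(prev)."""
    if not prev:
        return 1.0  # nothing was cached, so nothing can be invalidated
    shared = 0
    for old, new in zip(prev, curr):
        if old != new:
            break
        shared += 1
    return shared / len(prev)


def cstm(csda_at_action: float, precision: float, csr_prefix: float) -> float:
    """Weighted fusion with the 0.7 / 0.3 weights stated in the abstract."""
    return 0.7 * f1(csda_at_action, precision) + 0.3 * csr_prefix


if __name__ == "__main__":
    # Illustrative inputs only; these are not results from the paper.
    stability = prefix_stability(["a", "b", "c", "d"], ["a", "b", "x", "d"])
    print(stability)                    # 2 of 4 leading items survive
    print(cstm(0.8, 0.8, stability))    # roughly 0.71
```

Under this reading, a ranker that reorders its retained fragments on every update is penalised even if its detection quality is unchanged, which matches the abstract's framing of recall versus serving stability as a single trade-off.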

WHY NOW

AI Agent Security moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain Detect cross-session threats in AI agents with a novel dataset, measurement framework, and bounded-memory reader algorithm.

Evidence 0 refs | 4 sources | 50% coverage

Blocker no shell-level blocker reported


Competitive landscape

Detect cross-session threats in AI agents with a novel dataset, measurement framework, and bounded-memory reader algorithm.

Segment

AI Agent Security

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


