ARXIV:2602.05066 · AI SECURITY · SUBMITTED 19 MAR · 18:48 UTC · FRESHNESS STALE

VerifiedSource: PDF linkedPartialPaperPack: 3 of 4 citation fields filledMissingMissing fields: authorsPartialProof: unverified proof status

Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks

arXiv

Novel cybersecurity method to bypass AI monitoring protocols via agent-proxy attacks.

Blocked on Code›Score5.0Evidence unverified

Opportunity summary

Pain Novel cybersecurity method to bypass AI monitoring protocols via agent-proxy attacks.

Evidence 0 refs | 0 sources | 33% coverage

Blocker Evidence unverified

Open Build Read PDF Signal Canvas Track

PROBLEM

Novel cybersecurity method to bypass AI monitoring protocols via agent-proxy attacks. Current defenses rely on monitoring protocols that jointly evaluate an agent's Chain-of-Thought (CoT) and tool-use actions to ensure alignment with user intent.

METHOD

Full abstract

As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection (IPI) attacks. Current defenses rely on monitoring protocols that jointly evaluate an agent's Chain-of-Thought (CoT) and tool-use actions to ensure alignment with user intent. We demonstrate that these monitoring-based defenses can be bypassed via a novel Agent-as-a-Proxy attack, where prompt injection attacks treat the agent as a delivery mechanism, bypassing both agent and monitor simultaneously. While prior work on scalable oversight has focused on whether small monitors can supervise large agents, we show that even frontier-scale monitors are vulnerable. Large-scale monitoring models like Qwen2.5-72B can be bypassed by agents with similar capabilities, such as GPT-4o mini and Llama-3.1-70B. On the AgentDojo benchmark, we achieve a high attack success rate against AlignmentCheck and Extract-and-Evaluate monitors under diverse monitoring LLMs. Our findings suggest current monitoring-based agentic defenses are fundamentally fragile regardless of model scale.

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. We demonstrate that these monitoring-based defenses can be bypassed via a novel Agent-as-a-Proxy attack, where prompt injection attacks treat the agent as a delivery…

WHY NOW

AI Security moved forward this cycle; last verified April 2026. Public score 5.0/10.

Continue into Read for claims, analysis, references, and neighboring papers.

Opportunity summary

Score5.0

PainNovel cybersecurity method to bypass AI monitoring protocols via agent-proxy attacks.

Evidence0 refs | 0 sources | 33% coverage

Blockermissing authors

Analysis summary

Novel cybersecurity method to bypass AI monitoring protocols via agent-proxy attacks.

VerifiedSource: PDF linkedPartialPaperPack: 3 of 4 citation fields filledMissingMissing fields: authorsPartialProof: unverified proof status

References(14)

Reference metadata pending (14760eb9716c11e86175d42625f4669f989332aa)

Reference metadata pending (be9f43dcb027e96af326146077c52e2d361aa605)

Reference metadata pending (07734909a75edc13eb838d9f83b52dcc1ea06e64)

Reference metadata pending (5c33a1dade777d08f3d4ba8a761a4902dafd211a)

Reference metadata pending (0ed8da38a5fb60cd5fc70d325a63e6120111756c)

Reference metadata pending (0a6a350653369dc92fde4cf9992951534ed1f169)

Reference metadata pending (c8eee9766f0968e8f1b1be0731bc70b85be0ac97)

Reference metadata pending (14e8cf5a5e6a7b35e618b08f5cf06f572b3a54e0)

Reference metadata pending (f3f23f7f9f5369aade19f20bc5d028cce7b9c9aa)

Reference metadata pending (47030369e97cc44d4b2e3cf1be85da0fd134904a)

Reference metadata pending (9716a2876d08fce9d8e5c5ba4d7b1a9af44806d6)

Reference metadata pending (61d6196f8a2aa25ee60b3415d6b3233bbbfa66f8)

Reference metadata pending (62163b97bbb37f5c35571054b9105a6243d0f056)

Reference metadata pending (8dec6cecbeddcec8ec1352c662cad256ffa6849e)

{ "contract_version": "paper-r2", "paper_id": "18d441f5-e4ff-4542-84db-60a8090c14df", "arxiv_id": "2602.05066", "canonical_route": "/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks", "active_tab": "synced from current hash by the drawer client", "selected_artifact": "bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks", "endpoints": { "paper_pack": "/api/v1/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks/paper-pack", "build_passport": "/api/v1/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks/build-passport", "mcp_resource": "sciencetostartup://surfaces/paper-workspace" } }

{ "surface": "paper", "mode": "paper", "query": "Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks", "normalized_query": "2602.05066", "route": "/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks", "paper_ref": "bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks", "topic_slug": null, "benchmark_ref": null, "dataset_ref": null }

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks#webpage", "url": "https://sciencetostartup.com/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks", "name": "Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks", "description": "Novel cybersecurity method to bypass AI monitoring protocols via agent-proxy attacks.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks#scholarlyArticle", "headline": "Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks", "description": "Novel cybersecurity method to bypass AI monitoring protocols via agent-proxy attacks.", "url": "https://sciencetostartup.com/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks", "sameAs": "https://arxiv.org/abs/2602.05066", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2602.05066" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-02-04T21:38:38.000Z", "citation": [ { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "14760eb9716c11e86175d42625f4669f989332aa" }, "url": "https://www.semanticscholar.org/paper/14760eb9716c11e86175d42625f4669f989332aa" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "be9f43dcb027e96af326146077c52e2d361aa605" }, "url": "https://www.semanticscholar.org/paper/be9f43dcb027e96af326146077c52e2d361aa605" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "07734909a75edc13eb838d9f83b52dcc1ea06e64" }, "url": "https://www.semanticscholar.org/paper/07734909a75edc13eb838d9f83b52dcc1ea06e64" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "5c33a1dade777d08f3d4ba8a761a4902dafd211a" }, "url": "https://www.semanticscholar.org/paper/5c33a1dade777d08f3d4ba8a761a4902dafd211a" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "0ed8da38a5fb60cd5fc70d325a63e6120111756c" }, "url": "https://www.semanticscholar.org/paper/0ed8da38a5fb60cd5fc70d325a63e6120111756c" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "0a6a350653369dc92fde4cf9992951534ed1f169" }, "url": "https://www.semanticscholar.org/paper/0a6a350653369dc92fde4cf9992951534ed1f169" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "c8eee9766f0968e8f1b1be0731bc70b85be0ac97" }, "url": "https://www.semanticscholar.org/paper/c8eee9766f0968e8f1b1be0731bc70b85be0ac97" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "14e8cf5a5e6a7b35e618b08f5cf06f572b3a54e0" }, "url": "https://www.semanticscholar.org/paper/14e8cf5a5e6a7b35e618b08f5cf06f572b3a54e0" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "f3f23f7f9f5369aade19f20bc5d028cce7b9c9aa" }, "url": "https://www.semanticscholar.org/paper/f3f23f7f9f5369aade19f20bc5d028cce7b9c9aa" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "47030369e97cc44d4b2e3cf1be85da0fd134904a" }, "url": "https://www.semanticscholar.org/paper/47030369e97cc44d4b2e3cf1be85da0fd134904a" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "9716a2876d08fce9d8e5c5ba4d7b1a9af44806d6" }, "url": "https://www.semanticscholar.org/paper/9716a2876d08fce9d8e5c5ba4d7b1a9af44806d6" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "61d6196f8a2aa25ee60b3415d6b3233bbbfa66f8" }, "url": "https://www.semanticscholar.org/paper/61d6196f8a2aa25ee60b3415d6b3233bbbfa66f8" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "62163b97bbb37f5c35571054b9105a6243d0f056" }, "url": "https://www.semanticscholar.org/paper/62163b97bbb37f5c35571054b9105a6243d0f056" }, { "@type": "ScholarlyArticle", "identifier": { "@type": "PropertyValue", "propertyID": "SemanticScholar", "value": "8dec6cecbeddcec8ec1352c662cad256ffa6849e" }, "url": "https://www.semanticscholar.org/paper/8dec6cecbeddcec8ec1352c662cad256ffa6849e" } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 5 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "AI Security" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "AI Security", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks", "item": "https://sciencetostartup.com/paper/bypassing-ai-control-protocols-via-agent-as-a-proxy-attacks" } ] } ] }