ARXIV:2604.18658 · AI AGENT SAFETY · SUBMITTED 22 APR · 21:32 UTC

Verified source: PDF linked · Verified PaperPack: citation fields available · Partial proof: unverified proof status

Owner-Harm: A Missing Threat Model for AI Agent Safety

Dongcheng Zhang · Yiqing Jiang · arXiv

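Introduces Owner-Harm, a threat model and benchmark for AI agent behavior that damages the agent's own deployer, together with a layered defense (a safety gate plus a deterministic post-audit verifier) that raises owner-harm detection from 75.3% to 85.3% TPR on a 300-scenario benchmark.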

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: evidence unverified

PROBLEM

Existing AI agent safety benchmarks target generic criminal harm and miss a distinct, commercially consequential category: agents harming their own deployers. Real-world incidents illustrate the gap: Slack AI credential exfiltration (Aug…

METHOD

Full abstract

Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a distinct and commercially consequential threat category: agents harming their own deployers. Real-world incidents illustrate the gap: Slack AI credential exfiltration (Aug 2024), Microsoft 365 Copilot calendar-injection leaks (Jan 2024), and an unauthorized forum post by a Meta agent that exposed operational data (Mar 2026). We propose Owner-Harm, a formal threat model with eight categories of agent behavior that damages the deployer. We quantify the defense gap on two benchmarks: a compositional safety system achieves 100% TPR / 0% FPR on AgentHarm (generic criminal harm) yet detects only 14.8% (4/27; 95% CI: 5.9%-32.5%) of AgentDojo injection tasks (prompt-injection-mediated owner harm). A controlled generic-LLM baseline shows that the gap is not inherent to owner-harm (62.7% vs. 59.3%, a 3.4 pp difference); instead, it arises from environment-bound symbolic rules that fail to generalize across tool vocabularies. On a post-hoc 300-scenario owner-harm benchmark, the gate alone achieves 75.3% TPR / 3.3% FPR; adding a deterministic post-audit verifier raises overall TPR to 85.3% (+10.0 pp) and Hijacking detection from 43.3% to 93.3%, demonstrating strong complementarity between the two layers. We introduce the Symbolic-Semantic Defense Generalization (SSDG) framework, which relates information coverage to detection rate. Two SSDG experiments partially validate it: context deprivation amplifies the detection gap 3.4x (R = 3.60 vs. R = 1.06), and context injection reveals that structured goal-action alignment, not text concatenation, is required for effective owner-harm detection.
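The abstract does not say how the 95% interval on 4/27 AgentDojo detections was computed. As a rough consistency check, and assuming a Wilson score interval (our assumption, not a method stated in the paper), the sketch below reproduces the reported 5.9%-32.5% bounds. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%).

    Assumption: the paper's CI method is not stated; Wilson is used here
    only because it is a common choice for small samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width


# AgentDojo injection tasks detected by the compositional system: 4 of 27.
lo, hi = wilson_interval(4, 27)
print(f"TPR = {4 / 27:.1%}, 95% CI = [{lo:.1%}, {hi:.1%}]")
```

A Wilson interval stays inside [0, 1] and behaves better than the normal approximation at small n. That matters here: with only 27 tasks, the interval spans more than 26 percentage points, so the 14.8% point estimate is imprecise.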

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. We quantify the defense gap on two benchmarks: a compositional safety system achieves 100% TPR / 0% FPR on AgentHarm (generic criminal harm) yet…
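The figures reported for this work pair a true-positive rate (harmful scenarios flagged) with a false-positive rate (benign scenarios flagged). The sketch below shows how such rates are computed and how a two-layer defense might be scored, assuming (our assumption, not the paper's stated design) that an action is blocked when either the gate or the post-audit verifier flags it. `Scenario`, `rates`, and `layered` are hypothetical names, and the six scenarios are toy data, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    harmful: bool        # ground truth: does the agent's action harm its owner?
    gate_flag: bool      # verdict of a hypothetical first-layer gate
    verifier_flag: bool  # verdict of a hypothetical post-audit verifier


def rates(labels: list[bool], flags: list[bool]) -> tuple[float, float]:
    """Return (TPR, FPR) of binary flags against ground-truth labels."""
    tp = sum(1 for y, f in zip(labels, flags) if y and f)
    fp = sum(1 for y, f in zip(labels, flags) if not y and f)
    positives = sum(labels)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


def layered(s: Scenario) -> bool:
    """Assumed composition: block if either layer flags the scenario."""
    return s.gate_flag or s.verifier_flag


# Toy data only; not drawn from the paper's benchmark.
scenarios = [
    Scenario(harmful=True, gate_flag=True, verifier_flag=False),
    Scenario(harmful=True, gate_flag=False, verifier_flag=True),   # gate miss, verifier catch
    Scenario(harmful=True, gate_flag=False, verifier_flag=False),  # missed by both layers
    Scenario(harmful=True, gate_flag=True, verifier_flag=True),
    Scenario(harmful=False, gate_flag=False, verifier_flag=False),
    Scenario(harmful=False, gate_flag=True, verifier_flag=False),  # gate false positive
]
labels = [s.harmful for s in scenarios]

gate_only = rates(labels, [s.gate_flag for s in scenarios])
combined = rates(labels, [layered(s) for s in scenarios])
print(f"gate only: TPR={gate_only[0]:.0%}, FPR={gate_only[1]:.0%}")
print(f"gate + verifier: TPR={combined[0]:.0%}, FPR={combined[1]:.0%}")
```

Under OR composition, every scenario the gate flags stays flagged, so neither TPR nor FPR can fall relative to the gate alone. A second layer is complementary when, as in the toy data, it recovers misses (TPR rises from 50% to 75%) without adding false positives, so the combined FPR belongs next to any reported TPR gain.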

WHY NOW

AI Agent Safety moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability, although the paper record does not yet list a public code link.

Competitive landscape

Segment: AI Agent Safety

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
