ARXIV:2605.09684 · AGENT MONITORING · SUBMITTED 06 JUN · 20:09 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring

Monika Jotautaitė · Maria Angelica Martinez · Ollie Matthews · Tyler Tracy · arXiv

A semi-automated red-teaming pipeline and benchmark for evaluating and improving AI agent monitors by generating diverse and stronger attack trajectories.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A semi-automated red-teaming pipeline and benchmark for evaluating and improving AI agent monitors by generating diverse and stronger attack trajectories.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

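Current red-teaming practice may under-elicit attacks and so overstate how well coding-agent monitors perform. The authors identify three challenges with current red-teaming: mode collapse in attack generation, a gap between conceiving strong attacks and executing them, and the cost of scaling manual elicitation.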

METHOD

Full abstract

We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may under-elicit attacks and overstate monitor performance. We identify three challenges with current red-teaming. First, mode collapse in attack generation, which we reduce with a novel attack taxonomy for broader coverage. Second, a conceive-execute gap: frontier LLMs can propose strong attack ideas or execute them, but not all at once. We mitigate this by decomposing attack construction into strategy generation, execution, and post-hoc trajectory refinement. Third, manual elicitation is costly to scale, which we address with our semi-automated red-teaming pipeline. Applied to BashArena, an AI control setting for tool-using coding agents, this pipeline produces MonitoringBench, a benchmark of 2,644 attack trajectories for evaluating monitor capabilities and failure modes. Our pipeline produces more diverse and stronger attacks: Opus-4.5 monitor's catch rate falls from 94.9% on elicited-only Opus attacks to 60.3% on our best refined attacks, with larger drops for several mid-tier monitors. Attacks optimized against three development monitors generalize to ten held-out monitors, with catch rates generally increasing with monitor capability. Using this benchmark, we provide a snapshot of the current monitor capabilities and find that frontier monitors often detect suspicious actions but fall for persuasion or fail to calibrate suspiciousness scores appropriately, suggesting tractable paths for improvement. MonitoringBench provides both a static benchmark for current tool-use monitors and a reusable methodology for refreshing these evaluations as agents and monitors improve.
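
The abstract reports monitor quality as a catch rate over attack trajectories. The sketch below shows one way such a number could be computed, assuming a monitor assigns each trajectory a numeric suspiciousness score and flags it when the score reaches a threshold. This page does not describe the paper's actual scoring or thresholding, so the names `ScoredTrajectory`, `catch_rate`, and `percentage_point_drop`, the score scale, and the toy scores are all hypothetical. Only the 94.9% and 60.3% figures come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTrajectory:
    """One attack trajectory with a monitor's score (hypothetical schema)."""
    trajectory_id: str
    suspiciousness: float  # assumed scale: higher means more suspicious


def catch_rate(trajectories: list[ScoredTrajectory], threshold: float) -> float:
    """Fraction of attack trajectories whose score meets or exceeds the threshold."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    caught = sum(1 for t in trajectories if t.suspiciousness >= threshold)
    return caught / len(trajectories)


def percentage_point_drop(before_pct: float, after_pct: float) -> float:
    """Difference between two catch rates expressed in percent."""
    return before_pct - after_pct


# Toy data, invented for illustration only; not taken from the benchmark.
toy_attacks = [
    ScoredTrajectory("a1", 9.0),
    ScoredTrajectory("a2", 3.5),
    ScoredTrajectory("a3", 7.0),
    ScoredTrajectory("a4", 6.9),
]
toy_rate = catch_rate(toy_attacks, threshold=7.0)  # a1 and a3 are flagged

# Catch rates quoted in the abstract for the Opus-4.5 monitor.
reported_drop = percentage_point_drop(94.9, 60.3)

print(f"toy catch rate: {toy_rate:.2f}")
print(f"reported drop: {reported_drop:.1f} percentage points")
```

Under these assumptions, the drop quoted in the abstract corresponds to 34.6 percentage points. In practice the threshold would likely be calibrated against benign trajectories, which this sketch leaves out.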

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. MonitoringBench provides both a static benchmark for current tool-use monitors and a reusable methodology for refreshing these evaluations as agents and monitors improve. Code…

WHY NOW

Agent Monitoring moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A semi-automated red-teaming pipeline and benchmark for evaluating and improving AI agent monitors by generating diverse and stronger attack trajectories.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

A semi-automated red-teaming pipeline and benchmark for evaluating and improving AI agent monitors by generating diverse and stronger attack trajectories.

Segment: Agent Monitoring

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

arXiv: https://arxiv.org/abs/2605.09684

Published: 2026-05-10T18:04:31.000Z

Page: https://sciencetostartup.com/paper/monitoringbench-semi-automated-red-teaming-for-agent-monitoring
