ARXIV:2605.11891 · AI SECURITY · SUBMITTED 13 MAY · 20:19 UTC · FRESHNESS FRESH

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems

Zhaojiacheng Zhou · arXiv

Proteus is a self-evolving red team framework for measuring adaptive leakage risk in agent skill ecosystems. It shows that current skill vetting underestimates residual risk, but the paper record carries no direct productization signals.

Blocked on Code · Score 3.0 · Evidence unverified

Opportunity summary

Pain Proteus is a self-evolving red team framework for measuring adaptive leakage risk in agent skill ecosystems, highlighting current vetting underestimation but lacking direct productization signals.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified


PROBLEM

Proteus is a self-evolving red team framework for measuring adaptive leakage risk in agent skill ecosystems, highlighting current vetting underestimation but lacking direct productization signals. Because a skill exposes both executable behavior and context-setting…

METHOD

Full abstract

Agent skills extend LLM agents with reusable instructions, tool interfaces, and executable code, and users increasingly install third-party skills from marketplaces, repositories, and community channels. Because a skill exposes both executable behavior and context-setting documentation, its deployment risk cannot be measured by single-shot audits or prompt-level red teams alone: a realistic attacker can use audit and runtime feedback to repeatedly rewrite the skill. We frame this risk as adaptive leakage (whether a budgeted attacker can iteratively revise a skill until it passes audit and produces verified runtime harm) and present Proteus, a grey-box self-evolving red-team framework for measuring it. Proteus searches a formalized five-axis skill-attack space. Each candidate is evaluated through a unified audit-sandbox-oracle pipeline that returns structured audit findings and runtime evidence to guide cross-round mutation. Beyond initial evasion, Proteus performs path expansion, which finds alternative implementations of successful attacks, and surface expansion, which transfers learned implementation patterns to new attack objectives beyond the original seed catalogue. Across eight phase-1 cells, Proteus reaches 40–90% Attack Success Rate at 5 rounds (ASR@5) with positive learning-curve slopes on both evaluated auditors. Phase-2 path/surface expansion produces 438 jointly bypassing and lethal variants, with SkillVetter bypassed at ≥ 93% in every cell and AI-Infra-Guard, the strongest public auditor we evaluate, still admitting up to 41.3% joint-success. These results show that current skill vetting substantially underestimates residual risk when evaluated against adaptive, feedback-driven attackers.
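The abstract's notions of a budgeted, feedback-driven loop and of ASR@5 can be illustrated with a small measurement harness. This is only a sketch under stated assumptions, not the paper's implementation: `Feedback`, `rounds_to_success`, `asr_at_k`, and the toy auditor are hypothetical names, candidates are abstract strings, and ASR@k is assumed here to mean the fraction of seeds that reach joint success (passes audit and shows verified harm) within k rounds.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Result of one audit-sandbox-oracle evaluation (hypothetical shape)."""
    passed_audit: bool    # the auditor admitted the candidate
    verified_harm: bool   # the oracle confirmed the runtime effect
    findings: List[str]   # structured findings handed back to the mutator


def rounds_to_success(
    seed: str,
    evaluate: Callable[[str], Feedback],
    mutate: Callable[[str, Feedback], str],
    budget: int,
) -> Optional[int]:
    """Return the 1-based round of the first joint success, or None if the
    round budget is exhausted first."""
    candidate = seed
    for round_no in range(1, budget + 1):
        fb = evaluate(candidate)
        if fb.passed_audit and fb.verified_harm:
            return round_no
        # Use the feedback from this round to revise the candidate.
        candidate = mutate(candidate, fb)
    return None


def asr_at_k(outcomes: List[Optional[int]], k: int) -> float:
    """Fraction of seeds whose first joint success came within k rounds
    (assumed definition of ASR@k)."""
    if not outcomes:
        return 0.0
    hits = sum(1 for r in outcomes if r is not None and r <= k)
    return hits / len(outcomes)


# Toy stand-ins: the "auditor" rejects any candidate containing the marker
# "X", and each mutation removes one marker. Purely illustrative.
def toy_evaluate(candidate: str) -> Feedback:
    flagged = "X" in candidate
    return Feedback(
        passed_audit=not flagged,
        verified_harm=True,
        findings=["flagged marker"] if flagged else [],
    )


def toy_mutate(candidate: str, fb: Feedback) -> str:
    return candidate.replace("X", "", 1)


seeds = ["a", "aX", "aXXXXXX"]
outcomes = [rounds_to_success(s, toy_evaluate, toy_mutate, budget=5) for s in seeds]
print(outcomes)
print(asr_at_k(outcomes, 5))
```

Recording the first-success round per seed, rather than a single pass/fail flag, lets the same outcomes be scored at any k up to the budget, which is one way a learning curve over rounds could be plotted.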

RESULT

ScienceToStartup currently rates this 3.0/10 on the public viability pass. These results show that current skill vetting substantially underestimates residual risk when evaluated against adaptive, feedback-driven attackers.

WHY NOW

AI Security moved forward this cycle; last verified May 2026. Public score 3.0/10.


Opportunity summary

Score 3.0

Pain Proteus is a self-evolving red team framework for measuring adaptive leakage risk in agent skill ecosystems, highlighting current vetting underestimation but lacking direct productization signals.

Evidence 0 refs | 0 sources | 0% coverage

Blocker no shell-level blocker reported

Analysis summary

Proteus is a self-evolving red team framework for measuring adaptive leakage risk in agent skill ecosystems, highlighting current vetting underestimation but lacking direct productization signals.


Competitive landscape

Proteus is a self-evolving red team framework for measuring adaptive leakage risk in agent skill ecosystems, highlighting current vetting underestimation but lacking direct productization signals.

Segment: AI Security

Adoption evidence: No public code link in the paper record yet

Commercial read: 3.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


arXiv: https://arxiv.org/abs/2605.11891 · Published: 2026-05-12T10:05:54.000Z · Author: Zhaojiacheng Zhou · Research domain: AI Security

