ARXIV:2605.21240 · SELF-EVOLVING AGENTS · SUBMITTED 21 MAY · 20:28 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Yibo Li · Jiashuo Yang · Zhi Zheng · Zhiyuan Hu · Yuan Sui · Shizun Wang · +2 at arXiv

APEX enhances self-evolving LLM agents by improving exploration through a structured strategy map.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain Self-evolving LLM agents suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines. APEX counters this with a structured strategy map.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

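LLM agents cannot learn on the fly at test time. Self-evolving agents work around this by accumulating memory and reflection across episodes instead of updating model weights, but they often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives.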

METHOD

Full abstract

LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. However, these agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and exploitation during planning. Evaluated on nine Jericho text-adventure games and WebArena, a realistic web interaction benchmark, APEX outperforms all baselines. Extensive ablations validate each component's contribution and demonstrate robustness across diverse settings, demonstrating APEX's effectiveness for sustained exploration in self-evolving agents.
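To make the strategy-map idea concrete, here is a minimal Python sketch of a directed acyclic graph of milestones with prerequisite edges, a step that adds unexplored forks, and a selection step that trades off exploration against exploitation. It is an illustration only, not the authors' implementation: the class and method names (Milestone, StrategyMap, add_fork, record, frontier, select) are hypothetical, and the UCB1 scoring rule is an assumed stand-in, since the abstract does not say how Policy Selection scores candidates.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Milestone:
    """One node of the strategy map (all field names are hypothetical)."""
    name: str
    prerequisites: set = field(default_factory=set)
    visits: int = 0            # episodes that attempted this milestone
    total_reward: float = 0.0  # summed reward over those episodes


class StrategyMap:
    """A DAG of milestones linked by prerequisite dependency edges."""

    def __init__(self) -> None:
        self.nodes: dict = {}

    def add_fork(self, name: str, prerequisites: tuple = ()) -> None:
        """Add an unexplored direction, in the spirit of Fork Discovery."""
        if name in self.nodes:
            raise ValueError(f"milestone {name!r} already exists")
        missing = [p for p in prerequisites if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown prerequisites: {missing}")
        # A new node only points at existing nodes, so no cycle can form.
        self.nodes[name] = Milestone(name, set(prerequisites))

    def record(self, name: str, reward: float) -> None:
        """Store the outcome of one episode that pursued this milestone."""
        node = self.nodes[name]
        node.visits += 1
        node.total_reward += reward

    def frontier(self, achieved: set) -> list:
        """Unachieved milestones whose prerequisites are all achieved."""
        return [
            n.name for n in self.nodes.values()
            if n.name not in achieved and n.prerequisites <= achieved
        ]

    def select(self, achieved: set, c: float = 1.4) -> Optional[str]:
        """Pick a frontier milestone with UCB1 (an assumed rule, not APEX's)."""
        candidates = self.frontier(achieved)
        if not candidates:
            return None
        total = sum(self.nodes[n].visits for n in candidates)

        def score(name: str) -> float:
            node = self.nodes[name]
            if node.visits == 0:
                return math.inf  # untried forks are always explored first
            mean = node.total_reward / node.visits
            return mean + c * math.sqrt(math.log(total) / node.visits)

        return max(candidates, key=score)


smap = StrategyMap()
smap.add_fork("open_door")
smap.add_fork("find_key")
smap.add_fork("enter_cellar", prerequisites=("open_door",))
smap.record("open_door", 1.0)

print(smap.frontier(set()))          # ['open_door', 'find_key']
print(smap.select(set()))            # find_key
print(smap.frontier({"open_door"}))  # ['find_key', 'enter_cellar']
```

In this sketch the untried fork wins even though open_door already paid off. Without some bonus of that kind, a reward-greedy agent would keep picking the familiar routine, which is the exploration collapse the abstract describes.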

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. The authors report that APEX outperforms all baselines on nine Jericho text-adventure games and WebArena, and that ablations validate each component's contribution and show robustness across diverse settings. Code availability is…

WHY NOW

Self-Evolving Agents moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain Self-evolving LLM agents suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines. APEX counters this with a structured strategy map.

Evidence 0 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Analysis summary

APEX enhances self-evolving LLM agents by improving exploration through a structured strategy map.

Competitive landscape

APEX enhances self-evolving LLM agents by improving exploration through a structured strategy map.

Segment

Self-Evolving Agents

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Paper record

arXiv: https://arxiv.org/abs/2605.21240 · Published: 2026-05-20T14:29:27Z · Authors: Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi · Research domain: Self-Evolving Agents · Viability score: 7 · Commercial readiness: code · Free to access: yes
