ARXIV:2605.31308 · AGENT EVALUATION & IMPROVEMENT · SUBMITTED 01 JUN · 20:21 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories

Junjie Nian · Kang Chen · Ge Zhang · Yixin Cao · Yugang Jiang · arXiv

TraceGraph visualizes agent decision landscapes from trajectories, revealing hidden performance differences and enabling targeted improvements for agent recovery pipelines.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: TraceGraph visualizes agent decision landscapes from trajectories, revealing hidden performance differences and enabling targeted improvements for agent recovery pipelines.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

TraceGraph visualizes agent decision landscapes from trajectories, revealing hidden performance differences and enabling targeted improvements for agent recovery pipelines. We introduce TraceGraph, a graph-based framework that turns released multi-model agent trajectories into shared decision…

METHOD

Full abstract

Agent benchmarks increasingly record rich interaction trajectories, yet evaluation often reduces each rollout to a pass rate or reward score. We introduce TraceGraph, a graph-based framework that turns released multi-model agent trajectories into shared decision landscapes. For each task, TraceGraph builds a graph over observable action-observation states from pooled rollouts before model identity is introduced. It then overlays outcome-informed productive cores and trap regions, and summarizes each rollout with three events: Access, Trap exposure, and Repair. Across trajectories spanning five benchmark splits, TraceGraph profiles reveal navigation differences hidden by aggregate scores and show that splits differ in whether they reward avoiding traps or recovering from them. The same TraceGraph landscape also motivates a trap-aware recovery pipeline for SWE-bench: a runtime detector fires on states matching historical trap regions, then lightweight continuation policies are evaluated from the same prefix. On fired states, the best pooled single-factor policy raises official resolved rate from 40.4% to 43.5% on the per-provider fired subset and from 41.0% to 44.8% on common-fired instances, with provider-specific active components. Overall, TraceGraph provides a process vocabulary for asking what agent benchmarks test, where models diverge on a shared landscape, and how failure regions can guide downstream improvement.
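The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how states are keyed, how productive cores and trap regions are thresholded, or how the three events are formally defined. The `Rollout` type, the `core_min`, `trap_max`, and `min_visits` parameters, the toy state names, and the event rules (Access = the rollout visits a core state, Trap exposure = it visits a trap state, Repair = it reaches a core state after a trap state) are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """Hypothetical record: ordered state keys plus a pass/fail outcome."""
    model: str      # deliberately unused while the graph is built
    states: tuple   # e.g. normalized (action, observation) keys
    success: bool


def build_landscape(rollouts, core_min=0.7, trap_max=0.3, min_visits=2):
    """Pool rollouts into one state graph, then label core and trap states.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    edges = defaultdict(int)
    visits = defaultdict(int)
    wins = defaultdict(int)
    for r in rollouts:
        for a, b in zip(r.states, r.states[1:]):
            edges[(a, b)] += 1
        for s in set(r.states):  # count each rollout once per state
            visits[s] += 1
            wins[s] += r.success
    core, trap = set(), set()
    for s, n in visits.items():
        if n < min_visits:
            continue
        rate = wins[s] / n
        if rate >= core_min:
            core.add(s)
        elif rate <= trap_max:
            trap.add(s)
    return dict(edges), core, trap


def summarize(rollout, core, trap):
    """Return the three per-rollout events under the assumed definitions."""
    access = exposure = repair = False
    for s in rollout.states:
        if s in trap:
            exposure = True
        elif s in core:
            access = True
            if exposure:
                repair = True
    return {"access": access, "trap_exposure": exposure, "repair": repair}


def detector_fires(prefix, trap):
    """Runtime check: does the latest state match a historical trap state?"""
    return bool(prefix) and prefix[-1] in trap


# Toy rollouts with made-up state names, pooled across three models.
rollouts = [
    Rollout("m1", ("start", "read", "edit", "test_ok"), True),
    Rollout("m2", ("start", "read", "edit", "test_ok"), True),
    Rollout("m1", ("start", "loop", "loop", "give_up"), False),
    Rollout("m2", ("start", "loop", "give_up"), False),
    Rollout("m3", ("start", "loop", "read", "edit", "test_ok"), True),
    Rollout("m3", ("start", "loop", "give_up"), False),
]
edges, core, trap = build_landscape(rollouts)
profile = summarize(rollouts[4], core, trap)
```

In this toy data, `read`, `edit`, and `test_ok` end up in the core, `loop` and `give_up` in the trap set, and the fifth rollout registers all three events because it leaves `loop` and reaches the core. Model identity only enters afterwards, for instance by grouping the `summarize` outputs by `rollout.model`, which mirrors the abstract's point that the graph is built before model identity is introduced.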

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Across trajectories spanning five benchmark splits, TraceGraph profiles reveal navigation differences hidden by aggregate scores and show that splits differ in whether they reward…

WHY NOW

Agent Evaluation & Improvement moved forward this cycle; last verified June 2026. Public score 7.0/10. Production flags indicate code availability.

Competitive landscape

Segment

Agent Evaluation & Improvement

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Published 2026-05-29 13:40 UTC · arXiv: https://arxiv.org/abs/2605.31308 · Page: https://sciencetostartup.com/paper/tracegraph-shared-decision-landscapes-for-diagnosing-and-improving-agent-trajectories