ARXIV:2605.14754 · LLM EVALUATION · SUBMITTED 15 MAY · 20:13 UTC · FRESHNESS FRESH

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition

Gong Zhiren · Tiantong Wu · Jiaming Zhang · Fuyao Zhang · Che Wang · Yurong Hao · +6 at arXiv

A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse.

Ship in 2-4 weeks · Score 4.0 · Evidence unverified

Opportunity summary

Pain A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified

PROBLEM

A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse. Existing benchmarks primarily focus on single-turn restricted scenarios, failing to capture the capability boundaries exposed by real-world interactive…

METHOD

Full abstract

Large Language Models (LLMs) are increasingly deployed for knowledge synthesis, yet their capacity for compositional generalization in scientific knowledge remains under-characterized. Existing benchmarks primarily focus on single-turn restricted scenarios, failing to capture the capability boundaries exposed by real-world interactive scientific workflows. To address this, we introduce XDomainBench, a diagnostic benchmark for interactive interdisciplinary scientific reasoning. We formalize the composition order and mixture structure to enable systematic stress-testing from single-discipline to inter-disciplinary, comprising 8,598 interactive sessions across 20 domains and 4 task categories, with 8 realistic trajectory patterns covering difficulty and domain-mixture dynamics, simulating real AI4S scenarios. Large-scale evaluation of LLMs reveals a systematic reasoning collapse as composition order increases, stemming from two root causes: (i) direct difficulty increases induced by domain composition, and (ii) indirect interaction-amplified failures where trajectory patterns trigger error accumulation, reasoning breaks, and domain confusion, ultimately leading to session collapse.
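The collapse described in the abstract can be pictured as a pass rate that falls as composition order rises. The following is a minimal sketch of that kind of analysis, not the paper's evaluation code: the `SessionResult` schema, the reading of composition order as the number of domains combined in a session, the all-turns-correct pass criterion, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SessionResult:
    """One evaluated session (hypothetical schema, not the benchmark's format)."""
    composition_order: int           # assumed: number of domains combined
    turns_correct: Tuple[bool, ...]  # per-turn correctness, in turn order


def session_passed(result: SessionResult) -> bool:
    """Assumed criterion: a session passes only if every turn is correct."""
    return all(result.turns_correct)


def pass_rate_by_order(results: List[SessionResult]) -> Dict[int, float]:
    """Fraction of passed sessions for each composition order."""
    totals: Dict[int, int] = defaultdict(int)
    passed: Dict[int, int] = defaultdict(int)
    for r in results:
        totals[r.composition_order] += 1
        passed[r.composition_order] += int(session_passed(r))
    return {order: passed[order] / totals[order] for order in sorted(totals)}


def first_failed_turn(result: SessionResult) -> Optional[int]:
    """Index of the first incorrect turn, or None if the session passed."""
    for i, ok in enumerate(result.turns_correct):
        if not ok:
            return i
    return None


# Made-up toy data, NOT results from the paper.
TOY_RESULTS = [
    SessionResult(1, (True, True, True)),
    SessionResult(1, (True, True, True)),
    SessionResult(2, (True, True, False)),
    SessionResult(2, (True, True, True)),
    SessionResult(3, (True, False, False)),
    SessionResult(3, (False, False, False)),
]

if __name__ == "__main__":
    print(pass_rate_by_order(TOY_RESULTS))  # {1: 1.0, 2: 0.5, 3: 0.0}
```

Tracking `first_failed_turn` alongside the pass rate is one way to tell failures that appear immediately (closer to the abstract's direct difficulty increase) from failures that appear late in a session (closer to interaction-amplified error accumulation), though the paper's own diagnostics may differ.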

RESULT

ScienceToStartup currently rates this 4.0/10 on the public viability pass. We formalize the composition order and mixture structure to enable systematic stress-testing from single-discipline to inter-disciplinary, comprising 8,598 interactive sessions across 20 domains and…

WHY NOW

LLM Evaluation moved forward this cycle; last verified May 2026. Public score 4.0/10. Production flags indicate code availability.

Opportunity summary

Score 4.0

Pain A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse.

Evidence 0 refs | 0 sources | 0% coverage

Blocker no shell-level blocker reported

Analysis summary

A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse.

Competitive landscape

A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse.

Segment

LLM Evaluation

Adoption evidence

No public code link in the paper record yet

Commercial read

4.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/xdomainbench-diagnosing-reasoning-collapse-in-high-dimensional-scientific-knowledge-composition#webpage", "url": "https://sciencetostartup.com/paper/xdomainbench-diagnosing-reasoning-collapse-in-high-dimensional-scientific-knowledge-composition", "name": "XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition", "description": "A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/xdomainbench-diagnosing-reasoning-collapse-in-high-dimensional-scientific-knowledge-composition#scholarlyArticle", "headline": "XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition", "description": "A new benchmark for evaluating LLM reasoning in complex, multi-domain scientific workflows to identify systematic collapse.", "url": "https://sciencetostartup.com/paper/xdomainbench-diagnosing-reasoning-collapse-in-high-dimensional-scientific-knowledge-composition", "sameAs": "https://arxiv.org/abs/2605.14754", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2605.14754" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-05-14T12:19:19.000Z", "author": [ { "@type": "Person", "name": "Gong Zhiren" }, { "@type": "Person", "name": "Tiantong Wu" }, { "@type": "Person", "name": "Jiaming Zhang" }, { "@type": "Person", "name": "Fuyao Zhang" }, { "@type": "Person", "name": "Che Wang" }, { "@type": "Person", "name": "Yurong Hao" }, { "@type": "Person", "name": "Yikun Hou" }, { "@type": "Person", "name": "Foo Ping" }, { "@type": "Person", "name": "Yilei Zhao" }, { "@type": "Person", "name": "Fei Huang" }, { 
"@type": "Person", "name": "Chau Yuen" }, { "@type": "Person", "name": "Wei Yang Bryan Lim" } ], "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 4 }, { "@type": "PropertyValue", "propertyID": "researchDomain", "value": "LLM Evaluation" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code" } ] }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "LLM Evaluation", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "XDomainBench: Diagnosing Reasoning Collapse in High-Dimensio", "item": "https://sciencetostartup.com/paper/xdomainbench-diagnosing-reasoning-collapse-in-high-dimensional-scientific-knowledge-composition" } ] } ] }
