ARXIV:2604.10990 · LLM REASONING · SUBMITTED 14 APR · 16:52 UTC · FRESHNESS STALE

Verified Source: PDF linked | Verified PaperPack: citation fields available | Partial Proof: unverified proof status

When Verification Fails: How Compositionally Infeasible Claims Escape Rejection

Muxin Liu · Delip Rao · Grace Kim · Chris Callison-Burch · arXiv

New benchmarks reveal LLMs struggle with compositional claim verification, relying on salient shortcuts rather than robust reasoning.

Ship in 2-4 weeks | Score 7.0 | Evidence unverified

Opportunity summary

Pain: New benchmarks reveal LLMs struggle with compositional claim verification, relying on salient shortcuts rather than robust reasoning.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: Evidence unverified

PROBLEM

New benchmarks reveal LLMs struggle with compositional claim verification, relying on salient shortcuts rather than robust reasoning. Claim verification involves evaluating each constraint a claim asserts against validated evidence.

METHOD

Full abstract

Scientific claim verification, the task of determining whether claims are entailed by scientific evidence, is fundamental to establishing discoveries in evidence while preventing misinformation. This process involves evaluating each asserted constraint against validated evidence. Under the Closed-World Assumption (CWA), a claim is accepted if and only if all asserted constraints are positively supported. We show that existing verification benchmarks cannot distinguish models enforcing this standard from models applying a simpler shortcut called salient-constraint checking, which applies CWA's rejection criterion only to the most salient constraint and accepts when that constraint is supported. Because existing benchmarks construct infeasible claims by perturbing a single salient element, they are insufficient for distinguishing rigorous claim verification from simple salient-constraint reliance. To separate the two, we construct compositionally infeasible claims where the salient constraint is supported but a non-salient constraint is contradicted. Across model families and modalities, models that otherwise saturate existing benchmarks consistently over-accept these claims, confirming the prevalence of such shortcut reasoning. Via model context interventions, we show that different models and prompting strategies occupy distinct positions on a shared ROC curve, indicating that the gap between model families reflects differences in verification threshold rather than underlying reasoning ability, and that the compositional inference bottleneck is a structural property of current verification behavior that strategy guidance alone cannot overcome.
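To make the distinction in the abstract concrete, the following is a minimal sketch of the two acceptance rules, not the paper's implementation. The `Constraint` type, the numeric `salience` score, the `Support` labels, and the example claim texts are all hypothetical names introduced here for illustration; the paper record does not specify how constraints or salience are represented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Support(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNADDRESSED = "unaddressed"  # evidence is silent on this constraint


@dataclass(frozen=True)
class Constraint:
    text: str
    support: Support
    salience: float  # hypothetical score; higher means more prominent in the claim


def cwa_accept(constraints: Sequence[Constraint]) -> bool:
    """Closed-World Assumption: accept iff every constraint is positively supported."""
    if not constraints:
        raise ValueError("a claim must assert at least one constraint")
    return all(c.support is Support.SUPPORTED for c in constraints)


def salient_accept(constraints: Sequence[Constraint]) -> bool:
    """Shortcut: apply the rejection criterion to the most salient constraint only."""
    if not constraints:
        raise ValueError("a claim must assert at least one constraint")
    most_salient = max(constraints, key=lambda c: c.salience)
    return most_salient.support is Support.SUPPORTED


# Made-up claims for illustration only.
# Single-element perturbation: the salient constraint itself is contradicted.
single_perturbation = [
    Constraint("compound lowers the measured marker", Support.CONTRADICTED, 0.9),
    Constraint("in the adult cohort", Support.SUPPORTED, 0.2),
]

# Compositionally infeasible: salient constraint supported, non-salient one contradicted.
compositional = [
    Constraint("compound lowers the measured marker", Support.SUPPORTED, 0.9),
    Constraint("in the adult cohort", Support.CONTRADICTED, 0.2),
]

for name, claim in [("single", single_perturbation), ("compositional", compositional)]:
    print(name, "CWA:", cwa_accept(claim), "shortcut:", salient_accept(claim))
```

Both rules reject the single-perturbation claim, so a benchmark built only from such claims cannot tell them apart. They disagree only on the compositional claim, where the shortcut accepts and the CWA rule rejects.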

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. We show that existing verification benchmarks cannot distinguish models enforcing this standard from models applying a simpler shortcut called salient-constraint checking, which applies CWA's…

WHY NOW

LLM Reasoning moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score: 7.0

Pain: New benchmarks reveal LLMs struggle with compositional claim verification, relying on salient shortcuts rather than robust reasoning.

Evidence: 0 refs | 3 sources | 50% coverage

Blocker: no shell-level blocker reported

Analysis summary

New benchmarks reveal LLMs struggle with compositional claim verification, relying on salient shortcuts rather than robust reasoning.

Competitive landscape

New benchmarks reveal LLMs struggle with compositional claim verification, relying on salient shortcuts rather than robust reasoning.

Segment: LLM Reasoning

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified
