ARXIV:2605.07353 · LLM REASONING RELIABILITY · SUBMITTED 11 MAY · 20:36 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: partial proof status

Confidence-Aware Alignment Makes Reasoning LLMs More Reliable

Kejia Chen · Jiawen Zhang · Yihong Wu · Kewei Gao · Jian Lou · Zunlei Feng · Mingli Song · Ruoxi Jia

A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.

Ship in 2-4 weeks · Score 7.0 · Evidence partial

Opportunity summary

Pain A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.

Evidence 0 refs | 4 sources | 83% coverage

Blocker Evidence partial


PROBLEM

A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency. Existing alignment strategies address this with external verifiers or massive sampling,…

METHOD

Full abstract

Large reasoning models often reach correct answers through flawed intermediate steps, creating a gap between final accuracy and reasoning reliability. Existing alignment strategies address this with external verifiers or massive sampling, limiting scalability. In this work, we introduce CASPO (Confidence-Aware Step-wise Preference Optimization), a framework that aligns token-level confidence with step-wise logical correctness through iterative Direct Preference Optimization, without training a separate reward model. During inference, we propose Confidence-aware Thought (CaT), which leverages this calibrated confidence to dynamically prune uncertain reasoning branches with negligible O(V) latency. Experiments across ten benchmarks and multiple model families show that CASPO consistently improves reasoning reliability and inference efficiency. CASPO scales to Qwen3-8B-Base and surpasses tree-search baselines on AIME'24 and AIME'25 without using reward-model data. We also release a step-wise dataset with confidence annotations to support fine-grained analysis of reasoning reliability. Code is available at https://github.com/Thecommonirin/CASPO.
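The preference objective named in the abstract can be illustrated with the standard Direct Preference Optimization loss for a single pair of reasoning steps. This is a minimal sketch of generic DPO, not CASPO's released code: the function names, the `beta` value, and the choice to pair one logically correct step against one flawed step are assumptions made for this example.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a reasoning step:
    'chosen' is the preferred (assumed correct) step, 'rejected' the flawed one,
    scored under the trainable policy and under a frozen reference model.
    beta is a hypothetical temperature, not a value taken from the paper.
    """
    chosen_margin = policy_chosen - ref_chosen        # implicit reward of chosen step
    rejected_margin = policy_rejected - ref_rejected  # implicit reward of rejected step
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) is the same as softplus(-z)
    return softplus(-z)


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2)
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy prefers the chosen step more than the reference does: loss drops
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss needs only log-probabilities from the policy and a reference model, which is consistent with the abstract's statement that no separate reward model is trained.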

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Experiments across ten benchmarks and multiple model families show that CASPO consistently improves reasoning reliability and inference efficiency. A public repository is linked, so…

WHY NOW

LLM Reasoning Reliability moved forward this cycle; last verified May 2026. Public score 7.0/10. Implementation evidence is present through a linked repository.


Opportunity summary

Score 7.0

Pain A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.

Evidence 0 refs | 4 sources | 83% coverage

Blocker no shell-level blocker reported

Analysis summary

A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.
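To make "dynamic pruning of uncertain branches" concrete, here is a small sketch of confidence-based branch filtering. It is not the paper's CaT procedure: defining token confidence as the maximum softmax probability, aggregating with a geometric mean, and the 0.5 threshold are all assumptions chosen for illustration. The per-token cost is one pass over the logit vector, so it grows linearly with vocabulary size.

```python
import math


def softmax(logits):
    """Stable softmax over one logit vector (one entry per vocabulary item)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidence(logits):
    """Assumed confidence measure: probability of the most likely token."""
    return max(softmax(logits))


def step_confidence(step_logits):
    """Geometric mean of token confidences across one reasoning step."""
    logs = [math.log(token_confidence(l)) for l in step_logits]
    return math.exp(sum(logs) / len(logs))


def prune_branches(branches, threshold=0.5):
    """Keep branches whose step confidence reaches the (hypothetical) threshold.

    branches maps a branch name to a list of per-token logit vectors.
    Returns a dict of surviving branch names and their confidences.
    """
    kept = {}
    for name, step_logits in branches.items():
        conf = step_confidence(step_logits)
        if conf >= threshold:
            kept[name] = conf
    return kept


if __name__ == "__main__":
    branches = {
        "confident": [[5.0, 0.0, 0.0], [4.0, 0.0, 0.0]],
        "uncertain": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
    }
    print(prune_branches(branches))
```

A filter like this is only useful if confidence tracks correctness, which is the property the alignment stage is described as providing.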


Competitive landscape

A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.

Segment

LLM Reasoning Reliability

Adoption evidence

Public code linked for build inspection

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified


{ "@context": "https://schema.org", "@graph": [ { "@type": "WebPage", "@id": "https://sciencetostartup.com/paper/confidence-aware-alignment-makes-reasoning-llms-more-reliable#webpage", "url": "https://sciencetostartup.com/paper/confidence-aware-alignment-makes-reasoning-llms-more-reliable", "name": "Confidence-Aware Alignment Makes Reasoning LLMs More Reliable", "description": "A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.", "isPartOf": { "@id": "https://sciencetostartup.com/#website" } }, { "@type": "ScholarlyArticle", "@id": "https://sciencetostartup.com/paper/confidence-aware-alignment-makes-reasoning-llms-more-reliable#scholarlyArticle", "headline": "Confidence-Aware Alignment Makes Reasoning LLMs More Reliable", "description": "A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.", "url": "https://sciencetostartup.com/paper/confidence-aware-alignment-makes-reasoning-llms-more-reliable", "sameAs": "https://arxiv.org/abs/2605.07353", "identifier": { "@type": "PropertyValue", "propertyID": "arXiv", "value": "2605.07353" }, "isAccessibleForFree": true, "isPartOf": { "@id": "https://sciencetostartup.com/#website" }, "datePublished": "2026-05-08T07:08:25.000Z", "author": [ { "@type": "Person", "name": "Kejia Chen" }, { "@type": "Person", "name": "Jiawen Zhang" }, { "@type": "Person", "name": "Yihong Wu" }, { "@type": "Person", "name": "Kewei Gao" }, { "@type": "Person", "name": "Jian Lou" }, { "@type": "Person", "name": "Zunlei Feng" }, { "@type": "Person", "name": "Mingli Song" }, { "@type": "Person", "name": "Ruoxi Jia" } ], "codeRepository": "https://github.com/Thecommonirin/CASPO", "additionalProperty": [ { "@type": "PropertyValue", "propertyID": "viabilityScore", "value": 7 }, { "@type": "PropertyValue", 
"propertyID": "researchDomain", "value": "LLM Reasoning Reliability" }, { "@type": "PropertyValue", "propertyID": "commercialReadiness", "value": "code, repo url" } ] }, { "@type": "SoftwareSourceCode", "@id": "https://sciencetostartup.com/paper/confidence-aware-alignment-makes-reasoning-llms-more-reliable#software", "name": "Confidence-Aware Alignment Makes Reasoning LLMs More Reliable - Source Code", "description": "A framework that aligns token-level confidence with logical correctness in LLM reasoning, enabling dynamic pruning of uncertain branches for improved reliability and efficiency.", "codeRepository": "https://github.com/Thecommonirin/CASPO", "url": "https://github.com/Thecommonirin/CASPO" }, { "@type": "BreadcrumbList", "itemListElement": [ { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://sciencetostartup.com" }, { "@type": "ListItem", "position": 2, "name": "LLM Reasoning Reliability", "item": "https://sciencetostartup.com/topics" }, { "@type": "ListItem", "position": 3, "name": "Confidence-Aware Alignment Makes Reasoning LLMs More Reliabl", "item": "https://sciencetostartup.com/paper/confidence-aware-alignment-makes-reasoning-llms-more-reliable" } ] } ] }

