BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence


Page Freshness

Paper proof surface

Canonical route: /paper/bas-a-decision-theoretic-approach-to-evaluating-large-language-model-confidence

stale

Proof freshness: fresh
Proof status: unverified
Display score: 7/10
Last proof check: 2026-04-06
Score updated: 2026-04-06
Score fresh until: 2026-05-06
References: 0
Source count: 0
Coverage: 0%

This page is showing the last landed evidence receipt and score bundle because the latest proof data is outside the freshness window.
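
The freshness fields above can be checked mechanically. The following is a minimal sketch, not the site's actual logic: it assumes the "Key: value" lines are parsed as plain text and that a score counts as fresh up to and including the "Score fresh until" date. The helper names `parse_fields` and `within_freshness_window` are hypothetical.

```python
from datetime import date

# Field block copied from the page above.
PROOF_FIELDS = """\
Proof freshness: fresh
Proof status: unverified
Display score: 7/10
Last proof check: 2026-04-06
Score updated: 2026-04-06
Score fresh until: 2026-05-06
"""


def parse_fields(text: str) -> dict[str, str]:
    """Split each 'Key: value' line at the first colon (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def within_freshness_window(fields: dict[str, str], today: date) -> bool:
    """Assumption: the window is inclusive of the 'Score fresh until' date."""
    return today <= date.fromisoformat(fields["Score fresh until"])


fields = parse_fields(PROOF_FIELDS)
print(within_freshness_window(fields, date(2026, 4, 20)))
print(within_freshness_window(fields, date(2026, 5, 7)))
```

Under these assumptions, a check dated after 2026-05-06 would fall outside the window, which is the situation the notice above describes.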

Agent Handoff

BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence

Canonical ID bas-a-decision-theoretic-approach-to-evaluating-large-language-model-confidence | Route /paper/bas-a-decision-theoretic-approach-to-evaluating-large-language-model-confidence

REST example

curl https://sciencetostartup.com/api/v1/agent-handoff/paper/bas-a-decision-theoretic-approach-to-evaluating-large-language-model-confidence

MCP example

{
  "tool": "get_paper",
  "arguments": {
    "arxiv_id": "2604.03216"
  }
}
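
The two examples above can also be assembled programmatically. This is a rough sketch using only the Python standard library: the base URL, the `get_paper` tool name, and the `arxiv_id` argument are taken from the examples on this page, while the function names are hypothetical. The response format of the endpoint is not documented here, so `fetch_handoff` returns the raw bytes and is not called in the sketch.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the REST example on this page.
BASE_URL = "https://sciencetostartup.com/api/v1/agent-handoff/paper"


def handoff_url(canonical_id: str) -> str:
    """Build the agent-handoff URL for a canonical paper ID."""
    return f"{BASE_URL}/{quote(canonical_id, safe='')}"


def mcp_get_paper_call(arxiv_id: str) -> dict:
    """Build the MCP tool-call payload shown in the MCP example."""
    return {"tool": "get_paper", "arguments": {"arxiv_id": arxiv_id}}


def fetch_handoff(canonical_id: str, timeout: float = 10.0) -> bytes:
    """Plain GET, equivalent to the curl example.

    The response schema is an unknown here, so the body is returned undecoded.
    """
    with urlopen(handoff_url(canonical_id), timeout=timeout) as resp:
        return resp.read()


paper_id = "bas-a-decision-theoretic-approach-to-evaluating-large-language-model-confidence"
print(handoff_url(paper_id))
print(mcp_get_paper_call("2604.03216"))
```

Keeping URL construction separate from the network call makes it possible to check the request shape without contacting the service.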

source_context

{
  "surface": "paper",
  "mode": "paper",
  "query": "BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence",
  "normalized_query": "2604.03216",
  "route": "/paper/bas-a-decision-theoretic-approach-to-evaluating-large-language-model-confidence",
  "paper_ref": "bas-a-decision-theoretic-approach-to-evaluating-large-language-model-confidence",
  "topic_slug": null,
  "benchmark_ref": null,
  "dataset_ref": null
}



