LLM Reliability

Proof pending

17 papers

6.1 viability

-17% 30d

Proof pending. Core topic summary fields are still materializing.

State of the Field

Large language models (LLMs) face growing scrutiny for their tendency to generate hallucinated or factually incorrect outputs, particularly in high-stakes applications such as healthcare and law. Recent research focuses on improving uncertainty estimation and stability analysis to better understand and mitigate these failures. Techniques such as Truth AnChoring and domain-grounded retrieval aim to assess LLM outputs more accurately, while frameworks like DAVinCI and neuro-symbolic verification offer structured methods for attribution and validation. These advances matter to builders deploying LLMs in environments where accuracy and trustworthiness are paramount and model outputs inform critical decisions. Addressing reliability remains a precondition for integrating LLMs into these sectors.

Last updated May 28, 2026

Topic trend

Topic-specific paper and score movement from the daily diff ledger.

Papers

1-10 of 17

Research Paper·Apr 1, 2026

Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models

Uncertainty estimation (UE) aims to detect hallucinated outputs of large language models (LLMs) to improve their reliability. However, UE metrics often exhibit unstable performance across configuratio...

8.0 viability · Has code

Research Paper·Apr 27, 2026

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insufficient to characterize system r...

7.0 viability

Research Paper·Jan 27, 2026

Rewarding Intellectual Humility: Learning When Not To Answer In Large Language Models

Large Language Models (LLMs) often produce hallucinated or unverifiable content, undermining their reliability in factual domains. This work investigates Reinforcement Learning with Verifiable Rewards...

7.0 viability

Research Paper·Jan 26, 2026

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two ...

7.0 viability

Research Paper·Apr 14, 2026

Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

As Large Language Models (LLMs) are increasingly integrated into agentic workflows, their unpredictability stemming from numerical instability has emerged as a critical reliability issue. While recent...

7.0 viability

Research Paper·Apr 17, 2026

Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations

Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities, entropy, and self-consistency can become brittl...

7.0 viability

Research Paper·Mar 18, 2026

Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval

Large Language Models (LLMs) have achieved unprecedented fluency but remain susceptible to "hallucinations" - the generation of factually incorrect or ungrounded content. This limitation is particular...

7.0 viability

Research Paper·Apr 23, 2026

Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation po...

7.0 viability · Has code

Research Paper·May 27, 2026

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claims can propagate unchecked across stages. This paper adapts a ...

7.0 viability

Research Paper·May 10, 2026

The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods

Large Language Models are increasingly used as zero-shot classifiers in complex reasoning tasks. However, standard constrained decoding suffers from a phenomenon we define as Renormalization Bias. Whe...

7.0 viability
