ARXIV:2603.25112 · LLM EVALUATION · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory

Jon-Paul Cacioli · arXiv

A new evaluation framework using signal detection theory to measure LLM metacognitive efficiency, revealing which models truly know what they don't know.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A new evaluation framework using signal detection theory to measure LLM metacognitive efficiency, revealing which models truly know what they don't know.

Evidence 0 refs | 0 sources | 17% coverage

Blocker Evidence unverified


PROBLEM

A new evaluation framework using signal detection theory to measure LLM metacognitive efficiency, revealing which models truly know what they don't know. We introduce an evaluation framework based on Type-2 Signal Detection Theory that…

METHOD

Full abstract

Standard evaluation of LLM confidence relies on calibration metrics (ECE, Brier score) that conflate two distinct capacities: how much a model knows (Type-1 sensitivity) and how well it knows what it knows (Type-2 metacognitive sensitivity). We introduce an evaluation framework based on Type-2 Signal Detection Theory that decomposes these capacities using meta-d' and the metacognitive efficiency ratio M-ratio. Applied to four LLMs (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base, Gemma-2-9B-Instruct) across 224,000 factual QA trials, we find: (1) metacognitive efficiency varies substantially across models even when Type-1 sensitivity is similar -- Mistral achieves the highest d' but the lowest M-ratio; (2) metacognitive efficiency is domain-specific, with different models showing different weakest domains, invisible to aggregate metrics; (3) temperature manipulation shifts Type-2 criterion while meta-d' remains stable for two of four models, dissociating confidence policy from metacognitive capacity; (4) AUROC_2 and M-ratio produce fully inverted model rankings, demonstrating these metrics answer fundamentally different evaluation questions. The meta-d' framework reveals which models "know what they don't know" versus which merely appear well-calibrated due to criterion placement -- a distinction with direct implications for model selection, deployment, and human-AI collaboration. Pre-registered analysis; code and data publicly available.
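The quantities named in the abstract can be illustrated with a minimal sketch. It is not the paper's released code: the function names are hypothetical, the Type-1 task is assumed to be a binary yes/no decision, and the log-linear correction is one common convention chosen here for illustration. Estimating meta-d' itself requires fitting a Type-2 SDT model, which is not shown, so `m_ratio` simply takes a fitted meta-d' value as input.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Type-1 sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    # Assumption: log-linear correction (add 0.5 per cell) so that
    # rates of exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)


def auroc2(confidence, correct):
    """Type-2 AUROC: probability that a randomly chosen correct trial
    carries higher confidence than a randomly chosen incorrect trial
    (ties count as 0.5)."""
    pos = [c for c, ok in zip(confidence, correct) if ok]
    neg = [c for c, ok in zip(confidence, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect trial")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def m_ratio(meta_d, d):
    """Metacognitive efficiency: meta-d' divided by d'."""
    if d == 0:
        raise ValueError("M-ratio is undefined when d' is zero")
    return meta_d / d


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    print(d_prime(80, 20, 20, 80))
    print(auroc2([0.9, 0.3, 0.6, 0.7], [True, True, False, False]))
    print(m_ratio(0.8, 1.6))
```

Because M-ratio divides by d', a model with the highest d' can still have the lowest M-ratio, which is consistent with the pattern the abstract reports for Mistral. AUROC_2 involves no such normalization, so the two metrics can rank models differently.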

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Applied to four LLMs (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base, Gemma-2-9B-Instruct) across 224,000 factual QA trials, we find: (1) metacognitive efficiency varies substantially across models even when…

WHY NOW

LLM Evaluation moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain A new evaluation framework using signal detection theory to measure LLM metacognitive efficiency, revealing which models truly know what they don't know.

Evidence 0 refs | 0 sources | 17% coverage

Blocker no shell-level blocker reported


Competitive landscape

A new evaluation framework using signal detection theory to measure LLM metacognitive efficiency, revealing which models truly know what they don't know.

Segment: LLM Evaluation

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


Published: 2026-03-26 · arXiv: https://arxiv.org/abs/2603.25112

