ARXIV:2604.06846 · MEDICAL DIALOGUE BENCHMARKING · SUBMITTED 10 APR · 00:14 UTC · FRESHNESS STALE

Source: PDF linked (verified) · PaperPack: citation fields available (verified) · Proof: unverified proof status (partial)

MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors

Xiaotian Luo · Xun Jiang · Jiangcheng Wu · arXiv

MedDialBench is a novel benchmark for evaluating LLM diagnostic robustness against adversarial patient behaviors, revealing critical vulnerabilities in current models.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: MedDialBench is a novel benchmark for evaluating LLM diagnostic robustness against adversarial patient behaviors, revealing critical vulnerabilities in current models.

Evidence: 5 refs | 3 sources | 67% coverage

Blocker: evidence unverified


PROBLEM

We introduce MedDialBench, a benchmark enabling controlled, dose-response characterization of how individual patient behavior…
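A dose-response profile tracks how diagnostic accuracy falls as a single behavior dimension is pushed to higher severity. The sketch below is a minimal illustration, not the paper's procedure. It assumes that severity levels are integers, that accuracy is the fraction of cases diagnosed correctly, and that the trend is summarized by an ordinary least-squares slope. The function names `accuracy_drops` and `ols_slope` and the toy numbers are hypothetical.

```python
def accuracy_drops(baseline_acc: float, acc_by_level: dict[int, float]) -> dict[int, float]:
    """Accuracy drop (percentage points) relative to baseline, per severity level.

    Hypothetical helper: levels are assumed to be integers and accuracies
    fractions in [0, 1]; this is not the paper's published code.
    """
    return {
        level: round(100 * (baseline_acc - acc), 1)
        for level, acc in sorted(acc_by_level.items())
    }


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys on xs (points lost per severity step)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("severity levels must not all be equal")
    return sxy / sxx


# Toy numbers for one model and one dimension (not results from the paper).
drops = accuracy_drops(0.80, {1: 0.72, 2: 0.60, 3: 0.45})
slope = ols_slope(list(drops), list(drops.values()))
print(drops, slope)  # {1: 8.0, 2: 20.0, 3: 35.0} 13.5
```

A linear slope is the simplest summary. A steep or accelerating curve at the top severity level would be the kind of signal a graded design is meant to expose, and a richer fit could replace `ols_slope` without changing the drop computation.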

METHOD

Full abstract

Interactive medical dialogue benchmarks have shown that LLM diagnostic accuracy degrades significantly when models interact with non-cooperative patients. Yet existing approaches either apply adversarial behaviors without graded severity or case-specific grounding, or reduce patient non-cooperation to a single ungraded axis, and none analyze cross-dimension interactions. We introduce MedDialBench, a benchmark enabling controlled, dose-response characterization of how individual patient behavior dimensions affect LLM diagnostic robustness. It decomposes patient behavior into five dimensions (Logic Consistency, Health Cognition, Expression Style, Disclosure, and Attitude), each with graded severity levels and case-specific behavioral scripts. This controlled factorial design enables graded sensitivity analysis, dose-response profiling, and cross-dimension interaction detection.

Evaluating five frontier LLMs across 7,225 dialogues (85 cases × 17 configurations × 5 models), we find a fundamental asymmetry: information pollution (fabricating symptoms) produces 1.7-3.4× larger accuracy drops than information deficit (withholding information), and fabricating is the only configuration that reaches statistical significance for all five models (McNemar p < 0.05). Among six dimension combinations, fabricating is the sole driver of super-additive interaction: all three fabricating-involving pairs produce observed-to-expected (O/E) ratios of 0.70-0.81 (35-44% of eligible cases fail under the combination despite succeeding under each dimension alone), while all non-fabricating pairs show purely additive effects (O/E ≈ 1.0).

Inquiry strategy moderates deficit but not pollution: exhaustive questioning recovers withheld information but cannot compensate for fabricated inputs. Models exhibit distinct vulnerability profiles, with worst-case drops ranging from 38.8 to 54.1 percentage points.
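The significance claim rests on McNemar's test, which compares paired outcomes: the same case diagnosed once under the baseline and once under a perturbed configuration. Only the discordant cases carry information. The following is a minimal sketch of the exact (binomial) form. This page does not say whether the paper used the exact or the chi-square variant, so treat that choice, the helper name `mcnemar_exact`, and the toy outcomes as assumptions.

```python
from math import comb


def mcnemar_exact(baseline: list[bool], perturbed: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correct/incorrect outcomes.

    b = cases correct at baseline but wrong under the perturbation;
    c = cases wrong at baseline but correct under it.
    Under H0, b ~ Binomial(b + c, 0.5).
    """
    if len(baseline) != len(perturbed):
        raise ValueError("outcome lists must be paired case-by-case")
    b = sum(1 for x, y in zip(baseline, perturbed) if x and not y)
    c = sum(1 for x, y in zip(baseline, perturbed) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy paired outcomes for 85 cases (hypothetical, not the paper's data):
# 70 solved at baseline; 15 of those fail under the perturbation,
# and 2 of the 15 baseline failures are solved under it.
baseline = [True] * 70 + [False] * 15
perturbed = [True] * 55 + [False] * 15 + [True] * 2 + [False] * 13
p = mcnemar_exact(baseline, perturbed)
print(f"{p:.4f}")  # 0.0023
```

Pairing by case is what makes the test suitable here. Each configuration reuses the same 85 cases, so comparing overall accuracies as if the samples were independent would ignore that structure.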

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. This controlled factorial design enables graded sensitivity analysis, dose-response profiling, and cross-dimension interaction detection. Code availability is flagged in the production record; the public…
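Two per-case quantities make the cross-dimension interaction detection more concrete. The first is an O/E ratio. The sketch below assumes, without confirmation from the paper, that the expected accuracy for a pair comes from subtracting both single-dimension drops from the baseline, so expected = acc_A + acc_B - acc_baseline. The paper's exact O/E definition may differ. The second quantity follows the abstract's own description: among cases solved under each dimension alone, the share that fail when both are applied. The names `PairOutcomes`, `observed_over_expected`, and `eligible_failure_rate`, and the toy outcomes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairOutcomes:
    """Per-case correctness for one model; all lists are aligned by case."""
    baseline: list[bool]
    dim_a: list[bool]      # dimension A applied alone
    dim_b: list[bool]      # dimension B applied alone
    combined: list[bool]   # A and B applied together


def accuracy(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes)


def observed_over_expected(p: PairOutcomes) -> float:
    """O/E ratio under an ASSUMED additive model of accuracy drops.

    expected = baseline - drop_a - drop_b = acc_a + acc_b - acc_baseline.
    O/E near 1.0 reads as additive; well below 1.0 as super-additive.
    """
    base = accuracy(p.baseline)
    expected = accuracy(p.dim_a) + accuracy(p.dim_b) - base
    if expected <= 0:
        raise ValueError("additive model predicts non-positive accuracy")
    return accuracy(p.combined) / expected


def eligible_failure_rate(p: PairOutcomes) -> float:
    """Among cases solved under A alone and under B alone, share failed under A+B."""
    eligible = [i for i, (a, b) in enumerate(zip(p.dim_a, p.dim_b)) if a and b]
    if not eligible:
        return 0.0
    return sum(1 for i in eligible if not p.combined[i]) / len(eligible)


# Toy outcomes for 10 cases (hypothetical, not the paper's data).
T, F = True, False
pair = PairOutcomes(
    baseline=[T, T, T, T, T, T, T, T, F, F],
    dim_a=[T, T, T, T, T, T, F, F, F, F],
    dim_b=[T, T, T, T, T, F, T, T, F, F],
    combined=[T, T, T, F, F, F, F, F, F, F],
)
oe = observed_over_expected(pair)
rate = eligible_failure_rate(pair)
print(f"O/E = {oe:.2f}, eligible failure rate = {rate:.2f}")  # O/E = 0.60, eligible failure rate = 0.40
```

The two measures answer different questions. O/E compares aggregate accuracy against a counterfactual model, while the eligible failure rate counts individual cases that break only under the combination. This distinction matters when a combined perturbation both fixes some cases and breaks others.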

WHY NOW

Medical Dialogue Benchmarking moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.


Competitive landscape

MedDialBench is a novel benchmark for evaluating LLM diagnostic robustness against adversarial patient behaviors, revealing critical vulnerabilities in current models.

Segment: Medical Dialogue Benchmarking
Adoption evidence: No public code link in the paper record yet
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified



