ARXIV:2603.15253 · IMAGE CAPTIONING · SUBMITTED 02 APR · 02:30 UTC · FRESHNESS STALE

Verified · Source: PDF linked
Partial · PaperPack: 3 of 4 citation fields filled
Missing · Missing fields: authors
Partial · Proof: unverified proof status

HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning

arXiv

HalDec-Bench is a comprehensive benchmark for evaluating hallucination detectors in image captioning, enhancing the quality of vision-language models.

Blocked on Code · Score 7.0 · Evidence unverified

Opportunity summary

Pain: HalDec-Bench is a comprehensive benchmark for evaluating hallucination detectors in image captioning, enhancing the quality of vision-language models.

Evidence: 0 refs | 0 sources | 17% coverage

Blocker: Evidence unverified


PROBLEM

HalDec-Bench is a comprehensive benchmark for evaluating hallucination detectors in image captioning, enhancing the quality of vision-language models. Beyond evaluation, effective hallucination detection is also essential for curating high-quality image-caption pairs used to train…

METHOD

Full abstract

Hallucination detection in captions (HalDec) assesses a vision-language model's ability to correctly align image content with text by identifying errors in captions that misrepresent the image. Beyond evaluation, effective hallucination detection is also essential for curating high-quality image-caption pairs used to train VLMs. However, the generalizability of VLMs as hallucination detectors across different captioning models and hallucination types remains unclear due to the lack of a comprehensive benchmark. In this work, we introduce HalDec-Bench, a benchmark designed to evaluate hallucination detectors in a principled and interpretable manner. HalDec-Bench contains captions generated by diverse VLMs together with human annotations indicating the presence of hallucinations, detailed hallucination-type categories, and segment-level labels. The benchmark provides tasks with a wide range of difficulty levels and reveals performance differences across models that are not visible in existing multimodal reasoning or alignment benchmarks. Our analysis further uncovers two key findings. First, detectors tend to recognize sentences appearing at the beginning of a response as correct, regardless of their actual correctness. Second, our experiments suggest that dataset noise can be substantially reduced by using strong VLMs as filters while employing recent VLMs as caption generators. Our project page is available at https://dahlian00.github.io/HalDec-Bench-Page/.
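The first finding in the abstract (detectors tend to accept sentences near the start of a response) can be checked once segment-level labels and detector outputs are available. The following is a minimal sketch, not the authors' evaluation code: the `Segment` record, its field names, and the toy data are all assumptions made for illustration. It measures, per sentence position, how often a hallucinated segment goes unflagged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    # All field names are hypothetical; the benchmark's real schema may differ.
    position: int        # 0-based index of the sentence within the caption
    hallucinated: bool   # human annotation for this segment
    flagged: bool        # detector output: True if it flagged a hallucination


def false_accept_rate_by_position(segments):
    """Per sentence position, the fraction of hallucinated segments
    that the detector did not flag (i.e. accepted as correct)."""
    missed = defaultdict(int)
    total = defaultdict(int)
    for seg in segments:
        if seg.hallucinated:
            total[seg.position] += 1
            if not seg.flagged:
                missed[seg.position] += 1
    return {pos: missed[pos] / total[pos] for pos in sorted(total)}


# Toy, made-up data purely to exercise the function; not benchmark results.
toy = [
    Segment(0, True, False),
    Segment(0, True, False),
    Segment(0, False, False),
    Segment(1, True, True),
    Segment(1, True, False),
    Segment(2, True, True),
]
rates = false_accept_rate_by_position(toy)
print(rates)
```

A rate that is highest at position 0 and falls for later positions would be consistent with the position bias the abstract describes. Only hallucinated segments enter the denominator, so positions with no hallucinated segments are omitted rather than reported as zero.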

RESULT

ScienceToStartup currently rates this paper 7.0/10 on the public viability pass. The authors' project page is available at https://dahlian00.github.io/HalDec-Bench-Page/.

WHY NOW

Image Captioning moved forward this cycle; last verified April 2026. Public score 7.0/10.



Competitive landscape


Segment: Image Captioning
Adoption evidence: No public code link in the paper record yet
Commercial read: 7.0/10 public viability
Direct: not classified
Adjacent: not classified
Substitute: not classified
Unknown: not classified


Published: 2026-03-16T13:21:55.000Z · arXiv: https://arxiv.org/abs/2603.15253

What products could be built from this research?

Now is the time because VLMs are rapidly being integrated into commercial products, but their hallucination issues are becoming a bottleneck for reliability. With increasing regulatory scrutiny on AI accuracy (e.g., in advertising or healthcare) and a competitive market where trust differentiates AI providers, a tool that benchmarks and improves caption fidelity addresses an urgent need before widespread adoption leads to costly errors.

What are the practical use cases?

An e-commerce platform uses VLMs to auto-generate product descriptions from images; a hallucination detection tool based on HalDec-Bench scans these captions for errors (e.g., mislabeled colors or features), flags inaccuracies for human review, and retrains models with cleaner data, reducing returns and customer complaints.

