ARXIV:2605.14621 · VISION-LANGUAGE MODELS · SUBMITTED 15 MAY · 20:13 UTC · FRESHNESS FRESH

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

Tian Qin · Junzhe Chen · Yuqing Shi · Tianshu Zhang · Qiang Ju · Lijie Wen · arXiv

SIRA is a training-free method to reduce hallucinations in vision-language models by reconstructing internal references without external tools.

Blocked on Code · Score 5.0 · Evidence unverified

Opportunity summary

Pain SIRA is a training-free method to reduce hallucinations in vision-language models by reconstructing internal references without external tools.

Evidence 0 refs | 0 sources | 0% coverage

Blocker Evidence unverified

PROBLEM

SIRA is a training-free method to reduce hallucinations in vision-language models by reconstructing internal references without external tools. Existing contrastive decoding methods mitigate hallucination by comparing predictions from the original image with those…

METHOD

Full abstract

Large vision-language models (LVLMs) often hallucinate when language priors dominate weak or ambiguous visual evidence. Existing contrastive decoding methods mitigate this problem by comparing predictions from the original image with those from externally perturbed visual inputs, but such references can introduce off-manifold artifacts and require costly extra forward passes. We propose SIRA, a training-free internal contrastive decoding framework that constructs a counterfactual reference inside the same LVLM by exploiting the staged information flow of multimodal transformers. Instead of removing visual information from the input, SIRA first lets image and text tokens interact through a shared prefix, forming an aligned multimodal state that preserves prompt interpretation, decoding history, positional structure, and early visual grounding. It then forks a counterfactual branch in later transformer layers, where attention to image-token positions is masked. This branch retains the shared multimodal context but lacks continued access to fine-grained visual evidence, yielding a language-prior-dominated internal reference for token-level contrast. During decoding, SIRA suppresses tokens that remain strong without late visual access and favors predictions whose advantage depends on the full visual pathway. Experiments on POPE, CHAIR, and AMBER with Qwen2.5-VL and LLaVA-v1.5 show that SIRA consistently reduces hallucinations while preserving descriptive coverage and incurring lower overhead than two-pass contrastive decoding. SIRA requires no training, external verifier, or perturbed input, and applies to open-weight LVLMs with white-box inference access.
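The fork-and-mask idea in the abstract can be sketched on a toy model. The block below is a minimal illustration in NumPy, not the authors' implementation: the single-head attention layer, the function names (`attention_layer`, `sira_style_logits`), the `fork_layer` and `alpha` parameters, and the contrast rule `(1 + alpha) * full - alpha * reference` (the form commonly used in contrastive decoding) are all assumptions made for illustration. The paper record above does not state SIRA's exact scoring rule or fork depth.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_layer(h, w_q, w_k, w_v, blocked_keys=None):
    """Toy single-head causal self-attention with a residual connection.

    blocked_keys: optional boolean array over positions; True marks key
    positions that other positions may not attend to.
    """
    n, d = h.shape
    scores = (h @ w_q) @ (h @ w_k).T / np.sqrt(d)
    masked = np.triu(np.ones((n, n), dtype=bool), k=1)  # hide future keys
    if blocked_keys is not None:
        masked = masked | blocked_keys[None, :]
    np.fill_diagonal(masked, False)  # each position can still see itself
    scores = np.where(masked, -np.inf, scores)
    return h + softmax(scores) @ (h @ w_v)

def sira_style_logits(h0, layers, w_out, is_image, fork_layer, alpha=1.0):
    """Shared prefix, then a full branch and an image-masked branch."""
    h = h0
    for w in layers[:fork_layer]:          # shared prefix: computed once
        h = attention_layer(h, *w)
    full, ref = h, h
    for w in layers[fork_layer:]:          # late layers: two branches
        full = attention_layer(full, *w)
        ref = attention_layer(ref, *w, blocked_keys=is_image)
    logits_full = full[-1] @ w_out         # next-token logits, full pathway
    logits_ref = ref[-1] @ w_out           # language-prior-dominated reference
    # Assumed contrast rule: boost tokens whose score depends on late visual access.
    contrast = (1 + alpha) * logits_full - alpha * logits_ref
    return logits_full, logits_ref, contrast

# Hypothetical sizes: 3 image tokens, 4 text tokens, 4 layers, 5-token vocabulary.
rng = np.random.default_rng(0)
d, vocab, n_img, n_txt, n_layers = 8, 5, 3, 4, 4
is_image = np.array([True] * n_img + [False] * n_txt)
h0 = rng.normal(size=(n_img + n_txt, d))
layers = [tuple(rng.normal(scale=0.3, size=(d, d)) for _ in range(3))
          for _ in range(n_layers)]
w_out = rng.normal(size=(d, vocab))

full, ref, contrast = sira_style_logits(h0, layers, w_out, is_image, fork_layer=2)
print("full:    ", np.round(full, 3))
print("ref:     ", np.round(ref, 3))
print("contrast:", np.round(contrast, 3))
```

In this sketch, a token whose logit stays high in the masked branch gains little or is pushed down by the contrast, while a token that is strong only with late visual access is boosted. Setting `fork_layer` to the number of layers makes the two branches identical, so the contrast collapses back to the ordinary logits.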

RESULT

ScienceToStartup currently rates this 5.0/10 on the public viability pass. Experiments on POPE, CHAIR, and AMBER with Qwen2.5-VL and LLaVA-v1.5 show that SIRA consistently reduces hallucinations while preserving descriptive coverage and incurring lower overhead…
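One way to see why a shared prefix could cost less than two-pass contrastive decoding is to count transformer-layer evaluations per decoding step. The sketch below rests on simplifying assumptions that are not taken from the paper: every layer costs the same, both late branches process all positions, and the two-pass baseline runs the whole stack twice. The function name and the example numbers (32 layers, fork at layer 16) are illustrative only.

```python
def relative_layer_cost(num_layers: int, fork_layer: int) -> float:
    """Layer evaluations of a shared-prefix fork relative to two full passes.

    Assumes equal per-layer cost; this is a rough model, not a measured result.
    """
    if not 0 <= fork_layer <= num_layers:
        raise ValueError("fork_layer must lie between 0 and num_layers")
    shared = fork_layer                        # prefix layers run once
    branched = 2 * (num_layers - fork_layer)   # late layers run in both branches
    return (shared + branched) / (2 * num_layers)

# Illustrative numbers only: a 32-layer model forked halfway.
print(relative_layer_cost(32, 16))
```

Under this model, forking later makes the reference branch cheaper, and forking at layer 0 recovers the cost of two full passes.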

WHY NOW

Vision-Language Models moved forward this cycle; last verified May 2026. Public score 5.0/10.

Opportunity summary

Score 5.0

Pain SIRA is a training-free method to reduce hallucinations in vision-language models by reconstructing internal references without external tools.

Evidence 0 refs | 0 sources | 0% coverage

Blocker no shell-level blocker reported

Analysis summary

SIRA is a training-free method to reduce hallucinations in vision-language models by reconstructing internal references without external tools.

Competitive landscape

SIRA is a training-free method to reduce hallucinations in vision-language models by reconstructing internal references without external tools.

Segment

Vision-Language Models

Adoption evidence

No public code link in the paper record yet

Commercial read

5.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

arXiv: https://arxiv.org/abs/2605.14621 · Published 2026-05-14 09:37:55 UTC · Free to access
