ARXIV:2601.22574 · VIDEO AI ENHANCEMENT · SUBMITTED 19 MAR · 18:48 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: 3 of 4 citation fields filled (Partial) · Missing fields: authors (Missing) · Proof: unverified proof status (Partial)

Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding

arXiv

Spatiotemporal-Semantic Contrastive Decoding reduces hallucinations in video models, enhancing video comprehension and reliability.

Blocked on Code · Score 6.0 · Evidence unverified

Opportunity summary

Pain Spatiotemporal-Semantic Contrastive Decoding reduces hallucinations in video models, enhancing video comprehension and reliability.

Evidence 0 refs | 0 sources | 33% coverage

Blocker Evidence unverified


PROBLEM

Video Large Language Models still hallucinate, generating outputs that are inconsistent with explicit video content or factual evidence. Existing decoding methods for mitigating video hallucinations consider the spatiotemporal characteristics of videos but mostly rely on heuristic designs.

METHOD

Full abstract

Although Video Large Language Models perform remarkably well across tasks such as video understanding, question answering, and reasoning, they still suffer from the problem of hallucination, which refers to generating outputs that are inconsistent with explicit video content or factual evidence. However, existing decoding methods for mitigating video hallucinations, while considering the spatiotemporal characteristics of videos, mostly rely on heuristic designs. As a result, they fail to precisely capture the root causes of hallucinations and their fine-grained temporal and semantic correlations, leading to limited robustness and generalization in complex scenarios. To more effectively mitigate video hallucinations, we propose a novel decoding strategy termed Spatiotemporal-Semantic Contrastive Decoding. This strategy constructs negative features by deliberately disrupting the spatiotemporal consistency and semantic associations of video features, and suppresses video hallucinations through contrastive decoding against the original video features during inference. Extensive experiments demonstrate that our method not only effectively mitigates the occurrence of hallucinations, but also preserves the general video understanding and reasoning capabilities of the model.
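The contrastive step described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the exact disruption procedure nor the scoring formula, so everything here is an assumption: `disrupt_features` stands in for the negative-feature construction by permuting frame order and token order, and `contrastive_logits` uses the widely used contrastive decoding form with a plausibility cutoff, with hypothetical parameters `alpha` and `beta`. The `model` callable is also hypothetical and simply maps features plus a prefix to next-token logits.

```python
import math
import random


def disrupt_features(video_feats, seed=0):
    """Build negative features from a list of frames (each a list of tokens).

    Assumption: frame order is permuted to break temporal consistency and
    token order inside each frame is permuted to break spatial layout.
    This is an illustrative stand-in, not the paper's procedure.
    """
    rng = random.Random(seed)
    frames = [list(frame) for frame in video_feats]  # copy, input stays intact
    rng.shuffle(frames)
    for frame in frames:
        rng.shuffle(frame)
    return frames


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_logits(pos_logits, neg_logits, alpha=1.0, beta=0.1):
    """Score tokens as (1 + alpha) * log p_pos - alpha * log p_neg.

    Tokens whose original probability is below beta times the top
    probability are masked out (assumed plausibility constraint).
    """
    pos = log_softmax(pos_logits)
    neg = log_softmax(neg_logits)
    cutoff = max(pos) + math.log(beta)
    return [
        (1 + alpha) * p - alpha * n if p >= cutoff else -math.inf
        for p, n in zip(pos, neg)
    ]


def decode_step(model, video_feats, prefix, alpha=1.0, beta=0.1):
    """Greedy next-token choice under the contrastive score."""
    pos_logits = model(video_feats, prefix)
    neg_logits = model(disrupt_features(video_feats), prefix)
    scores = contrastive_logits(pos_logits, neg_logits, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, a token that stays likely even when the video features are scrambled is treated as driven by language priors rather than by the video, so its score is pushed down relative to tokens that depend on the intact features. Each decoding step costs two forward passes.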

RESULT

ScienceToStartup currently rates this 6.0/10 on the public viability pass. Per the abstract, existing heuristic decoding methods fail to precisely capture the root causes of hallucinations and their fine-grained temporal and semantic correlations, leading to limited robustness…

WHY NOW

Video AI Enhancement moved forward this cycle; last verified April 2026. Public score 6.0/10.


Opportunity summary

Score 6.0

Pain Spatiotemporal-Semantic Contrastive Decoding reduces hallucinations in video models, enhancing video comprehension and reliability.

Evidence 0 refs | 0 sources | 33% coverage

Blocker missing authors

Analysis summary

Spatiotemporal-Semantic Contrastive Decoding reduces hallucinations in video models, enhancing video comprehension and reliability.


Competitive landscape

Spatiotemporal-Semantic Contrastive Decoding reduces hallucinations in video models, enhancing video comprehension and reliability.

Segment: Video AI Enhancement

Adoption evidence: No public code link in the paper record yet

Commercial read: 6.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


arXiv: https://arxiv.org/abs/2601.22574 · Published: 2026-01-30 · Free to access

