ARXIV:2605.13737 · OMNIMODAL LLMS · SUBMITTED 14 MAY · 20:10 UTC · FRESHNESS FRESH

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

Trung Nguyen Quang · Yiming Gao · Fanyi Pu · Kaichen Zhang · Shuo Sun · Ziwei Liu · arXiv

Diagnosing and improving the grounding capabilities of omnimodal LLMs by identifying and mitigating a representation-action gap in multimodal comprehension.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: Diagnosing and improving the grounding capabilities of omnimodal LLMs by identifying and mitigating a representation-action gap in multimodal comprehension.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: Evidence unverified


PROBLEM

Diagnosing and improving the grounding capabilities of omnimodal LLMs by identifying and mitigating a representation-action gap in multimodal comprehension. Recent omnimodal models are positioned as perception-grounded agents that jointly process video, audio, and text,…

METHOD

Full abstract

When an omnimodal large language model accepts a question whose textual premise contradicts what it actually sees or hears, does the failure lie in perception or in action? Recent omnimodal models are positioned as perception-grounded agents that jointly process video, audio, and text, yet a basic form of grounding remains untested: catching a textual claim that conflicts with the model's own sensory input. We introduce IMAVB, a curated 500-clip benchmark of long-form movies with a 2x2 design crossing target modality (vision, audio) and premise condition (standard, misleading), which lets us measure conflict detection separately from ordinary multimodal comprehension. Across eight open-source omnimodal LLMs and Gemini 3.1 Pro, we document a Representation-Action Gap: hidden states reliably encode premise-perception mismatches even when the same models almost never reject the false claim in their outputs. Behaviorally, models fall into two failure modes: under-rejection, in which they answer misleading questions as if the false premise were true; and over-rejection, in which they reject more often but also reject standard questions, sacrificing ordinary comprehension accuracy. The gap is modality-asymmetric (audio grounding underperforms vision) and prompt-resistant across seven variants. As an initial diagnostic intervention, a probe-guided logit adjustment (PGLA) re-injects the encoded mismatch signal into decoding and consistently improves rejection behavior. Together, these results suggest the bottleneck for omnimodal grounding lies in translation, not perception.
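The 2x2 design described in the abstract (target modality crossed with premise condition) suggests a simple way to separate the two failure modes it names. The following is a minimal sketch, not the authors' evaluation code: the `Item` fields, the metric names, and the toy records are all assumptions made for illustration. A low rejection rate on misleading items corresponds to under-rejection; a high rejection rate on standard items, paired with lower standard accuracy, corresponds to over-rejection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical record layout; the benchmark's real schema is not shown here.
    modality: str   # "vision" or "audio"
    premise: str    # "standard" or "misleading"
    rejected: bool  # did the model's output reject the question's premise?
    correct: bool   # was the final answer scored as correct?


def _mean(flags):
    """Fraction of True values, or NaN when the cell is empty."""
    return sum(flags) / len(flags) if flags else float("nan")


def summarize(items):
    """Per-modality rejection rates for both premise conditions."""
    cells = defaultdict(list)
    for it in items:
        cells[(it.modality, it.premise)].append(it)

    summary = {}
    for modality in sorted({it.modality for it in items}):
        misleading = cells[(modality, "misleading")]
        standard = cells[(modality, "standard")]
        summary[modality] = {
            # Low value here indicates under-rejection.
            "rejection_rate_misleading": _mean([it.rejected for it in misleading]),
            # High value here indicates over-rejection.
            "false_rejection_rate_standard": _mean([it.rejected for it in standard]),
            "accuracy_standard": _mean([it.correct for it in standard]),
        }
    return summary


# Made-up toy records, only to exercise the function.
toy_items = [
    Item("vision", "misleading", rejected=True, correct=True),
    Item("vision", "misleading", rejected=False, correct=False),
    Item("vision", "standard", rejected=False, correct=True),
    Item("vision", "standard", rejected=False, correct=True),
    Item("audio", "misleading", rejected=False, correct=False),
    Item("audio", "misleading", rejected=False, correct=False),
    Item("audio", "standard", rejected=True, correct=False),
    Item("audio", "standard", rejected=False, correct=True),
]
toy_summary = summarize(toy_items)
```

Reporting the two rejection rates side by side, rather than a single accuracy number, is what lets conflict detection be read separately from ordinary comprehension.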

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. As an initial diagnostic intervention, a probe-guided logit adjustment (PGLA) re-injects the encoded mismatch signal into decoding and consistently improves rejection behavior. Code availability…
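To make the idea of a probe-guided logit adjustment concrete, here is a rough sketch under stated assumptions. This page does not give the paper's actual PGLA formulation, so everything below is hypothetical: a linear probe with a sigmoid output, a fixed set of "rejection" token ids, and the `alpha` and `threshold` parameters are illustrative choices, not the authors' method.

```python
import math


def probe_mismatch_prob(hidden, weights, bias):
    """Linear probe on one hidden-state vector, squashed to a probability.

    Assumption: the mismatch signal is linearly readable from the hidden state.
    """
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def adjust_logits(logits, reject_token_ids, p_mismatch, alpha=4.0, threshold=0.5):
    """Boost rejection-token logits when the probe signals a mismatch.

    `alpha` and `threshold` are hypothetical knobs; no boost is applied
    when the probe probability is at or below the threshold.
    """
    boost = alpha * max(0.0, p_mismatch - threshold)
    adjusted = list(logits)
    for token_id in reject_token_ids:
        adjusted[token_id] += boost
    return adjusted


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Made-up numbers: token 0 stands for "answer as asked", token 1 for "reject premise".
hidden_state = [1.0, -0.5, 2.0]
probe_weights = [0.8, 0.1, 0.6]
probe_bias = -0.5
raw_logits = [2.0, 1.0, 0.5, 0.0]

p = probe_mismatch_prob(hidden_state, probe_weights, probe_bias)
new_logits = adjust_logits(raw_logits, reject_token_ids=[1], p_mismatch=p)
```

In this toy setup the probe fires strongly enough that the greedy choice moves from token 0 to token 1, while a probe probability at or below the threshold leaves the logits unchanged. The design keeps the intervention confined to decoding: the model's weights and its perception stack are untouched, which matches the abstract's framing of the bottleneck as translation rather than perception.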

WHY NOW

Omnimodal LLMs moved forward this cycle; last verified May 2026. Public score 7.0/10. Production flags indicate code availability.


Opportunity summary

Score 7.0

Pain: Diagnosing and improving the grounding capabilities of omnimodal LLMs by identifying and mitigating a representation-action gap in multimodal comprehension.

Evidence: 0 refs | 0 sources | 0% coverage

Blocker: no shell-level blocker reported

Analysis summary

Diagnosing and improving the grounding capabilities of omnimodal LLMs by identifying and mitigating a representation-action gap in multimodal comprehension.


Competitive landscape

Diagnosing and improving the grounding capabilities of omnimodal LLMs by identifying and mitigating a representation-action gap in multimodal comprehension.

Segment: Omnimodal LLMs

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified


arXiv: https://arxiv.org/abs/2605.13737 · Published 2026-05-13T16:14:44.000Z · Free to access · Commercial readiness: code
