ARXIV:2605.30557 · VLM SPATIAL REASONING · SUBMITTED 01 JUN · 20:25 UTC · FRESHNESS STALE

Verified · Source: PDF linked | Verified · PaperPack: citation fields available | Partial · Proof: unverified proof status

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Yue Zhang · Zun Wang · Han Lin · Yonatan Bitton · Idan Szpektor · Mohit Bansal · arXiv

This work introduces a framework to evaluate whether Vision Language Models know when not to answer spatial questions due to visual limitations, highlighting a critical gap in current models.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain: This work introduces a framework to evaluate whether Vision Language Models know when not to answer spatial questions due to visual limitations, highlighting a critical gap in current models.

Evidence 0 refs | 3 sources | 50% coverage

Blocker Evidence unverified


PROBLEM

This work introduces a framework to evaluate whether Vision Language Models know when not to answer spatial questions due to visual limitations, highlighting a critical gap in current models. However, visual observations are inherently…

METHOD

Full abstract

Spatial reasoning is a fundamental capability for vision-language models (VLMs) deployed in real-world environments. However, visual observations are inherently limited representations of a 3D world: occlusion can render objects invisible, and perspective can make geometric properties misleading. Despite this, existing spatial reasoning benchmarks typically assume that observations are sufficient and reliable, focusing on whether models produce correct answers rather than whether they recognize when a question cannot be answered and what additional observations would be needed. In this work, we challenge this assumption by constructing a controlled evaluation framework, SpatialUncertain, and introducing two types of observation challenges: (1) occlusion, which hides target information, and (2) perspective ambiguity, which produces misleading visual cues. For each configuration, we design spatial questions that are answerable under clean observations but require abstention under the introduced challenges. We further evaluate whether models can identify which additional viewpoints would resolve perspective ambiguity. Our results across a diverse set of frontier open- and closed-source VLMs reveal two consistent failure modes. First, models are prone to overconfident answering, attempting to solve spatial reasoning tasks even when visual evidence is incomplete or misleading, with average accuracy around 30% under occlusion and below 10% under perspective ambiguity. Second, even when additional views are available, some models perform near random chance in identifying which would provide reliable evidence. Together, our findings call for moving beyond answer correctness toward evaluating whether models know when to abstain and how to seek reliable evidence.
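The evaluation logic described in the abstract (questions answerable under clean observations but requiring abstention under occlusion or perspective ambiguity, plus a view-selection task compared against random chance) can be sketched as a small scoring routine. This is a minimal illustration under stated assumptions, not SpatialUncertain's actual protocol or data format: the `ABSTAIN` string, the `Item` record fields, exact-string matching, and the "one correct view among k candidates" setup are all hypothetical choices made here for clarity.

```python
from dataclasses import dataclass

# Hypothetical abstention response; the paper's actual abstention format is not given here.
ABSTAIN = "cannot be determined"


@dataclass
class Item:
    condition: str   # assumed labels: "clean", "occlusion", or "perspective"
    gold: str        # correct answer under the clean observation
    prediction: str  # the model's response


def expected_label(item: Item) -> str:
    # Clean observations should be answered; challenged ones should trigger abstention.
    return item.gold if item.condition == "clean" else ABSTAIN


def accuracy_by_condition(items: list[Item]) -> dict[str, float]:
    """Fraction of items per condition whose prediction matches the expected behavior."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for it in items:
        totals[it.condition] = totals.get(it.condition, 0) + 1
        hit = it.prediction.strip().lower() == expected_label(it).lower()
        correct[it.condition] = correct.get(it.condition, 0) + int(hit)
    return {c: correct[c] / totals[c] for c in totals}


def view_selection_summary(trials: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Each trial is (chosen_view, correct_view, num_candidate_views).

    Returns (model accuracy, expected accuracy of uniform random guessing).
    """
    n = len(trials)
    acc = sum(chosen == gold for chosen, gold, _ in trials) / n
    chance = sum(1 / k for _, _, k in trials) / n
    return acc, chance


if __name__ == "__main__":
    items = [
        Item("clean", "left", "left"),
        Item("clean", "2", "3"),
        Item("occlusion", "left", "left"),                   # overconfident answer
        Item("occlusion", "right", "Cannot be determined"),  # correct abstention
        Item("perspective", "taller", "taller"),             # misled by perspective
    ]
    print(accuracy_by_condition(items))
    print(view_selection_summary([(0, 0, 4), (1, 2, 4), (3, 3, 4), (2, 0, 4)]))
```

Under this scoring, answering a challenged question "correctly" still counts as a failure, which is how overconfident answering shows up as low accuracy under occlusion and perspective ambiguity; comparing view-selection accuracy with the `1/k` baseline shows how close a model sits to random chance.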

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass. Our results across a diverse set of frontier open- and closed-source VLMs reveal two consistent failure modes. Code availability is flagged in the production…

WHY NOW

VLM Spatial Reasoning moved forward this cycle; last verified June 2026. Public score 7.0/10. Production flags indicate code availability.



Competitive landscape


Segment

VLM Spatial Reasoning

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

