ARXIV:2603.26639 · VISION-LANGUAGE MODELS · SUBMITTED 30 MAR · 21:51 UTC · FRESHNESS STALE

Verified Source: PDF linked · Verified PaperPack: citation fields available · Partial Proof: unverified proof status

Make Geometry Matter for Spatial Reasoning

Shihua Zhang · Qiuhong Shen · Shizun Wang · Tianbo Pan · Xinchao Wang · arXiv

A framework that forces vision-language models to actively use geometric information for improved spatial reasoning, outperforming existing methods.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A framework that forces vision-language models to actively use geometric information for improved spatial reasoning, outperforming existing methods.

Evidence 68 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

Vision-language models (VLMs) remain limited at spatial reasoning in both static scenes and dynamic videos. Recent work tries to address this by injecting geometry tokens from pretrained 3D foundation models, but naive token fusion followed by standard fine-tuning often leaves these geometric cues underutilized, because VLMs tend to rely heavily on 2D visual cues.

METHOD

Full abstract

Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation models into VLMs. Nevertheless, we observe that naive token fusion followed by standard fine-tuning in this line of work often leaves such geometric cues underutilized for spatial reasoning, as VLMs tend to rely heavily on 2D visual cues. In this paper, we propose GeoSR, a framework designed to make geometry matter by encouraging VLMs to actively reason with geometry tokens. GeoSR introduces two key components: (1) Geometry-Unleashing Masking, which strategically masks portions of 2D vision tokens during training to weaken non-geometric shortcuts and force the model to consult geometry tokens for spatial reasoning; and (2) Geometry-Guided Fusion, a gated routing mechanism that adaptively amplifies geometry token contributions in regions where geometric evidence is critical. Together, these designs unleash the potential of geometry tokens for spatial reasoning tasks. Extensive experiments on both static and dynamic spatial reasoning benchmarks demonstrate that GeoSR consistently outperforms prior methods and establishes new state-of-the-art performance by effectively leveraging geometric information. The project page is available at https://suhzhang.github.io/GeoSR/.
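The two components named in the abstract can be illustrated with a toy sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: uniform random masking stands in for the paper's "strategic" masking, a per-token sigmoid gate stands in for the gated routing mechanism, and the names `mask_vision_tokens`, `gated_fusion`, `w_gate`, and `b_gate` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def mask_vision_tokens(vision, mask_ratio, rng):
    """Zero out a random subset of 2D vision tokens.

    Assumption: uniform random masking; the paper describes its masking
    as strategic but the abstract does not specify the strategy.
    """
    n = vision.shape[0]
    n_mask = int(round(mask_ratio * n))
    masked_idx = rng.choice(n, size=n_mask, replace=False)
    keep = np.ones(n, dtype=bool)
    keep[masked_idx] = False
    # Masked rows become zero, so they carry no 2D appearance signal.
    return vision * keep[:, None], keep

def gated_fusion(vision, geometry, w_gate, b_gate):
    """Add geometry tokens to vision tokens, scaled by a per-token gate.

    Assumption: the gate is a sigmoid over a linear projection of the
    concatenated [vision, geometry] features (hypothetical form).
    """
    logits = np.concatenate([vision, geometry], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-logits))  # shape (n,), values in (0, 1)
    fused = vision + gate[:, None] * geometry
    return fused, gate

# Toy inputs: 8 tokens with 4 feature dimensions each.
rng = np.random.default_rng(0)
n_tokens, dim = 8, 4
vision = rng.normal(size=(n_tokens, dim))
geometry = rng.normal(size=(n_tokens, dim))
w_gate = rng.normal(size=(2 * dim,)) * 0.1
b_gate = 0.0

masked, keep = mask_vision_tokens(vision, mask_ratio=0.5, rng=rng)
fused, gate = gated_fusion(masked, geometry, w_gate, b_gate)
```

In this sketch, a masked position contributes only its gated geometry token to the fused output, which is the intuition behind weakening 2D shortcuts: wherever the vision token is removed, any spatial signal has to come from geometry.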

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass.

WHY NOW

Vision-Language Models moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A framework that forces vision-language models to actively use geometric information for improved spatial reasoning, outperforming existing methods.

Evidence 68 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

Segment: Vision-Language Models

Adoption evidence: No public code link in the paper record yet

Commercial read: 7.0/10 public viability

Direct: not classified

Adjacent: not classified

Substitute: not classified

Unknown: not classified

Paper record

Published: 2026-03-27T17:45:12.000Z · arXiv: https://arxiv.org/abs/2603.26639 · Free to access

Affiliation (all authors): National University of Singapore

Research domain: Vision-Language Models · Viability score: 7 · Commercial readiness: code

FAQ

What is the startup potential of "Make Geometry Matter for Spatial Reasoning"?

Enhance spatial reasoning in vision-language models using geometry-driven approaches for improved performance in static and dynamic environments.

What products could be built from this research?

This framework can be productized by integrating with existing vision-language systems to enhance their ability to interpret and execute tasks based on spatial understanding. This can be offered as a feature extension or API to platforms requiring advanced spatial reasoning, such as robotics and autonomous vehicles.

What are the practical use cases?

Develop an AI assistant for robotics that can understand spatial instructions for navigating complex environments using spatial reasoning capabilities enhanced by geometric information.

What industries could this research disrupt?

GeoSR could replace or significantly improve the capabilities of general-purpose VLMs and other spatial reasoning tools by providing a more robust spatial understanding through the integration of geometry cues.
