ARXIV:2603.27967 · EMBODIED AI / SPATIAL REASONING · SUBMITTED 31 MAR · 20:21 UTC · FRESHNESS STALE

Source: PDF linked (Verified) · PaperPack: citation fields available (Verified) · Proof: unverified proof status (Partial)

Learning Multi-View Spatial Reasoning from Cross-View Relations

Suchae Jeong · Jaehwi Song · Haeone Lee · Hanna Kim · Jian Kim · Dongjun Lee · +6 at arXiv

A new dataset and fine-tuning approach for vision-language models to enable robust multi-view spatial reasoning for embodied AI and robotics.

Ship in 2-4 weeks · Score 7.0 · Evidence unverified

Opportunity summary

Pain A new dataset and fine-tuning approach for vision-language models to enable robust multi-view spatial reasoning for embodied AI and robotics.

Evidence 111 refs | 3 sources | 50% coverage

Blocker Evidence unverified

PROBLEM

A new dataset and fine-tuning approach for vision-language models to enable robust multi-view spatial reasoning for embodied AI and robotics. In this work, we introduce Cross-View Relations (XVR), a large-scale dataset designed to teach…

METHOD

Full abstract

Vision-language models (VLMs) have achieved impressive results on single-view vision tasks, but lack the multi-view spatial reasoning capabilities essential for embodied AI systems to understand 3D environments and manipulate objects across different viewpoints. In this work, we introduce Cross-View Relations (XVR), a large-scale dataset designed to teach VLMs spatial reasoning across multiple views. XVR comprises 100K vision-question-answer samples derived from 18K diverse 3D scenes and 70K robotic manipulation trajectories, spanning three fundamental spatial reasoning tasks: Correspondence (matching objects across views), Verification (validating spatial relationships), and Localization (identifying object positions). VLMs fine-tuned on XVR achieve substantial improvements on established multi-view and robotic spatial reasoning benchmarks (MindCube and RoboSpatial). When integrated as backbones in Vision-Language-Action models, XVR-trained representations improve success rates on RoboCasa. Our results demonstrate that explicit training on cross-view spatial relations significantly enhances multi-view reasoning and transfers effectively to real-world robotic manipulation.
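The three task families named in the abstract can be pictured as a small record type plus a per-task scoring loop. The following is a minimal sketch only: the `CrossViewSample` layout, the field names, the exact-match scoring rule, and the toy questions are all assumptions made for illustration, not the XVR schema or the paper's evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    CORRESPONDENCE = "correspondence"  # matching objects across views
    VERIFICATION = "verification"      # validating spatial relationships
    LOCALIZATION = "localization"      # identifying object positions


@dataclass(frozen=True)
class CrossViewSample:
    """Hypothetical record layout; the real XVR schema may differ."""
    task: Task
    view_paths: tuple[str, ...]  # two or more images of the same scene
    question: str
    answer: str


def per_task_accuracy(samples, predictions):
    """Exact-match accuracy per task, ignoring case and surrounding whitespace."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample, pred in zip(samples, predictions):
        total[sample.task] += 1
        if pred.strip().lower() == sample.answer.strip().lower():
            correct[sample.task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy data invented for illustration; not drawn from XVR.
samples = [
    CrossViewSample(Task.CORRESPONDENCE, ("front.png", "side.png"),
                    "Which box in view 2 shows the mug marked in view 1?", "B"),
    CrossViewSample(Task.VERIFICATION, ("front.png", "top.png"),
                    "Is the mug left of the plate from the top camera?", "yes"),
    CrossViewSample(Task.VERIFICATION, ("front.png", "top.png"),
                    "Is the plate behind the mug from the front camera?", "no"),
    CrossViewSample(Task.LOCALIZATION, ("front.png", "wrist.png"),
                    "Which region of the wrist view contains the mug?", "C"),
]
predictions = ["b", "yes", "yes", "C "]
accuracy = per_task_accuracy(samples, predictions)
```

Reporting accuracy per task rather than as one pooled number keeps a weak task (here, the toy Verification split) from being hidden by stronger ones.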

RESULT

ScienceToStartup currently rates this 7.0/10 on the public viability pass.

WHY NOW

Embodied AI / Spatial Reasoning moved forward this cycle; last verified April 2026. Public score 7.0/10. Production flags indicate code availability.

Opportunity summary

Score 7.0

Pain A new dataset and fine-tuning approach for vision-language models to enable robust multi-view spatial reasoning for embodied AI and robotics.

Evidence 111 refs | 3 sources | 50% coverage

Blocker no shell-level blocker reported

Competitive landscape

A new dataset and fine-tuning approach for vision-language models to enable robust multi-view spatial reasoning for embodied AI and robotics.

Segment

Embodied AI / Spatial Reasoning

Adoption evidence

No public code link in the paper record yet

Commercial read

7.0/10 public viability

Direct

not classified

Adjacent

not classified

Substitute

not classified

Unknown

not classified

Paper record

Page: https://sciencetostartup.com/paper/learning-multi-view-spatial-reasoning-from-cross-view-relations
Source: https://arxiv.org/abs/2603.27967 (arXiv:2603.27967)
Published: 2026-03-30T02:42:25.000Z
Free to access: yes
Authors: Suchae Jeong · Jaehwi Song · Haeone Lee · Hanna Kim · Jian Kim · Dongjun Lee · Dong Kyu Shin · Changyeon Kim · Dongyoon Hahm · Woogyeol Jin · Juheon Choi · Kimin Lee
Research domain: Embodied AI / Spatial Reasoning
Viability score: 7
Commercial readiness: code
